Friday, 7 August 2015

What is Lucene?

  • Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.
  • It is a technology suitable for nearly any application that requires full-text search, especially one that must run cross-platform.
  • Lucene can index plain text and numeric values such as integers, as well as the text extracted from PDFs, Office documents, etc.

How does Lucene index to enable faster search? (The inverted index concept)

Lucene creates something called an inverted index. Normally we map
document -> terms in the document (here a document is a collection of information, or searchable items).
But Lucene does the reverse.

It creates an index of term -> list of documents containing the term. This makes searching faster, because a query only has to look up its terms instead of scanning every document.
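
To make the term -> documents idea concrete, here is a minimal sketch in plain Java. It is not how Lucene implements its index internally; the class name InvertedIndexSketch and its methods are made up for illustration, and the simple lower-casing tokenizer is only a crude stand-in for a real Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the Lucene API.
public class InvertedIndexSketch {

    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits text into lower-cased terms (assumed, very simplified analysis step).
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Records, for every term in the text, that this document contains it.
    public void addDocument(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // A single map lookup returns all matching documents; no document is scanned.
    public SortedSet<Integer> search(String term) {
        SortedSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(hits);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(0, "Lucene is a search library");
        index.addDocument(1, "An inverted index makes search fast");
        index.addDocument(2, "Lucene builds an inverted index");

        System.out.println(index.search("lucene"));   // [0, 2]
        System.out.println(index.search("search"));   // [0, 1]
        System.out.println(index.search("missing"));  // []
    }
}
```

The work is done once, at indexing time, when each document is split into terms. At search time the lookup cost depends on the number of query terms, not on the number of documents.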


How to use the Apache Lucene library in your application through Maven?

Maven Dependency

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>${version}</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>

Download Dependency
Download Lucene from http://lucene.apache.org/ and add lucene-core.jar to the classpath.

Note: The current Apache Lucene version is 5.2.X (as of 7th Aug 2015). Replace ${version} with the version you want to use.

Lucene Indexing flow to enable faster search

Let's understand the picture first, from bottom to center.
    The raw text is used to create a Lucene "Document", which is analyzed using the provided Lucene Analyzer. The Document is then added to the index based on the Store, TermVector and Analyzed properties of its Fields.
    Next, follow the search from top to center. The user specifies the query in text format. A Query object is built from the query text, and the result of executing the query is returned as TopDocs.

* Document is a class provided by the Lucene core library.