Sunday, April 14, 2013

Apache Solr - Open Source Search Engine


Apache Solr



Solr is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Because it provides distributed search and index replication, Solr is highly scalable. Solr is one of the most popular enterprise search engines, and Solr 4 adds NoSQL features.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat, JBoss or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.
Apache Lucene and Apache Solr have both been produced by the same Apache Software Foundation development team since the two projects were merged in 2010. It is common to refer to the technology or products as Lucene/Solr or Solr/Lucene.
One advantage of Solr in enterprise projects is that you don't need to write any Java code, although Java itself has to be installed. If you are unsure when to use Solr and when to use Lucene, these answers could help. If you need to build your Solr index from websites, you should take a look at the open source crawler Apache Nutch before creating your own solution.
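To make the HTTP/JSON API a bit more concrete, here is a minimal Python sketch of how a client might build a search request and read the reply. The host, port and core name (localhost:8983, collection1), the helper names and the sample response are assumptions for illustration only; the sample is hand-written in the general shape of Solr's JSON output and is not captured from a real server.

```python
import json
from urllib.parse import urlencode

# Assumed for illustration: a Solr server on localhost:8983 with a core
# named "collection1". Adjust both to match your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_select_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Build a search URL; wt=json asks Solr for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url + "?" + urlencode(params)


def extract_docs(raw_json):
    """Return (total hit count, list of documents) from a JSON response."""
    body = json.loads(raw_json)
    response = body["response"]
    return response["numFound"], response["docs"]


# Hand-written sample response (hypothetical documents and field names).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "1", "title": "Introduction to Solr"},
      {"id": "2", "title": "Solr and Lucene"}
    ]
  }
}
"""

if __name__ == "__main__":
    print(build_select_url("title:solr", rows=5))
    total, docs = extract_docs(SAMPLE_RESPONSE)
    print(total, [doc["id"] for doc in docs])
```

Against a running server you would fetch the URL with any HTTP client (for example urllib.request.urlopen) and pass the response body to extract_docs. Since everything goes over plain HTTP, the same request can be made from almost any language.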

To see that Solr is actually used in a lot of enterprise projects, take a look at this amazing list of public projects powered by Solr. If you encounter problems, the mailing list or Stack Overflow will help you.

Features


  • Uses the Lucene library for full-text search
  • Faceted navigation
  • Hit highlighting
  • Query language supports structured as well as textual search
  • JSON, XML, PHP, Ruby, Python, XSLT, Velocity and custom Java binary output formats over HTTP
  • HTML administration interface
  • Replication to other Solr servers - enables scaling QPS
  • Distributed Search through Sharding - enables scaling content volume
  • Search results clustering based on Carrot2
  • Extensible through plugins
  • Flexible relevance - boost through function queries
  • Caching - queries, filters, and documents
  • Embeddable in a Java Application
  • Geo-spatial search
  • Automated management of large clusters through ZooKeeper
  • More function queries
  • Field Collapsing 
  • A new auto-suggest component
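Several of the features above are switched on with plain request parameters. The Python sketch below shows how faceting, hit highlighting and distributed search might be requested in one query string. The helper name, the field names (cat, name) and the shard addresses are hypothetical; the parameter names (facet, facet.field, hl, hl.fl, shards) are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_feature_params(query, facet_field=None, highlight_field=None, shards=None):
    """Build a Solr query string that enables optional features.

    facet_field     -> faceted navigation on that field
    highlight_field -> hit highlighting on that field
    shards          -> list of shard addresses for distributed search
    """
    params = [("q", query), ("wt", "json")]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if shards:
        # Solr expects a comma-separated list of shard addresses.
        params.append(("shards", ",".join(shards)))
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical field names and shard hosts, for illustration only.
    print(build_feature_params("ipod", facet_field="cat", highlight_field="name"))
    print(build_feature_params(
        "ipod", shards=["localhost:8983/solr", "localhost:7574/solr"]))
```

The resulting string is appended to the select URL of a Solr server after a "?". Keeping each feature optional mirrors how Solr itself treats them: a request without these parameters is simply a plain search.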
Stay tuned for the installation and configuration part!

Thanks!!
Kuldeep