Search in web applications is no longer just a feature; it is a basic navigation system. As Google shows, users who want to reach a specific site do not try to work out which domain ending a company uses. They simply type, for example, “useKit” into the Google search and click the first link to get to useKit.com. When we started to build useKit, it was clear that we needed a good search engine, both to let our users retrieve content efficiently and to let them navigate. A good example of this is our auto-complete context navigation in useKit knowledge.
Solr vs. Elasticsearch
We evaluated different systems, but it soon became clear that we would choose a solution based on Lucene, as Lucene offered all the features we needed and was under active development. Since most of our application is written in PHP, we needed a Lucene-based solution that could be accessed from PHP. In the end we decided on Solr, which is scalable and offers a REST interface with a JSON API. For more than a year our search was powered by Solr.
After a year we added the ability to upload files to the useKit system. Our users started to use it heavily and soon requested that these files be full-text searchable too. We looked for ways to do this in Solr. The Apache Tika project appeared to offer the features we needed, and we started to integrate Tika into our Solr setup, but we ran into several problems. These problems led us to reconsider our search solution, and we stumbled upon elasticsearch. At that time elasticsearch, developed by Shay Banon, was still a young project, but it looked promising. After some testing of the general search and especially the mapper-attachments plugin, we decided to move our search to elasticsearch. The decision was based less on the speed, scalability or features of elasticsearch than on how simple it is to set up and deploy and how easy it is to access through its REST interface.
As the existing elasticsearch PHP clients did not quite meet our requirements and were not extensible, we started to develop our own elasticsearch PHP client with the support of Cargo Media. It was open sourced in October 2010 on GitHub under the name Elastica. Since then, both elasticsearch and Elastica have evolved considerably. The speed of elasticsearch has improved, and its automatic sharding and replication allow us to simply run elasticsearch on multiple machines to index all the files uploaded by our users. The development of Elastica has also moved forward with the support of more than 30 external contributors, making Elastica one of the standard PHP clients for elasticsearch.
Elastica uses the REST interface of elasticsearch, and data is transferred in JSON format. We were already familiar with REST and JSON, as our own service API is also based on them. Because we implemented Elastica alongside the Zend Framework, Elastica has a similar structure. Elastica was built with extensibility in mind. Elasticsearch evolves fast, and every release adds new features such as query, filter or facet types. To extend Elastica with a new query type, all that is needed is a new class that extends Elastica_Query_Abstract.
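As a rough illustration of that extension point, the sketch below adds a query type for the elasticsearch prefix query. The class name Example_Query_Prefix is made up for this example, and the stand-in base class with an abstract toArray() method is an assumption that only exists so the snippet runs without Elastica installed. The real Elastica_Query_Abstract may look different, so check the library source before relying on it.

```php
<?php
// Stand-in so the sketch runs without Elastica installed.
// ASSUMPTION: the real Elastica_Query_Abstract may expose a different interface.
if (!class_exists('Elastica_Query_Abstract')) {
    abstract class Elastica_Query_Abstract
    {
        // Returns the query as an array that can be encoded to JSON.
        abstract public function toArray();
    }
}

// Hypothetical query type wrapping the elasticsearch "prefix" query.
class Example_Query_Prefix extends Elastica_Query_Abstract
{
    private $field;
    private $prefix;

    public function __construct($field, $prefix)
    {
        $this->field = $field;
        $this->prefix = $prefix;
    }

    public function toArray()
    {
        // Shape of the query body: {"prefix": {"<field>": "<prefix>"}}
        return array('prefix' => array($this->field => $this->prefix));
    }
}

$query = new Example_Query_Prefix('title', 'use');
echo json_encode($query->toArray()), "\n";
```

The point of the design is that a query object only has to know how to describe itself as an array. Encoding it as JSON and sending it over REST stays the job of the client.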
To have an elasticsearch instance with file support up and running, only 3 simple steps are needed:
- Download elasticsearch
- Install the attachment plugin. Uncompress the elasticsearch package, go into the elasticsearch directory and type:
./bin/plugin install mapper-attachments
- Start elasticsearch:
./bin/elasticsearch
With Elastica it is simple to add a file to your search index and search through it.
In the code below, replace the file path with the path to your own PDF and you’re ready to search.
$client = new Elastica_Client();
$index = $client->getIndex('useKit');

// Creates a new index
$index->create(array(), true);
$type = $index->getType('file');

// Creates a new document
$doc = new Elastica_Document();
$doc->addFile('content', 'path/to/your.pdf');

// Adds the document to the index
$type->addDocument($doc);

// Index needs a moment to be updated
$index->refresh();

// Searches the index and returns the results
$resultSet = $type->search('useKit');
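To give an idea of what goes over the wire for a file, here is a minimal sketch of how a document body for the attachment mapper could be built by hand. The attachment plugin expects the file content as a base64-encoded string inside the JSON document. The helper name buildAttachmentBody is hypothetical, and whether Elastica's addFile does exactly this internally is an assumption.

```php
<?php
// Hypothetical helper (not part of Elastica): builds the JSON body for a
// document whose single field holds a base64-encoded file.
function buildAttachmentBody($field, $path)
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    // JSON cannot carry raw binary data, so the file is base64-encoded first.
    return json_encode(array($field => base64_encode($bytes)));
}

// Demo with a throwaway temporary file instead of a real PDF.
$path = tempnam(sys_get_temp_dir(), 'doc');
file_put_contents($path, 'useKit test content');
$body = buildAttachmentBody('content', $path);
unlink($path);

echo $body, "\n";
```

Base64 makes the payload roughly a third larger than the file itself, which is worth keeping in mind when indexing large uploads.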
The development of Elastica will continue, as we want to deliver the best search solution to our users so that they always have the results at their fingertips. If you have questions about using Elastica, post them directly in the Elastica Google Group.