useKit Search with Elasticsearch and Elastica

Search in web applications is no longer just a feature; it is a basic navigation system. As Google shows, users who want to reach a specific site do not try to figure out which domain ending a company uses. They simply type, for example, “useKit” into Google and click the first link to get to useKit.com. When we started to build useKit, it was clear that we needed a good search engine, both to let our users retrieve content efficiently and to help them navigate. A good example of this is our auto-complete context navigation in useKit knowledge.

Solr vs. Elasticsearch

We evaluated different systems, but it soon became clear that we would choose a solution based on Lucene, as Lucene offered all the features we needed and was under active development. Since most of our application is written in PHP, we needed a Lucene-based solution that can be accessed from PHP. We finally decided on Solr, which is scalable and offers a REST interface with a JSON API. For more than a year our search was powered by Solr.

After a year we added the ability to upload files to the useKit system. Our users started to use it heavily and soon asked for these files to be full-text searchable too. We looked for ways to do this in Solr. The Apache Tika project appeared to offer the features we needed, and we started to integrate Tika into our Solr setup, but we ran into several problems. These problems made us re-examine our search solution, and we stumbled upon elasticsearch. At that time elasticsearch, developed by Shay Banon, was still a young project, but it looked promising. After some testing of the general search and especially the mapper-attachments plugin, we decided to move our search to elasticsearch. The decision was based less on the speed, scalability or features of elasticsearch than on how simple it is to set up and deploy, and how easy it is to access through its REST interface.

Elastica

As the existing elasticsearch PHP clients did not quite meet our requirements and were not extensible, we started to develop our own elasticsearch PHP client with the support of Cargo Media. It was open-sourced on GitHub in October 2010 under the name Elastica. Since then, both elasticsearch and Elastica have evolved considerably. The speed of elasticsearch has improved, and its automatic sharding and replication allow us to simply run elasticsearch on multiple machines to index all the files uploaded by our users. The development of Elastica has also moved forward with the support of more than 30 external contributors, making Elastica one of the standard PHP clients for elasticsearch.

Elastica uses the REST interface of elasticsearch, and data is transferred in JSON format. We were already familiar with REST and JSON, as our own service API is also based on them. Because we implemented Elastica in combination with the Zend Framework, Elastica has a similar structure. Elastica was built to be extensible: elasticsearch evolves fast, and every release adds new features such as query, filter or facet types. To extend Elastica with a new query type, you only have to add a class that extends Elastica_Query_Abstract.
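To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It does not use Elastica itself: SketchQueryAbstract, SketchQueryString and buildSearchBody are hypothetical stand-ins we made up for this example, and the real Elastica_Query_Abstract may look different. The sketch only shows how a query object can describe itself as an array, which is then serialized into the JSON body of a search request.

```php
<?php
// Stand-in base class for this sketch only. It is NOT Elastica's real
// Elastica_Query_Abstract, just an illustration of the pattern.
abstract class SketchQueryAbstract
{
    // Every query type describes itself as a plain PHP array.
    abstract public function toArray();
}

// Hypothetical query type wrapping elasticsearch's "query_string" query.
class SketchQueryString extends SketchQueryAbstract
{
    private $queryString;

    public function __construct($queryString)
    {
        $this->queryString = $queryString;
    }

    public function toArray()
    {
        return array('query_string' => array('query' => $this->queryString));
    }
}

// Builds the JSON body that a client would send to an index's _search endpoint.
function buildSearchBody(SketchQueryAbstract $query)
{
    return json_encode(array('query' => $query->toArray()));
}

echo buildSearchBody(new SketchQueryString('useKit')), "\n";
// prints {"query":{"query_string":{"query":"useKit"}}}
```

Under this design, supporting a new query type means adding one more small class with its own toArray() method; the code that builds and sends the request does not have to change.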

To have an elasticsearch instance with file support up and running, only 3 simple steps are needed:

  • Download elasticsearch
  • Install the attachment plugin. Uncompress the elasticsearch package, go into the elasticsearch directory and type: ./bin/plugin install mapper-attachments
  • Start elasticsearch: ./bin/elasticsearch -f

With Elastica it is simple to add a file to your search index and search through it.
In the code below, replace the file path with the path to your own PDF and you’re ready to search.

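	$client = new Elastica_Client();
	// elasticsearch index names must be lowercase
	$index = $client->getIndex('usekit');

	// Creates a new index (the second argument recreates it if it already exists)
	$index->create(array(), true);
	$type = $index->getType('file');

	// Maps the 'content' field as an attachment so the plugin extracts the file's text
	$type->setMapping(array('content' => array('type' => 'attachment')));

	// Creates a new document and attaches the file
	$doc = new Elastica_Document();
	$doc->addFile('content', 'path/to/your.pdf');

	// Adds the document to the index
	$type->addDocument($doc);

	// Refreshes the index so the new document becomes searchable
	$index->refresh();

	// Searches the index and returns the results
	$resultSet = $type->search('useKit');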

The development of Elastica will continue, as we want to deliver the best search solution to our users so that they always have the results at their fingertips. If you have questions about using Elastica, post them directly in the Elastica Google Group.

Is Google+ worth 20bn?

Within the first week after the release of Google+, Google's next attempt to enter the social networking field, Google's market capitalization went up by roughly 20bn dollars. This is more noteworthy than the 100bn valuation currently seen for Facebook on secondary markets. This was not the only thing that caught our attention, so we took a closer look at Google+.
When we first saw Google+ we were impressed by the technical “touchiness” of the user interface. We realized that one thing in particular could be big in the future: topic focus and noise reduction. While Facebook, MySpace and other social networks are centered around friends and connections between people, Google+ is built to differentiate between groups of interests or topics. This reduces unwanted, disruptive and unimportant information, and it also gives better control over who can access what.
Here at useKit we have built useKit K! around the combination of content and users, which we call a “context”. We think Google+ circles represent this same combination. The difference is that Google+ was built to create an open space for discussion and the flow of information, while useKit K! is built as a closed system that supports discussion and collaboration within closed user groups or companies.

Google+ invite

Google+ has some very interesting concepts that we will keep watching. The most important, in our opinion, is the change in the way information is distributed within social networks. While Facebook and others rely on symmetric connections (established friendships), Google+ is asymmetric, like the follower concept in Twitter. Users can be added to circles without their permission, which resembles following on Twitter. Interestingly, and importantly for privacy, you cannot see which circles someone else has placed you in. Google+ also shows how an established ecosystem of applications and users can boost the popularity of a new application: Google+ counted more than 20 million users after a few weeks. It further shows how distribution can be built into a product, for example through the deep integration with Gmail and the way people who have not yet signed up are commonly included in Google+ (see screenshot).