Reds - light-weight full text search for nodejs backed by redis

Reds (red-s) is a very small (~300LOC) light-weight full text search library for node.js.

I wrote reds with Kue in mind, a priority job queue for node. I wanted to add search capabilities so you can easily find jobs by any of the arbitrary data provided, names, emails, anything.

API

You can use reds for multiple isolated search indexes, which is why you must pass a key to reds.createSearch(), as it’s used for namespacing.

 var search = reds.createSearch('pets');

As mentioned this library could be used with anything, to illustrate this we can even use it with a regular javascript array by indexing the value indices as shown below, where the first value passed to search.index() is the text, and the second is the id.

 var strs = [];
 strs.push('Tobi wants four dollars');
 strs.push('Tobi only wants $4');
 strs.push('Loki is really fat');
 strs.push('Loki, Jane, and Tobi are ferrets');
 strs.push('Manny is a cat');
 strs.push('Luna is a cat');
 strs.push('Mustachio is a cat');

 strs.forEach(function(str, i){ search.index(str, i); });

Within the another process, or the same one, we can then invoke search.query() with a string and callback invoked with possible error and array of ids. With these ids we can then determine their original values, be it an array, object in another data store etc.

 search.query('luna cat', function(err, ids){
   if (err) throw err;
   console.log('Search results:');
   ids.forEach(function(id){
     console.log('  - %s', strs[id]);
   });
 });

Producing:

  Search results:
     - Luna is a cat

You may also remove an id from the index:

  search.remove(id[, callback]);

Implementation

While reds is backed by Redis you can easily use reds with any other data store, as it simply indexes by arbitrary numeric or string ids. This means you could create an index of files on disk, mongodb documents, urls, anything.

The indexing process works like this:

- tokenize words
- strip stop words ("about", "after", "am", "an", ...)
- stem words
- apply metaphone
- add id to metaphone constant set 

The process of stemming the words and applying the metaphone algorithm provide leeway so the user does not need exact matches. For example thanks to metaphone the names “steven” and “stephen” both resolve to the constant STFN. The process of stemming reduces variants to it’s stem, for example “counting” becomes “count”, and “waits” becomes “wait”.

When a query is performed the same sequence is applied, resulting in an array of metaphone constants, providing us with the keys necessary to perform the Redis union or intersection to fetch our ids.

Performance

Preliminary benchmarks show that a small 1.6kb body of text is currently indexed in ~6ms, or 163 ops/s. Medium bodies such as 40kb operate around 6 ops/s, or 166ms.

Querying with a multi-word phrase, and an index containing ~3500 words operates around 5300 ops/s. Not too bad.

Indexing performance was decreased by nearly 100% by applying the porter stemmer algorithm in Chris Umbel’s natural library, however reds is still reasonably quick at indexing text. If you have huge documents, you may want to consider allowing users to specify a description instead.

Future

  • use sorted sets for ordering and priority
  • ranges
  • sorting
  • perf optimization if necessary

Notes

  1. hileon reblogged this from tjholowaychuk
  2. winladen reblogged this from tjholowaychuk
  3. nulltask reblogged this from tjholowaychuk
  4. tjholowaychuk posted this