Evaluating the Search Engine in Email Archiving

Every email archive solution has an embedded index engine, and that engine has an important effect on what the archiving product can do.

Some of the most popular indexing engines are AltaVista, dtSearch, FAST, and IDOL. They index all of the archived content (message body, subject line, and attachments), which makes fast archive search and discovery possible. Search functions such as wildcards, proximity search, and Boolean operators are provided by the index engine, so which of them you get depends on the engine the archive uses.
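
The engines named above are commercial products with their own query syntax, and the sketch below does not reproduce any of them. It is a minimal, hypothetical inverted index in Python (the `ToyIndex` class and its method names are illustrative assumptions) meant to show why Boolean, wildcard, and proximity search are properties of the index: each is answered from the term-to-document postings, and proximity search additionally needs the token positions that the index recorded when the message was indexed.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory inverted index: term -> {doc_id: [token positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Lowercase and split into word tokens, recording each token's position.
        for pos, token in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[token].setdefault(doc_id, []).append(pos)

    def term(self, word):
        # Documents containing a term; a trailing "*" acts as a prefix wildcard.
        word = word.lower()
        if word.endswith("*"):
            prefix = word[:-1]
            return {doc for tok, docs in self.postings.items()
                    if tok.startswith(prefix) for doc in docs}
        return set(self.postings.get(word, {}))

    def all_of(self, *words):
        # Boolean AND: documents containing every term.
        return set.intersection(*(self.term(w) for w in words))

    def any_of(self, *words):
        # Boolean OR: documents containing at least one term.
        return set.union(*(self.term(w) for w in words))

    def excluding(self, hits, word):
        # Boolean NOT: drop documents that contain the given term.
        return hits - self.term(word)

    def near(self, w1, w2, k):
        # Proximity: both terms appear within k token positions of each other.
        p1 = self.postings.get(w1.lower(), {})
        p2 = self.postings.get(w2.lower(), {})
        return {doc for doc in p1.keys() & p2.keys()
                if any(abs(i - j) <= k for i in p1[doc] for j in p2[doc])}


idx = ToyIndex()
idx.add("msg1", "Subject: tax return. Please file the tax return by Friday.")
idx.add("msg2", "Subject: retention policy. Return the signed form.")
idx.add("msg3", "Subject: lunch. No tax talk today.")

print(sorted(idx.all_of("tax", "return")))              # ['msg1']
print(sorted(idx.any_of("lunch", "policy")))            # ['msg2', 'msg3']
print(sorted(idx.excluding(idx.term("tax"), "lunch")))  # ['msg1']
print(sorted(idx.term("ret*")))                         # ['msg1', 'msg2']
print(sorted(idx.near("tax", "return", 1)))             # ['msg1']
```

An index that stored only which messages contain a term, without positions, could still answer the Boolean and wildcard queries but not the proximity query, which is one concrete way engines differ in the search functions they support.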

Not all indexing solutions are the same. They vary in performance, index capacity, and the search functions they support. When evaluating an email archiving product, consider the underlying search technology: ask the vendor to demonstrate the power of the index used in the archive solution, and perform sample searches to gauge its performance. Also test search functions such as Boolean search (AND, OR, NOT) and proximity search, if these features are available.

Beyond basic search and discovery, an email archive solution can also use its index engine to support retention management, typically by applying labels (tags) to content. For example, the search engine may let you assign tags based on search criteria: anything with "tax return" in the subject line gets the tag "7 Year Retention". The same index also lets you search previously tagged content to cull out information for legal discovery.
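
A retention rule of this kind can be sketched as a saved query paired with a tag. The Python below is a minimal illustration under stated assumptions: the `Message` class, the `RETENTION_RULES` list, and the second "Legal Hold" rule are hypothetical, and a plain case-insensitive substring test stands in for the query that a real archive would run through its index engine.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    subject: str
    body: str
    tags: set = field(default_factory=set)


# Hypothetical rules: (field to search, phrase to match, tag to apply).
RETENTION_RULES = [
    ("subject", "tax return", "7 Year Retention"),
    ("body", "settlement", "Legal Hold"),
]


def apply_retention_tags(messages, rules):
    """Add a rule's tag to every message whose chosen field contains its phrase."""
    for msg in messages:
        for field_name, phrase, tag in rules:
            # Case-insensitive substring match stands in for an index query.
            if phrase.lower() in getattr(msg, field_name).lower():
                msg.tags.add(tag)
    return messages


def search_by_tag(messages, tag):
    """Cull previously tagged content, e.g. for a legal discovery request."""
    return [m.msg_id for m in messages if tag in m.tags]


archive = [
    Message("m1", "2008 Tax Return", "Attached is the signed copy."),
    Message("m2", "Lunch?", "Noon at the usual place."),
]
apply_retention_tags(archive, RETENTION_RULES)
print(search_by_tag(archive, "7 Year Retention"))  # ['m1']
```

Because the tags are stored with each message, the later discovery step is just another query, this time over tags rather than message text.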

Thus, the choice of the underlying index engine has a significant effect on the capabilities of an archiving product.

... Bob Spurzem

2 Comments

  1. Posted February 9, 2009 at 5:48 PM

    Hi Bob,

    I’d be remiss in my duties if I didn’t comment on our own capabilities in this area. EMC has used us for many years as the indexing engine for its EmailXtender and DiskXtender product lines, and HP/TOWER (TRIMContext), among others, uses us as well.

    I would argue that customers still aren’t fully leveraging search’s analytics, mining, and tagging capabilities in standard deployments, which suggests we’ve only just begun to scratch the surface of what embedded indexing and retrieval can do.

    Part of that has to do with how inundated customers are by the multitude of “e-discovery” solutions. But the good news is that law firms, litigation support teams, service bureaus, etc., seem to be increasingly tackling this topic. Based on the conversations we had at LegalTech last week, it’s clear that the search piece is playing a larger role … if nothing else, it’s getting more mindshare with these groups.

  2. Posted February 10, 2009 at 1:35 PM

    Good post, Bob. Search speed and flexibility are probably the #1 long-term success factor for an archiving project. By the way, for our service we use the Lucene framework, which is widely used in very large-scale search deployments.
