Is Exchange 2010 Indexing Good Enough for E-Discovery?

Exchange 2010 includes archiving and e-discovery features, and both depend on powerful indexing capabilities.

Third-party on-premises archiving vendors struggle constantly with indexing technologies. Indices become corrupted and take days or weeks to regenerate. Searches return results that users don't expect or don't understand. Indexing technologies age, and when a vendor replaces them, your corporate memory can look very different.

To understand the challenges, think of Outlook:

  • It's often hard to find email in PSTs.
  • You frequently don't get what you're looking for.
  • You get "indexing is not complete" messages.

Now consider e-discovery on a corporate scale. Searches become critically important. For example, you may need to defend your CEO against accusations that might land him in jail; your CEO is certain that an email is there, but the search tool can't locate it. In the meantime, you have five days to find the email, and the clock is ticking.

Exchange 2010 includes a new discovery module that searches primary and archive mailboxes, and works across multiple mailboxes. It is built by one of the smartest teams in the Exchange product group. However, it's unclear whether or not the search will be good enough. If it's like our experiences with Outlook search, the answer is no.

We think Exchange search should be a lot better than that of Outlook. Nevertheless, the challenges are substantial, and there is a good possibility that it won't be up to the job. For example:

  • Important file types may not be supported.
  • Documentation may be unclear on how to adjust the index, and when adjustments need to take place.
  • Users may not understand the results of a search.
  • There may be problems with non-English searches.
  • Wildcard and stemming support may be limited.
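
To see why the last point matters, here is a minimal Python sketch of a toy inverted index. It says nothing about how Exchange 2010 indexes mail; the sample messages, the `naive_stem` suffix stripper, and the `build_index` and `search` helpers are all hypothetical names invented for illustration. It only shows how the same query can return different sets of messages depending on whether the index supports stemming or wildcards.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Toy suffix stripper (an assumption for illustration, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(docs, stem=False):
    """Map each (optionally stemmed) token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[naive_stem(token) if stem else token].add(doc_id)
    return index


def search(index, term, stem=False):
    """Look up one term; a trailing '*' is treated as a prefix wildcard."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for key, ids in index.items():
            if key.startswith(prefix):
                hits |= ids
        return hits
    key = naive_stem(term) if stem else term
    return set(index.get(key, set()))


# Hypothetical sample messages.
docs = {
    1: "Please approve the merger terms",
    2: "The merger was approved yesterday",
    3: "Awaiting approval from legal",
}

exact_index = build_index(docs)
stemmed_index = build_index(docs, stem=True)

print(sorted(search(exact_index, "approve")))               # [1]
print(sorted(search(stemmed_index, "approve", stem=True)))  # [1, 2]
print(sorted(search(exact_index, "approv*")))               # [1, 2, 3]
```

An exact-match search for "approve" misses the message that says "approved", the toy stemmer finds it but still misses "approval", and only the wildcard query returns all three. In an e-discovery search, that gap can decide whether the critical email is found.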

We would welcome input from readers on their practical experience with Exchange 2010 searches in the stressed and demanding environment of e-discovery.

... David Sengupta

One Comment

  1. Posted January 12, 2010 at 7:00 AM

    Hi David,

    I think one of the other challenges to consider is how Exchange indexing will scale. Indexing at large scale (millions or billions of items) is a complex problem no matter who is doing the programming.

    In some sense, trying to do indexing in Exchange is like doing business intelligence / analytics in a production database. Some companies do it, but you end up trading off search/analytics performance against production app performance.

    I’d imagine it’s unacceptable for clients to have Exchange indexing slow down production mail delivery. So it will be interesting to see whether clients can tune Exchange to deliver both fast indexing/search and solid mail delivery.

    My suspicion is that, like in the database world, we’ll continue to see a separation of production data (Exchange) and the “data warehouse” (archives).

    Obviously I’m very biased. 🙂

