Exchange 2010: Exciting Message Store Performance and Redundancy Improvements

Maintaining the integrity of, and timely access to, a user's message store has been a major challenge for Exchange over the years. Exchange lagged badly behind Lotus Notes/Domino because:

  • Lotus Notes employed a single OS file to hold a single user's mail file, while Exchange used a single "storage group" to hold multiple users' messages. As a result, a disk error in Exchange affected many more users, and a restore from backup took much longer.
  • Exchange was much more disk I/O intensive, which meant that the likelihood of a disk error was increased.
  • Lotus Notes offered "shared nothing" multisite clustering, while Exchange only offered backup and restore.

This was largely a legacy of the environment for which Exchange was designed. The power of servers and the size of disks in the late 1980s meant that Exchange, like Notes in the early 1980s, was originally designed to support relatively small numbers of users per server, each with a relatively small message store. The effect of Moore's Law on both CPU power and RAM size, along with a 120% compound growth rate in disk bit density/cost, has meant that hardware is no longer a limiting factor. A single server is conceptually able to support many thousands of users, and disks many gigabytes of messages.

Unfortunately, until its 2007 release, Exchange was largely hamstrung in its support of bigger servers and larger, cheaper disks. The problem was that Exchange was disk I/O limited. With the leap to "64-bit only" support in Exchange 2007, which resulted in a 60-70% reduction in disk I/O, and now with an additional 70% reduction in disk I/O under Exchange 2010, Exchange has finally broken free of its disk I/O straitjacket. In addition to increasing the number of users that can be supported per server, this reduction in disk I/O has the following knock-on effects:

  • Exchange 2010 is able to employ much cheaper IDE (SATA) disks, as opposed to SCSI and Fibre Channel disks. SATA disks trade an order-of-magnitude improvement in per-bit cost (1TB SATA II @ $105 vs. 300GB Ultra SCSI @ $350) against poorer performance (7.2k RPM SATA II vs. 15k RPM Ultra SCSI) and reliability.
  • Exchange 2010 is able to support "shared nothing" replication, via "log replay," of mailbox databases both within and between data centers for load balancing and much more rapid post-disaster availability.
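
The figures quoted above can be checked with a little arithmetic. The sketch below uses only the prices and percentages given in the text, and makes two assumptions that the text does not state: that 1TB is counted as 1000GB, and that the "additional 70%" reduction in Exchange 2010 applies to the already reduced Exchange 2007 level rather than to the original. The variable and function names are illustrative only.

```python
# Per-gigabyte cost of the two drives quoted in the text.
# Assumption: 1 TB is counted as 1000 GB.
sata_cost_per_gb = 105 / 1000   # 1TB SATA II at $105
scsi_cost_per_gb = 350 / 300    # 300GB Ultra SCSI at $350
cost_ratio = scsi_cost_per_gb / sata_cost_per_gb


def remaining_io(reduction_2007, reduction_2010=0.70):
    """Fraction of pre-2007 disk I/O left after two successive reductions.

    Assumption: the Exchange 2010 reduction is relative to the
    Exchange 2007 level, so the two reductions multiply.
    """
    return (1 - reduction_2007) * (1 - reduction_2010)


print(f"SATA: ${sata_cost_per_gb:.3f}/GB, SCSI: ${scsi_cost_per_gb:.3f}/GB")
print(f"SCSI costs about {cost_ratio:.1f}x more per GB")
for r in (0.60, 0.70):
    print(f"{r:.0%} then 70% leaves {remaining_io(r):.0%} of the original I/O")
```

On these assumptions the SCSI drive costs roughly 11 times more per gigabyte, which matches the "order of magnitude" claim, and the two releases together leave somewhere between 9% and 12% of the original disk I/O.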

Collectively, these changes deliver the following benefits:

  • They allow a backup/restore approach to disaster recovery to be replaced by a multicopy/multisite/online approach to disaster recovery.
    • Eliminating backup/restore enables much bigger mailboxes/message stores.
    • Supporting multicopy/multisite/online redundancy delivers much more rapid "disaster recovery."
  • They allow a switch to SATA drives. This delivers:
    • A near four-fold increase in mailbox/message store quotas/sizes.
    • A near four-fold increase in multisite data redundancy.
    • A near four-fold increase in disk density.
    • All at no increase in cost!
  • The multicopy/online approach to redundancy eliminates the need for RAID 1 or RAID 5 redundancy. This is good news, because:
    • While RAID should in theory increase reliability, in practice buggy RAID controller firmware and drivers often report bogus errors. These require operator intervention at best, and performance-sapping rebuilds or even reboots at worst.

Ferris is excited about these developments in Exchange 2010. They add considerable new benefits to the Exchange value proposition. At the same time, achieving those benefits requires a fairly major change of mindset among Exchange deployment architects. This is especially the case when replacing expensive SAN-based disk drives and snapshot-based backups with direct-attached SATA disk drives and multisite replication.

... Nick Shelness
