24 June 2010, 12:23 - by martin_sustrik
The pre-0MQ version used a disk-based queue, which we put in place early in the dev process but always planned to replace. Our early focus was on getting as much functionality in place as quickly as possible so we could get some experience with the system as a whole, and identify any high-level issues.
Obviously, replacing a bunch of disk writes and reads with 0MQ frees up a lot of resources on the box, and in the apps themselves. So, the baseline was fairly horrendous, and almost anything would have been a step forward :-)
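A minimal sketch of the kind of swap described above, assuming pyzmq: a PUSH/PULL pipeline replaces the disk queue, so documents flow straight from producer to consumer over a socket instead of being written out and re-read. The endpoint name, message contents, and threading setup here are illustrative, not from the original deployment.

```python
import threading
import zmq

ctx = zmq.Context.instance()

# Producer side binds first so the inproc consumer can connect safely.
push = ctx.socket(zmq.PUSH)
push.bind("inproc://index-feed")

received = []

def consumer(n):
    # Previously the consumer would re-read each document from disk;
    # here it just pulls messages off the socket.
    pull = ctx.socket(zmq.PULL)
    pull.connect("inproc://index-feed")
    for _ in range(n):
        received.append(pull.recv_string())
    pull.close()

t = threading.Thread(target=consumer, args=(3,))
t.start()
for i in range(3):
    push.send_string(f"doc-{i}")
t.join()
push.close()
print(received)
```

With a single consumer, PUSH/PULL delivers the messages in order; adding more PULL workers gets you fan-out load balancing for free, which is part of why this pattern is such a natural fit for a feed into an indexer.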
Having said that, we chose 0MQ because it was clearly the fastest and most lightweight queue we evaluated. As they say in Formula One, it's easier to make a fast car reliable than it is to make a reliable car fast. In our case, "reliable" means surviving app or box failures, rather than preventing data loss (which I haven't seen in steady-state running). We're assuming we can figure out how to deal with these problems sometime in the future.
As far as what "noticeably snappier" means exactly, I don't have hard numbers, but the boxes went from about 10% idle to about 90% idle after the 0MQ deployment. Since the primary app that 0MQ is feeding is Solr, and that already does a pretty good job of divorcing search and indexing, we didn't see huge improvements in "smaller" searches (which were already in the <100ms range), but the worst-case searches do a lot better now (from over 1 second down to a few hundred milliseconds), and this has flattened the overall search latency very nicely. It also means we can push more data through the boxes than we could before, which we like a lot.