ZeroMQ an introduction - 23 Jun 2010 09:27 - by martin_sustrik - Comments: 1

Nicholas Piël wrote a great blog post introducing readers to the basic concepts of ZeroMQ.

ZeroMQ is a messaging library that allows you to design a complex communication system without much effort. In recent years the project has wrestled with how to describe itself effectively. In the beginning it was introduced as ‘messaging middleware’, later it moved to ‘TCP on steroids’, and right now it is a ‘new layer on the networking stack’.

I had some trouble understanding ZeroMQ at first and really had to reset my brain. First of all, it is not a complete messaging system such as RabbitMQ or ActiveMQ. I know the guys at Linden Research compared them, but it is apples and oranges. A full-fledged messaging system gives you an out-of-the-box experience. Unwrap it, configure it, start it up and you’re good to go.

ZeroMQ is not such a system at all; it is a messaging library to be used programmatically. It basically gives you a pimped socket interface allowing you to quickly build your own messaging system.


Mongrel2 Is "Self-Hosting" - 18 Jun 2010 10:15 - by martin_sustrik - Comments: 0

Zed Shaw has an interesting blog on Mongrel2 - a web server with ØMQ at the backend:

"It may not dawn on you quite yet, but I think this design has a very good chance of changing how we architect and deploy web applications. Imagine if you had available to you reliable asynchronous and synchronous web protocols, and that all of the messages from those protocols were transported to your backends with no regard for what language they were written in? Not only that but it wouldn't even care how many backends you had or where they were located."

Read more here: [http://sheddingbikes.com/posts/1276761301.html]

Internet Worldview in Messaging World - 08 Jun 2010 21:25 - by martin_sustrik - Comments: 0

A short story explaining what "messaging Internet-style" means can be found here:


Berlin Buzzwords 2010 - 07 Jun 2010 16:13 - by pieterh - Comments: 1

ØMQ gets a keynote slot tomorrow morning at the Berlin Buzzwords 2010 conference. About 300 FOSS developers and users are here at the Kosmos theatre in Berlin, listening to many speakers talk about "search, store, and scale".

Pieter Hintjens is giving this talk. There are no slides but the main message of the talk will be that if you're using sockets in your distributed applications, or if you're running across any more than one core or box, you should be looking at ØMQ.

We'll find the video of the talk when it's uploaded.

Loggly Switches to ØMQ - 05 Jun 2010 10:24 - by pieterh - Comments: 4


Loggly is a San Francisco startup providing log file management for cloud services. "Our goal is to build a highly scalable log management service which provides great value and is fast, fun and easy to use." Loggly has just switched to a ØMQ backend and reports that "things are noticeably snappier."

ØMQ/2.0.7 (beta) released - 04 Jun 2010 18:05 - by mato - Comments: 12

I'm happy to announce the release of ØMQ version 2.0.7.

The new version is available immediately to download on the website, at:


Please note that due to incompatible API and ABI changes in this release, all language bindings will need to be updated to work with ØMQ 2.0.7. As these are maintained by the community it may take a few days for everyone to catch up.

A big thank you to all our contributors, and special thanks to Martin Sustrik for putting it all together!

Highlights of the 2.0.7 release:


  • The core documentation has been updated with many clarifications, especially in the description of the functionality provided by the different socket types.
  • The version of OpenPGM bundled with 0MQ has been updated to the 2.1.26 release.


  • GCC-isms have been removed from the code and build system across the board; ØMQ should now build with no issues when using compilers other than GCC.


  • The zmq_init() function now has only a single parameter: the number of ØMQ I/O threads to create in the context being initialised. The app_threads and flags parameters have been removed.
  • The ZMQ_P2P socket type has been renamed to ZMQ_PAIR.
  • The ZMQ_LWM socket option has been removed; the low water mark for a socket is now computed automatically by ØMQ.
  • A zmq_getsockopt() function has been added.

New functionality

  • Multi-hop request/reply is fully supported. This feature allows the insertion of device(s) between ZMQ_REQ and ZMQ_REP sockets thus enabling scenarios such as multi-threaded server, shared service queue, and other interesting messaging topologies. The entire infrastructure is transparent to applications.
  • Multi-part messages. A ØMQ message may now be composed of 1 or more message parts; each message part is an independent zmq_msg_t in its own right. ØMQ ensures atomic delivery of messages; peers shall receive either all message parts of a message or none at all. This feature allows for seamless zero-copy message passing when data are scattered in memory, and is an important building block for multi-hop messaging topologies.
  • Context termination and ETERM. The zmq_term() function has been changed to interrupt any blocking operations on open sockets, causing them to return the newly defined ETERM error code. This allows for orderly application termination, especially when multiple application threads are involved.

As always, a full list of changes may be found in the ChangeLog included in the distribution tarball, or in Git.


Building ØMQ and pyzmq on Red Hat - 24 May 2010 09:37 - by martin_sustrik - Comments: 2

Quite a few people have run into issues with building ØMQ on stable distributions of Linux, such as RHEL or CentOS. Because the build tools shipped with such distributions are stable (=old), they fail to build ØMQ out of the box. A blog post by Ryan Duffield explains how to deal with the issue.

Read more....

To Trie or not to Trie - 18 May 2010 06:30 - by martin_sustrik - Comments: 0

Have you ever thought about what exactly happens when you subscribe to a topic using a SUB socket? The messages have to be checked and those not matching the subscription(s) have to be dropped.

Currently, matching is done using an N-ary search tree, a structure known as a trie.

Searching a trie is very fast. The search time is independent of the overall number of topics or subscriptions. It depends linearly only on the length of the topic string in the message being matched. However, with a large number of subscriptions the trie can consume more memory than alternative search structures such as a hash table. Larger memory usage means that memory cache misses happen more often and slow the algorithm down.
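To make the cost model concrete, here is a minimal sketch of prefix matching with a byte-wise trie. It is an illustration only, not ØMQ's actual implementation: the names trie_node_t, trie_new, trie_add and trie_match are made up for this example, and the node layout (one pointer per possible byte value) is an assumption chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical trie node: one child slot per possible byte value. */
typedef struct trie_node {
    struct trie_node *next[256];
    int is_subscription;   /* non-zero if a subscription ends at this node */
} trie_node_t;

/* Allocate an empty node; calloc zeroes the child table and the flag. */
trie_node_t *trie_new(void)
{
    trie_node_t *node = calloc(1, sizeof *node);
    assert(node != NULL);
    return node;
}

/* Register a subscription prefix of the given length. */
void trie_add(trie_node_t *root, const unsigned char *prefix, size_t len)
{
    trie_node_t *node = root;
    for (size_t i = 0; i < len; i++) {
        if (node->next[prefix[i]] == NULL)
            node->next[prefix[i]] = trie_new();
        node = node->next[prefix[i]];
    }
    node->is_subscription = 1;
}

/* Return 1 if any registered subscription is a prefix of the message.
   The loop visits at most len + 1 nodes, however many subscriptions exist. */
int trie_match(const trie_node_t *root, const unsigned char *msg, size_t len)
{
    const trie_node_t *node = root;
    for (size_t i = 0; ; i++) {
        if (node->is_subscription)
            return 1;
        if (i == len || node->next[msg[i]] == NULL)
            return 0;
        node = node->next[msg[i]];
    }
}
```

With the subscription "ABC" registered, trie_match visits at most four nodes for the message "ABCxyz", no matter how many other subscriptions are stored. Each node in this naive layout holds 256 pointers, which shows where the memory consumption comes from.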

Bhavin Turakhia did some research on trie optimisation and blogged about it.

Read more....

ØMQ/2.0.6 on OpenVMS - 12 May 2010 13:02 - by martin_sustrik - Comments: 3

BC&JA are pleased to announce that a binary version of 0MQ V2.0.6 has been released for Alpha and Integrity, OpenVMS 8.3 and higher…

Read more....

Zero-copy and Multi-part Messages - 08 May 2010 09:58 - by martin_sustrik - Comments: 17

In high-performance networking, copying data is considered harmful to performance and is avoided as much as possible. The technique of avoiding all the copies is known as "zero-copy".

This article demonstrates the impact of a single copy of the data on latency. It shows, for example, that for 256MB of data, a single copy can increase latency by 0.1 second!

Obviously, data are copied from memory to the network interface card and vice versa, across the user space/kernel space boundary, and so on. This article in Linux Journal gives a detailed explanation of what's going on under the hood of the operating system and of the ways to get as close to zero-copy as possible.

However, in this blog we are going to discuss only a single instance of copying the data, namely copying user data into ØMQ messages.

Consider the following example. We'll create a message a million bytes long and copy the user data into it before sending:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 1000000);
memcpy (zmq_msg_data (&msg), buffer, 1000000);
zmq_send (s, &msg, 0);

The memcpy part looks suspicious. We have the data in the buffer already, so why not send the buffer itself instead of copying it to the message? Is ØMQ capable of such a thing?

Actually, yes. It is and it has always been. All we have to do is define a deallocation function for the buffer and pass it to ØMQ along with the buffer:

void my_free (void *data, void *hint)
{
    //  We've allocated the buffer using malloc and
    //  at this point we deallocate it using free.
    free (data);
}

Once the deallocation function is defined we can create a "zero-copy" message and pass it the buffer and deallocation function:

zmq_msg_t msg;
void *hint = NULL;
zmq_msg_init_data (&msg, buffer, 1000000, my_free, hint);
zmq_send (s, &msg, 0);

Note that the buffer is now owned by the message. It will be deallocated once the message is sent. We must not deallocate the buffer ourselves!

Also note the hint parameter. It can be used if a more complex allocation mechanism is in place. Say we allocated the chunk using some "allocator" object and we have to deallocate it via the same object. In such a case we can pass the pointer to the allocator as a hint to zmq_msg_init_data and modify the deallocation function as follows:

void my_free (void *data, void *hint)
{
    ((allocator_t*) hint)->free (data);
}
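A minimal sketch of how the hint can carry an allocator in plain C. The allocator_t layout, the counting_alloc and counting_dealloc functions and the live counter are hypothetical and exist only for illustration. Unlike the one-argument call above, this C version passes the allocator explicitly to its own function pointer. No ØMQ calls are made here; my_free merely has the signature that zmq_msg_init_data expects for its deallocation function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator object; not part of the ØMQ API. */
typedef struct allocator {
    void *(*alloc)(struct allocator *self, size_t size);
    void (*dealloc)(struct allocator *self, void *data);
    size_t live;   /* number of chunks currently allocated */
} allocator_t;

/* Allocate a chunk with malloc and count it as live. */
void *counting_alloc(allocator_t *self, size_t size)
{
    void *data = malloc(size);
    assert(data != NULL);
    self->live++;
    return data;
}

/* Release a chunk and decrement the live counter. */
void counting_dealloc(allocator_t *self, void *data)
{
    free(data);
    self->live--;
}

/* Deallocation callback: the hint is the allocator that owns the chunk. */
void my_free(void *data, void *hint)
{
    allocator_t *a = (allocator_t *) hint;
    a->dealloc(a, data);
}
```

A buffer obtained with a.alloc (&a, size) would then be handed over with the allocator's address as the last argument of zmq_msg_init_data, so that the callback returns the chunk to the same allocator once the message is no longer needed.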

We've got rid of the copying, right?

Well, not entirely. In some cases the above may work. In other cases it is insufficient.

Consider the case when we have two large matrices, each 100MB long, which we want to transfer. Unfortunately they are not contiguous in memory. Each was allocated using a separate malloc invocation and thus we cannot describe both using a single data pointer.

Why not send them as two separate messages then? Consider, say, a REQ socket. It load-balances messages. In other words, if there are two REP sockets connected to it, sending two messages would result in the first matrix being dispatched to one REP socket and the second to the other REP socket. This is not what we want. We want the two matrices to form an atomic unit of transfer. They should never be split apart.

It seems that in this case we need something equivalent to POSIX gather arrays. For those unfamiliar with the Berkeley socket API, a gather array is an array of data chunks that is passed to the networking stack in a single call.
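For illustration, this is roughly what a gather write looks like with the POSIX writev() call. The helper names send_gathered and gather_demo are made up for this sketch, and a pipe stands in for a network socket so that the example is self-contained.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write two non-contiguous buffers to fd with a single writev() call.
   Returns the number of bytes written, or -1 on error. */
ssize_t send_gathered(int fd, const void *a, size_t a_len,
                      const void *b, size_t b_len)
{
    struct iovec parts[2];
    assert(a != NULL && b != NULL);
    parts[0].iov_base = (void *) a;   /* iov_base is not const-qualified */
    parts[0].iov_len = a_len;
    parts[1].iov_base = (void *) b;
    parts[1].iov_len = b_len;
    return writev(fd, parts, 2);
}

/* Round-trip two separate buffers through a pipe; returns 1 on success. */
int gather_demo(void)
{
    int fds[2];
    char out[6] = {0};
    if (pipe(fds) != 0)
        return 0;
    ssize_t written = send_gathered(fds[1], "abc", 3, "xyz", 3);
    ssize_t got = read(fds[0], out, sizeof out);
    close(fds[0]);
    close(fds[1]);
    return written == 6 && got == 6 && memcmp(out, "abcxyz", 6) == 0;
}
```

Neither buffer is copied into a staging area by the caller, but both have to exist at the moment writev() is called. The reader of the pipe sees six contiguous bytes, with no indication of where the first chunk ended.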

But would that account for all possible scenarios?

There's still a scenario where it won't help, namely when the two matrices don't exist at the same time. The first one is created, sent and deallocated, then the second one. In such a case the gather array would be of no use. There's no single point in time when we own all the data and are thus able to fill in the gather array.

The new feature in ØMQ called "multi-part message" solves the problem. To put it simply, it allows you to concatenate multiple messages into a single message:

zmq_msg_t msg1;
zmq_msg_init_data (&msg1, matrix1, matrix1_size, my_free, NULL);
zmq_send (s, &msg1, ZMQ_SNDMORE);
zmq_msg_t msg2;
zmq_msg_init_data (&msg2, matrix2, matrix2_size, my_free, NULL);
zmq_send (s, &msg2, 0);

It looks almost exactly as if you were sending two separate messages, except that the ZMQ_SNDMORE flag is passed to the first send. The flag says: "Hold on! There are more data going to be added to this message!"

The important point to note is that although all parts of the message are treated as a single atomic unit of transfer, the boundaries between message parts are strictly preserved. In other words, if you send a message consisting of two message parts, each 100 bytes long, on the other side you'll never receive a single message part 200 bytes long. Or two message parts, 50 and 150 bytes long. Or even four message parts, each 50 bytes long. You'll get exactly what you've sent — two message parts, each 100 bytes long in the same order as they were sent.

This fact allows for using multi-part messages for adding coarse-grained structure to your message. The example with two matrices illustrates the point. You send the two matrices as two message parts and thus avoid the copy. However, at the same time the matrices are cleanly separated, each residing in its own message part and you are guaranteed that the separation will be preserved even on the receiving side. Consequently you don't have to put matrix size into the message or invent any kind of "matrix delimiters".

Another interesting use of multi-part messages is to combine them with PUB/SUB sockets. The publish/subscribe messaging pattern allows for subscribing to a particular subset of messages. A subscription is a chunk of data supplied by the receiver, saying "please, send me all the messages beginning with these data":

zmq_setsockopt (s, ZMQ_SUBSCRIBE, "ABC", 3);

Obviously, the sender has to place the appropriate data at the beginning of the message to have it delivered to the specific subscriber:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 6);
memcpy (zmq_msg_data (&msg), "ABCxyz", 6);

The part of the message that is checked against the subscriptions is called the topic. In our case the topic is "ABC".

When the topic is of variable length you need a delimiter to separate it from the rest of the message, so that the subscription mechanism doesn't accidentally consider the beginning of the data to be a continuation of the topic. The following example uses the "pipe" symbol as a delimiter:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 7);
memcpy (zmq_msg_data (&msg), "ABC|xyz", 7);
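The ambiguity that the delimiter removes can be seen in a small model of prefix matching. The matches function below is a hypothetical helper written for this sketch; it only assumes, as described above, that a subscription matches when it is a prefix of the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of subscription matching: returns 1 if the
   subscription is a prefix of the message, 0 otherwise. */
int matches(const void *sub, size_t sub_len, const void *msg, size_t msg_len)
{
    assert(sub != NULL && msg != NULL);
    return sub_len <= msg_len && memcmp(sub, msg, sub_len) == 0;
}
```

Without a delimiter, a subscriber interested in a different topic "ABCx" subscribes with "ABCx" and wrongly receives "ABCxyz", because the first byte of the data looks like part of the topic. With the delimiter, that subscriber subscribes with "ABCx|", which is not a prefix of "ABC|xyz", while the subscription "ABC|" still matches.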

While this works, it's a bit ugly. Even more importantly, if the topic happens to be binary data, there's no spare symbol we can use as the delimiter.

An elegant solution is to use a two-part message. Subscriptions are always evaluated only against the first message part, so we can place the topic into the first message part and the rest of the data into the second one (or even into several subsequent message parts):

zmq_msg_t topic;
zmq_msg_init_size (&topic, 3);
memcpy (zmq_msg_data (&topic), "ABC", 3);
zmq_send (s, &topic, ZMQ_SNDMORE);
zmq_msg_t value;
zmq_msg_init_size (&value, 3);
memcpy (zmq_msg_data (&value), "xyz", 3);
zmq_send (s, &value, 0);

One final remark. When receiving a message you may know that each message consists of two parts, say "topic" and "value". However, in other scenarios you may have no idea how many message parts there are in the message. In such a case ØMQ allows you to ask the socket whether there are more message parts to be received or not. This is done using the ZMQ_RCVMORE socket option:

zmq_msg_t msg;
zmq_msg_init (&msg);
zmq_recv (s, &msg, 0);
int64_t more;
size_t more_size = sizeof (more);
zmq_getsockopt (s, ZMQ_RCVMORE, &more, &more_size);
if (more) ...

