Fast Zero-Copy Messaging with ROS 2 Intra-Process Communication

ROS 2 uses so-called “middleware” to perform communication between nodes. Over the years different middleware vendors have vied to become the Chosen Implementation, importantly by focusing on message passing efficiency. However:

You can be even more efficient by just skipping the middleware completely and achieve true zero-copy messaging!

And all you have to do is apply a simple convention:

All publishers SHOULD allocate messages using unique pointers.
All publishers SHOULD publish the messages using std::move.
All subscribers SHOULD expect either a ConstSharedPtr or a const reference.
Enable use_intra_process_comms

Read on to understand why.

What is Middleware?

ROS 2 was designed to abstract away the low-level communication framework into a concept referred to as “middleware”. This way the ecosystem is not stuck to a single communication framework that is hard to optimise for all users, use cases, hardware and infrastructure. There is now an array of different middleware implementations that one can choose from.

Originally, the main choices were based on DDS, the ‘Data Distribution Service’. DDS has historically been, and still is, used in a wide range of different industries, and targeting it indicated an important step towards making ROS 2 an industry ready platform. However, DDS is designed as a machine-to-machine communication standard, whereas in ROS 2 most communication happens firstly on a single machine on the robot, between nodes.

Other, non-DDS based middlewares have also been developed. The most popular alternative currently is Zenoh. These resolve some of the communication issues that the ROS 2 user base struggled with, and may be relatively lightweight, however the middleware still induces some additional overhead. So is the middleware really always needed?

What if you could just bypass the middleware altogether?

It turns out that you can, using Intra-Process Communication.

What is Intra-Process Communication?

It is common for each node in a ROS 2 solution to run in a separate process: often each node gets compiled to its own executable, and each one gets run as a separate process. When nodes communicate in this scenario, they perform inter-process communication, i.e. communication between multiple running processes, which is handled by the middleware

However, you can also compile multiple nodes into a single executable (static composition) and run them in the same process, possibly in parallel. Or, alternatively, you can use component containers to load multiple nodes into the same process dynamically (dynamic composition). When those nodes talk to each other they perform intra-process communication, i.e. communication within a single process.

By default such intra-process communication still goes through the middleware. That middleware may use efficient methods to pass messages in the same process, such as using shared memory. But there is always some overhead, often involving at least the copying of the messages from ROS to the middleware and back.

That is why ROS 2 has a special built-in intra-process communication mode, in which messages do not unnecessarily cross the ROS (rclcpp) <=> middleware boundary.

It does not come enabled out of the box, though.

And you have to adopt some (simple) standard practices to actually make it beneficial.

Let’s see how one does that.

Enabling Intra-Process Communication

Preparing your Node

The constructor of the rclcpp::Node class takes an options argument of type NodeOptions:

explicit Node(
  const std::string &node_name,
  const NodeOptions &options = NodeOptions()
)

This class has an option called use_intra_process_comms that defaults to false. All we have to do is set that to true to enable intra-process communication.

In the official ROS 2 tutorial on intra-process communication they do this in the member initialization list of the custom node:

namespace foo
{
  struct MyNode : public rclcpp::Node
  {
    MyNode(const std::string & name)
    : Node(name, rclcpp::NodeOptions().use_intra_process_comms(true))
    {
      ...
    }
  };
}

A nicer pattern however is to give your node’s constructor an options argument itself and pass that on:

namespace foo
{
  struct MyNode : public rclcpp::Node
  {
    MyNode(const std::string & name, const rclcpp::NodeOptions & options)
    : Node(name, options)
    {
      ...
    }
  };
}

This gives you more freedom to select whether to actually use this mode, and is actually a requirement for using your node in dynamic composition.

Enabling: Static vs Dynamic Composition

If you run your node with static composition, you set the option in your main function:

int main() {
  rclcpp::init(argc, argv);
  rclcpp::executors::SingleThreadedExecutor executor;
  auto options = rclcpp::NodeOptions().use_intra_process_comms(true);

  auto node1 = std::make_shared<foo::MyNode>("node1", options);
  auto node2 = std::make_shared<foo::MyNode>("node2", options);

  executor.add_node(node1);
  executor.add_node(node2);
  executor.spin();

  rclcpp::shutdown();

  return 0;
}

If instead you use dynamic composition with a component container, you set the option when loading the node as such:

ros2 component load /ComponentManager mypackage foo::MyNode \
     -e use_intra_process_comms:=true

And, hey presto, message passing now happens wholly inside the node, without interference of the middleware!

But, we may still not actually have zero-copy… For that we have to look deeper into what actually happens with a message when the publisher hands it off and the subscriber receives it.

Achieving Zero-Copy

What Doesn’t Work

In the context of ROS 2 communication, “zero copy” means that a publisher allocates memory for, fills, and publishes a message, after which a subscriber receives exactly that message without any copying of data happening.

However, if you follow the ROS 2 simple publisher and subscriber tutorial, this is not what happens. And it should not happen if you want a stable system. Why is that?

In that tutorial, the publisher creates and publishes a message like this:

auto message = std_msgs::msg::String();
message.data = "Hello, world! " + std::to_string(this->count_++);
this->publisher_->publish(message);

This will resolve to a version of rclcpp::Publisher::publish that takes a reference to the message.

The subscription on the other hand looks like this:

auto topic_callback =
  [this](std_msgs::msg::String::UniquePtr msg) -> void {
    RCLCPP_INFO(this->get_logger(), "I heard: '%s'", msg->data.c_str());
  };
subscription_ =
  this->create_subscription<std_msgs::msg::String>("topic", 10, topic_callback);

So a unique pointer to the message is expected.

How could ROS 2 make zero-copy message passing happen in this case?

In theory the unique pointer can be constructed with std::unique_ptr<std_msgs::msg::String>{&message}. However, sticking a pointer to a statically allocated object into a unique pointer meant to manage dynamically allocated memory is a big no-no. When the original message goes out of scope, the unique pointer is now dangling and the subscriber will be subject to undefined behaviour. Or, very likely, segmentation faults.

Taking an argument as a unique pointer means that a function takes sole ownership of the object. The only way to achieve that is for ROS/rclcpp to create a new copy of the object and move ownership of that copy into the function. Very simplified:

void publish(const Message & msg) {
    auto msg_copy = std::make_unique<Message>(msg);  // This copies
    subscription_callback(std::move(msg_copy));
}

Would it help if the subscriber uses a different type for its callback argument? In this case it would not. The problem is with the publisher: there is no safe way to pass the message without making a copy. You may say that if the subscriber would use a const reference as well then it is in theory possible to pass it on without making a copy. However, if the subscriber wants to store it without making a copy, you get similar problems with possibly dangling references or pointers.

So we must get the publisher to give up ownership…

What Does Work

Modern C++ has established ownership semantics, in the form of unique and shared smart pointers and the std::move utility¹. The canonical way of using this to achieve zero-copy transfer is to create a message as a unique pointer and “move” it into the publisher:

auto message = std::make_unique<std_msgs::msg::String>();
message->data = "Hello, world! " + std::to_string(this->count_++);
this->publisher_->publish(std::move(message));

Now, we do not have to worry anymore about the publishing node altering and/or deallocating the message out from under us; it has given up ownership and cannot access it anymore. This is the core semantic operation that enables the message to be passed on to the subscriber without making any copies.

How can we verify this actually happens? The common way is to keep track of the memory address of the message: if no copy is made then the subscriber should see the message at exactly the same memory address as where the publisher allocated it. I created a small repository that does this to test for zero-copy behaviour with a matrix of different types used by the publisher and the subscriber. It creates one publisher (talker) and two subscribers (listeners), and tests for each listener whether they get the message at the same memory address as where the publisher allocated it.

This results in the following overview:

Intra-Process	Talker Mode	Listener Mode	Listener 1	Listener 2
✅ on	UniquePtr	SharedPtr	❌	✅
✅ on	UniquePtr	ConstSharedPtr	✅	✅
✅ on	UniquePtr	UniquePtr	❌	✅
✅ on	UniquePtr	Const Reference	✅	✅
✅ on	UniquePtr	value	❌	❌
✅ on	Const Reference	*	❌	❌
❌ off	*	*	❌	❌

This tells us that:

when intra-process communication is not enabled, there is no zero-copy happening for any combination (obviously);
also, as discussed above, when the message is published by reference, zero-copy never happens regardless of subscription method;
zero-copy message passing only happens when the talker publishes unique pointers;
zero-copy message passing happens for both listeners when they subscribe by ConstSharedPtr or const &, but only for one of them when they use non-const SharedPtr or a UniquePtr.

The last point is interesting: why does only one of the listeners receive the original message? And can we predict what happens for more than 2 listeners?

It is all about ownership again, and preventing unexpected things to happen to a message while a subscriber is dealing with the message.

In the case of subscribing with a non-const shared pointer, the ownership of the message would be shared between the listeners, but each would be able to modify the message that is managed by the shared pointer. This could result in surprising behaviour and likely race conditions, where listener 1 could change any of the fields of the message while listener 2 is still processing it. That is why ROS 2/rclcpp passes the original message on to only one of them, and makes a copy for the other, so they can both do what they want with “their” copy without affecting the other.

On the other hand, subscribing with a ConstSharedPtr does work because the const-ness prevents this interaction between multiple subscribers, by preventing any attempt of modifications at compile time².

The case of subscribing with a unique pointer is a direct result of the ownership semantics of such a pointer: if exclusive ownership is given to one subscriber, there is no way to share it with another subscriber without making another copy of the message.

Due to these dynamics, we can extrapolate that a new copy will be made for each new subscriber for both cases, with only ever 1 receiving the message in a zero-copy manner.

Conclusion: Apply One Little Convention

Finally, with all this knowledge, we can derive the simple convention to apply to all our ROS 2 nodes that we staerted with:

All publishers SHOULD allocate messages using unique pointers.
All publishers SHOULD publish the messages using std::move.
All subscribers SHOULD expect either a ConstSharedPtr or a const reference.

This sets us up for fully enabling efficient zero-copy message passing.

Now, if you decide to compose nodes, either statically or dynamically, and enable intra-process communication, you automatically reap the full benefits.

And if you decide not to compose (yet), this convention still induces nice ownership semantics in your stack without hurting.

Discussion and Caveats

ConstSharedPtr or const reference?

My primary go-to rule for deciding whether a function (in this case a subscription callback) should take a (smart) pointer or a reference is nullability: is nullptr ever a valid value for your function? If so, use a pointer; if not: make the invalid impossible by using a reference.

In this case, ROS 2 will never give you a nullptr, so a second rule comes into play: does your callback take ownership of the message? In other words, will it store it for use outside of the lifetime of the callback? In that case you should also use a shared pointer, so you don’t incur a copy after all and (partly) undo the effort of setting up zero-copying.

SHOULD vs SHALL

In the statement of the convention above, I used “SHOULD” instead of “SHALL”, indicating that is a recommendation rather than a mandatory requirement to always adhere to. There may still be cases where some trade-off makes it more attractive to not allocate every message as a unique pointer. Especially if you know that you will never use composition and/or intra-process communication for a certain node/topic.

However, quite often you may find that such a trade-off is weaker than expected, which is why I still regard this as a strong recommendation. For instance, you may think that allocating a new message at every tick is too expensive. So you’d rather allocate it once and only update a subset of its fields before publishing it. Doing so you are very likely to create more copying overhead in the publisher, even when not using intra-process communication, which may outweigh the overhead you try to optimise away.

Still, each situation is unique. Profile your system, find the actual hot paths and bottlenecks, and disregard any convention if needed to optimise your system.

Should I Just Compose my Whole Stack?

If I managed to sell intra-process communication properly, you may now wonder why not just compose all your nodes into one big process, enable intra-process communication, and call it a day?

You could, but generally there are still strong reasons to isolate different parts of the stack. Most importantly, if one node fails and crashes, it brings down the complete stack. You don’t want a bug in the node that controls your robotaxi’s indicators to kill the do-not-run-over-people node…

So always consider your system’s failure modes, hazards and risks, concepts such as “Fail Safe” and “Fail Operational” (as you should in general when building a physical machine), and compose nodes that form a natural unit and where the efficiency benefits of fast zero-copy messaging are the most beneficial, or even crucial. Modern middleware implementations can already be very efficient, intra-process communication just can be even better in the places where it matters.

Should I Always Enable Intra-Process Communication?

Even when not composing everything as discussed above, should you still just adopt the convention to set use_intra_process_comms to true?

In general this probably does not hurt, and it may well save your future self an afternoon scratching their head, wondering why their composed nodes are still laggy, only to find out you forgot to set use_intra_process_comms (been there..). However, be aware that enabling this functionality comes with its own overhead. The intra-process machinery is run on top of the default middleware interfaces that ROS 2 still has to exercise to cater to possible subscribers that are not composed with the publisher. So if your node is not composed at all, enabling intra-process communication just adds an unnecessary cost.

As discussed earlier, it is better to always keep it configurable at runtime, and pick and choose based on your full architecture, and proper profiling.

Just don’t forget to set it when you want it…

Ownership and applying std::move to transfer it is not limited to smart pointers, but for this discussion they are the main targets. ↩︎
Note that in ROS 2, MyMessage::ConstSharedPtr is equivalent to std::shared_ptr<const MyMessage>, not const shared_ptr<MyMessage>. ↩︎

What is Middleware?#

What is Intra-Process Communication?#

Enabling Intra-Process Communication#

Preparing your Node#

Enabling: Static vs Dynamic Composition#

Achieving Zero-Copy#

What Doesn’t Work#

What Does Work#

Conclusion: Apply One Little Convention#

Discussion and Caveats#

ConstSharedPtr or const reference?#

SHOULD vs SHALL#

Should I Just Compose my Whole Stack?#

Should I Always Enable Intra-Process Communication?#