Scaling / Fleets / Many Nodes

The pain

ROS 2 creates many participants and many internal topics per node, so a fleet or a large single robot multiplies DDS’s discovery and matching cost (16 reports). Failures show up as:

A Discovery Server becoming unresponsive past a few hundred participants.
Memory blowing up when many readers/writers match, or when ROS 2 distros are mixed.
Deadlocks when many readers and writers match under reliable TCP.
Open questions in the community about how many participants an RMW even allows.

Most recent example

autoware#6759 — “Fix [rmw_cyclonedds_cpp]: rmw_create_node: failed to create domain, error” (2026-01-24). A full self-driving stack hits domain/participant creation failures — scaling limits surfacing in one of the largest real ROS 2 deployments.

Reference list (most recent first)

Date	Source	Problem
2026-01-24	autoware#6759	Participant/domain creation fails in a large stack
2025-09-09	ROS Discourse	How many participants does an RMW even allow?
2025-04-17	Fast-DDS#5767	Discovery Server unresponsive with many participants
2025-01-15	rmw_fastrtps#797	Cross-distro sub/pub exhausts all memory
2024-12-04	Fast-DDS#5235	Discovery Server deadlock with many matching endpoints

How ZeroDDS solves it

No central server to overload, bounded peer state, and measured all-to-all discovery.

No Discovery Server bottleneck. Discovery is multicast-free, peer-to-peer unicast, so there is no single server process to become unresponsive or deadlock at scale (Fast-DDS#5767, Fast-DDS#5235).
Bounded, explicit peer state. ZERODDS_MAX_PEER_PARTICIPANTS caps how many participants are expanded per peer, so discovery state is bounded and predictable rather than open-ended.
Measured all-to-all discovery. The scaling harness (ZERODDS_SCALE_N) brings up all-to-all, multicast-free meshes: 50 participants in ~2.9 s, 100 in ~19.9 s. These are honest current numbers from a single host: doubling the participant count costs roughly 7x the bring-up time, so the curve is steep, but it is measured, and the mechanism (unicast, no server) has no central choke point.
Memory-safe matching. The cross-distro “exhausts all memory” class (rmw_fastrtps#797) comes from unbounded growth on malformed or mismatched discovery data; ZeroDDS parses with explicit bounds and DoS caps.
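
To make "explicit bounds and DoS caps" concrete, here is a minimal sketch of bounded parsing for a count-prefixed list. It is an illustration of the general technique only, not ZeroDDS's parser: the wire layout, `ENTRY_SIZE`, `MAX_ENTRIES`, `BoundsError` and `parse_entries` are all hypothetical names and values chosen for the example.

```python
import struct

ENTRY_SIZE = 16   # hypothetical fixed size of one entry, in bytes
MAX_ENTRIES = 64  # hypothetical cap for this sketch; not a ZeroDDS default


class BoundsError(ValueError):
    """Raised when a message declares more data than the caps allow."""


def parse_entries(buf: bytes, max_entries: int = MAX_ENTRIES) -> list[bytes]:
    """Parse a count-prefixed list of fixed-size entries with explicit bounds.

    Hypothetical layout: little-endian u32 count, then count * ENTRY_SIZE bytes.
    """
    if len(buf) < 4:
        raise BoundsError("truncated header")
    (count,) = struct.unpack_from("<I", buf, 0)
    # Check the declared count against the cap before allocating anything.
    if count > max_entries:
        raise BoundsError(f"declared {count} entries, cap is {max_entries}")
    # Check the declared count against the bytes actually present.
    needed = 4 + count * ENTRY_SIZE
    if needed > len(buf):
        raise BoundsError("declared length exceeds message size")
    return [buf[4 + i * ENTRY_SIZE: 4 + (i + 1) * ENTRY_SIZE] for i in range(count)]


if __name__ == "__main__":
    well_formed = struct.pack("<I", 2) + bytes(2 * ENTRY_SIZE)
    print(len(parse_entries(well_formed)))
    try:
        parse_entries(struct.pack("<I", 0xFFFFFFFF))  # hostile count, no payload
    except BoundsError as exc:
        print("rejected:", exc)
```

The point of the design is ordering: the declared count is validated against both the cap and the real buffer length before any per-entry allocation, so a mismatched or hostile message costs a constant-time rejection instead of memory growth.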

Why it no longer has to be a pain

Scaling pain concentrates at the Discovery Server and at unbounded discovery state. ZeroDDS removes the central server (peer-to-peer unicast) and bounds peer expansion explicitly, so each added robot adds unicast cost that is linear in the number of peers and local to each participant, instead of pushing a shared choke point toward a cliff.
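
A small counting sketch may help separate the per-robot cost from the total mesh size. It assumes a full all-to-all mesh and counts participant pairs only (endpoints per participant multiply these figures); the function names are illustrative, and this is not a cost model of ZeroDDS.

```python
def peers_per_participant(n: int) -> int:
    """Peers each participant tracks in a full mesh of n participants."""
    return max(n - 1, 0)


def directed_pairs(n: int) -> int:
    """Total directed (sender, receiver) participant pairs in a full mesh."""
    return n * max(n - 1, 0)


def added_pairs(n: int) -> int:
    """Directed pairs added when one participant joins a mesh of n."""
    return directed_pairs(n + 1) - directed_pairs(n)  # algebraically 2 * n


if __name__ == "__main__":
    for n in (50, 100):
        print(n, peers_per_participant(n), directed_pairs(n), added_pairs(n))
```

Joining a mesh of n participants adds 2n directed pairs, which is linear in the fleet size, while the total across the mesh grows as n(n-1). That is why a cap on per-peer expansion matters as fleets grow.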

Honest status: large-fleet (hundreds of real nodes) numbers are still being gathered. The single-host all-to-all curve above is verified; we want community runs on real fleets — see Validate it yourself.

Reproduce it yourself

# All-to-all, multicast-free, N participants:
ZERODDS_SCALE_N=50 <scaling harness>     # ~2.9 s
ZERODDS_SCALE_N=100 <scaling harness>    # ~19.9 s
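
To compare your own runs against the two timings above, one option is to fit a power law t ~ N^k through two measurements. The helper below is a hypothetical convenience, not part of the harness, and a two-point fit is only a rough indicator of the curve's shape.

```python
import math


def scaling_exponent(n1: float, t1: float, n2: float, t2: float) -> float:
    """Exponent k in t ~ n**k fitted through two (n, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


if __name__ == "__main__":
    # Single-host figures quoted in this section; replace with your own timings.
    k = scaling_exponent(50, 2.9, 100, 19.9)
    print(f"k = {k:.2f}")
```

With the figures quoted here the exponent comes out near 2.8; substituting timings from a real fleet shows whether your curve is flatter or steeper.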

