This builds on our multi-node broadcast, but now we're handling network partitions. Instead of getting fancy with message queues and retry logic, I tried a simple solution, and it just works.
The idea: just periodically tell everyone everything you know.
Each node syncs its entire message set every 500ms
No tracking failed messages
No complex retry logic
Just keep sharing until everyone has everything
Every 500ms, each node:
Grabs its current message set
Sends it to every other node
And, as before, we're using a map for O(1) duplicate detection.
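Here's a rough sketch of that loop, not the actual solution code. It runs in memory, so the `Node` class, its method names, and the direct method call standing in for a network send are all assumptions made for illustration. A real node would fire `gossip_round` from a timer every 500ms.

```python
GOSSIP_INTERVAL_SECONDS = 0.5  # the 500ms sync period from the text


class Node:
    """Hypothetical in-memory node; a real one would talk over the network."""

    def __init__(self, node_id):
        self.node_id = node_id
        # Dict used as a set: membership checks are O(1) on average.
        self.seen = {}

    def broadcast(self, message):
        """Record a message that arrived from a client."""
        self.seen[message] = True

    def receive_gossip(self, messages):
        """Merge a peer's full message set, skipping what we already have."""
        for message in messages:
            if message not in self.seen:
                self.seen[message] = True

    def gossip_round(self, peers):
        """One tick of the loop: send everything we know to every other node."""
        snapshot = list(self.seen)  # grab the current message set
        for peer in peers:
            if peer is not self:
                peer.receive_gossip(snapshot)


# Three nodes, two messages that start on different nodes.
a, b, c = Node("n1"), Node("n2"), Node("n3")
a.broadcast(1)
b.broadcast(2)

nodes = [a, b, c]
for node in nodes:
    node.gossip_round(nodes)
```

With no partition in the way, a single pass over the three nodes is enough here for all of them to hold both messages.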
Simple Gossip Protocol
Self-Healing: Nodes automatically catch up after partitions heal
Eventually Consistent: Messages reach everyone... eventually
Zero Tracking: No need to remember what failed or succeeded
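To see the self-healing part in action, here's a small simulation, again only a sketch. The node names, the `can_reach` callback, and the partition layout are made up for the example. Nothing in it tracks failed sends. The cut-off node simply catches up on the first round after the partition heals.

```python
def gossip_round(state, can_reach):
    """Every node sends its full message set to each peer it can reach."""
    # Snapshot first so a round only shares what nodes knew when it started.
    snapshots = {node: set(msgs) for node, msgs in state.items()}
    for sender, snapshot in snapshots.items():
        for receiver in state:
            if receiver != sender and can_reach(sender, receiver):
                state[receiver] |= snapshot  # set union drops duplicates


def partitioned(sender, receiver):
    """Hypothetical partition: n3 is cut off from everyone else."""
    return "n3" not in (sender, receiver)


def healed(sender, receiver):
    """No partition: every node can reach every other node."""
    return True


state = {"n1": {1}, "n2": {2}, "n3": {3}}

# While partitioned, n1 and n2 converge with each other; n3 is stuck.
for _ in range(3):
    gossip_round(state, partitioned)
during_partition = {node: set(msgs) for node, msgs in state.items()}

# Once the partition heals, the next round brings everyone up to date.
gossip_round(state, healed)
```

During the partition, `n1` and `n2` both end up with `{1, 2}` while `n3` still only has `{3}`. After the healed round, all three hold `{1, 2, 3}`.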
Pros:
Simple
Self-healing by design
No complex state tracking
Easy to reason about
Cons:
More network traffic than necessary
Repeated sending of old messages
Not super efficient in space or time, since we're sending everything we know over the network
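To put a rough number on that cost, here's a back-of-the-envelope sketch. The function name and the example figures (5 nodes, 100 messages) are hypothetical, not measurements. It assumes the worst case, where every node already knows every message and still resends all of them.

```python
def values_sent_per_round(num_nodes, num_messages):
    """Worst case: every node sends its full set to every other node."""
    payloads = num_nodes * (num_nodes - 1)  # one payload per ordered pair
    return payloads * num_messages


ROUNDS_PER_SECOND = 2  # one round every 500ms

# Hypothetical cluster: 5 nodes that each already know 100 messages.
per_round = values_sent_per_round(5, 100)
per_second = per_round * ROUNDS_PER_SECOND
```

Under those assumptions, that's 20 payloads of 100 values each, so 2000 values per round and 4000 per second, even when nothing new has arrived. The cost grows with both the square of the node count and the total number of messages ever seen.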
Next up: 3d/e will make us actually think about efficiency... but for now, this is fine.