This morning, I ran into an interesting issue related to small replica sets in MongoDB.
The recommended configuration for a MongoDB replica set is three instances: one primary and two secondaries. This setup improves durability, and it also matters for electing a primary. Why is that important? I’m glad you asked. I ran into this issue today, and it took a while to figure out.
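For reference, a three-member set like the recommended one can be initiated from the mongo shell roughly like this (a sketch — the set name matches mine, but the hostnames are made up for illustration):

```javascript
// Run in the mongo shell of one member, with three mongod
// instances already started with --replSet rep1.
rs.initiate({
  _id: "rep1",  // replica set name, must match the --replSet value
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "db3.example.com:27017" }
  ]
})
```

With three voting members, any two of them form a majority, so losing one member never costs you the primary.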
My setup: two mongod instances in my replica set (rep1-1 and rep1-2). In my case, rep1-1 was the primary and rep1-2 was the secondary. Due to an inordinate amount of internal fragmentation, I wanted to rebuild the data files. Normally, to do this, I simply bring down rep1-2 (the secondary), delete its data files, and bring it back up. It then magically syncs itself with the primary, and in the process it rebuilds its data files without fragmentation. This works great; I’ve done it several times before.
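The rebuild procedure looks roughly like this (a sketch — the paths, port, and log file are assumptions for illustration, not my actual setup):

```shell
# 1. Cleanly stop the secondary (rep1-2).
mongod --dbpath /data/rep1-2 --shutdown

# 2. Remove its data files so it will perform a full initial sync.
rm -rf /data/rep1-2/*

# 3. Bring it back up; it re-syncs from the primary, writing
#    fresh, unfragmented data files as it goes.
mongod --replSet rep1 --port 27018 --dbpath /data/rep1-2 \
       --fork --logpath /var/log/mongodb/rep1-2.log
```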
But this time, I needed to do it live. I didn’t want my replica set to go down while I rebuilt the secondary. In theory, I would be able to take down the secondary, the primary would still happily plug along, and the secondary would just catch up when it came back online. But to my surprise, this isn’t what happened.
When I brought the secondary down, the primary stepped down and became a secondary. I now had no primary! Yikes! Thankfully, I noticed right away and brought the secondary back online. A new election occurred and rep1-1 was primary again.
What happened?! It turns out that a mongod instance that cannot see a majority of the replica set’s voting members cannot become (or remain) primary. This makes sense when you think about it. If a network link went down and separated your two replicas, you wouldn’t want them both to elect themselves as primary. So in my case, when rep1-1 noticed that it was isolated from the rest of the replica set, it stepped down to secondary and stopped accepting writes.
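You can watch this happen from the mongo shell. A sketch of the check I could have run on rep1-1 while rep1-2 was down:

```javascript
// In the mongo shell connected to rep1-1: print each member's state.
// With rep1-2 down and no majority, rep1-1 reports SECONDARY
// and rep1-2 shows as unreachable.
rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr);
});
```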
Enter arbiters.
A MongoDB arbiter is a separate mongod instance (started just like any other) except it holds no data. It exists solely to participate in elections and break ties. This is a great solution to the problem of ties, especially when there is a small number of replicas in your set, as in my case.
So I started up an arbiter and added it to the replica set. Turns out, adding an arbiter is ridiculously easy. Now I had two mongod instances and one arbiter.
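For the curious, the whole thing boils down to two steps — start a mongod for the arbiter, then register it with `rs.addArb()` from the primary. A sketch, where the port, paths, and hostname are illustrative assumptions:

```shell
# Start the arbiter: a normal mongod with its own (tiny) dbpath.
mkdir -p /data/arb
mongod --replSet rep1 --port 27019 --dbpath /data/arb \
       --fork --logpath /var/log/mongodb/arb.log

# Then, from the primary's mongo shell, add it to the set:
#   rs.addArb("db1.example.com:27019")
```

Since the arbiter stores no data, it’s cheap to run — it can live on a small machine whose only job is to make the vote count odd.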
When I brought down the secondary, a new election occurred between the arbiter and my primary. The primary stayed the primary and I was able to continue rebuilding the secondary without interruption of service.
Thanks to those who chimed in on the MongoDB mailing list to help me solve this. In case you’re interested, check out the thread.