The other day, I wanted to find out how in sync my secondary replicas were. From what I’ve read, replicas could be seconds behind the master, but they can also fall further behind. This had been a vague notion in my mind. How far behind could they be in a normal situation? Minutes? Hours?! I wanted to know for my own setup so I could estimate how much data I would lose if my primary died.
Turns out, the master’s local db has a ‘slaves’ collection that contains this information. On the master, you can run this:
> use local
> db.slaves.find()
Which returns something like this:
{ "_id" : ObjectId("4cc9bd23c30b25792eb104bf"),
"host" : "10.4.1.3",
"ns" : "local.oplog.rs",
"syncedTo" : { "t" : 1289043070000, "i" : 456 } },
{ "_id" : ObjectId("4cd41b0e00ccdeff9cae7389"),
"host" : "10.4.1.4",
"ns" : "local.oplog.rs",
"syncedTo" : { "t" : 1289043070000, "i" : 11575 } }
Let me break this down for you:
- host - the secondary replica’s hostname
- ns (or ‘namespace’) - that replica’s oplog collection name
- syncedTo - the point in time when this replica last updated
  - t - a 64-bit integer timestamp of the last update, in milliseconds since the Unix epoch (1289043070000 in the output above falls on November 6, 2010)
  - i - a counter for the op number at this timestamp
If you look at your oplog, you’ll notice each op has a similar timestamp. Using the syncedTo value, you can see exactly which operations have and have not been applied to your secondary replica.
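To make that comparison concrete, here is a minimal Ruby sketch that works on plain hashes shaped like the shell output above, so it needs no database connection. The helper names (`to_time`, `applied?`) and the sample oplog timestamps are made up for illustration, and it assumes `t` is in milliseconds, as the example output suggests.

```ruby
# Hypothetical helpers; they operate on plain hashes shaped like the shell
# output above, not on live driver objects.

# Convert the "t" field (assumed to be milliseconds since the epoch) to a Time.
def to_time(ts)
  Time.at(ts['t'] / 1000).utc
end

# An op counts as applied if its timestamp is at or before syncedTo.
# The time is compared first, then the counter "i" breaks ties.
def applied?(op_ts, synced_to)
  ([op_ts['t'], op_ts['i']] <=> [synced_to['t'], synced_to['i']]) <= 0
end

synced_to = { 't' => 1289043070000, 'i' => 456 }

# Made-up oplog timestamps, for illustration only.
oplog = [
  { 't' => 1289043069000, 'i' => 12 },
  { 't' => 1289043070000, 'i' => 456 },
  { 't' => 1289043070000, 'i' => 457 },
  { 't' => 1289043071000, 'i' => 1 }
]

# Everything after syncedTo has not reached the replica yet.
pending = oplog.reject { |ts| applied?(ts, synced_to) }

puts "replica synced to #{to_time(synced_to)}"
puts "#{pending.size} ops not yet applied"
```

Comparing the pair (t, i) rather than t alone matters because several ops can share the same timestamp, and only the counter tells them apart.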
Now, for the fun part. I wrote a simple Ruby script that checks the status of all my replicas against the master. It outputs something like this:
$ ./replica-oplog-status.rb
10.4.1.3 is 0 seconds behind master
and 21 ops
10.4.1.4 is 0 seconds behind master
and 44 ops
I was very surprised to discover that my replicas typically were in sync in the sub-second range. In fact, I had to run this script several times in a row before I saw them out of sync at all. Very impressive, and not what I was expecting to see.
If you’re interested, here’s the Ruby script. This is to be run on the primary’s server:
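#!/usr/bin/ruby
require 'rubygems'
require 'mongo'

# Connect to the local mongod (adjust the port to match your setup)
mongo = Mongo::Connection.new('localhost', 27018)
db = mongo.db('local')

# One document per secondary, including its syncedTo timestamp
slaves = db.collection('slaves').find().to_a
# The most recent op in the primary's oplog
last_op = db.collection('oplog.rs').find.sort([['$natural',-1]]).limit(1).to_a[0]

slaves.each do |slave|
  # As read here, index 0 of a timestamp is the op counter and index 1 is the time
  opdiff = last_op['ts'][0] - slave['syncedTo'][0]
  diff = (last_op['ts'][1] - slave['syncedTo'][1])/1000.0
  puts "#{slave['host']} is #{diff} seconds behind master"
  # The op counter is only comparable when both timestamps share the same time
  if diff == 0
    puts " and #{opdiff} ops\n"
  end
end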
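If you want to try the arithmetic without a running replica set, it can be pulled out into a small function. This is only a sketch under assumptions: `replica_lag` is a hypothetical helper, the timestamps are plain hashes with `t` in milliseconds as in the shell output earlier, and the primary's last-op value and the second replica's lag are invented for illustration.

```ruby
# Hypothetical helper: how far is a replica behind the primary?
# Both arguments are hashes like { 't' => milliseconds, 'i' => counter }.
def replica_lag(last_op_ts, synced_to)
  seconds = (last_op_ts['t'] - synced_to['t']) / 1000.0
  # The counter is only comparable when both timestamps share the same time.
  ops = seconds == 0 ? last_op_ts['i'] - synced_to['i'] : nil
  { seconds: seconds, ops: ops }
end

# Invented value standing in for the primary's newest oplog entry.
last_op = { 't' => 1289043070000, 'i' => 477 }

# Invented replica states: one caught up in time, one two seconds behind.
replicas = [
  { 'host' => '10.4.1.3', 'syncedTo' => { 't' => 1289043070000, 'i' => 456 } },
  { 'host' => '10.4.1.4', 'syncedTo' => { 't' => 1289043068000, 'i' => 11575 } }
]

replicas.each do |replica|
  lag = replica_lag(last_op, replica['syncedTo'])
  puts "#{replica['host']} is #{lag[:seconds]} seconds behind master"
  puts " and #{lag[:ops]} ops" unless lag[:ops].nil?
end
```

Returning `nil` for the op count when the times differ mirrors the `if diff == 0` check in the script: the counter restarts for each timestamp, so subtracting counters across different times would not mean anything.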