The other day, I wanted to find out how in-sync my secondary replicas were.  From what I’ve read, replicas are usually seconds behind the master, but it’s possible for them to be further behind.  This has been kind of a vague notion in my mind.  How far behind could they be in a normal situation?  Minutes?  Hours?!  I really wanted to measure this on my own setup so I’d know how much data I could lose if my primary died.

Turns out, the master’s local db has a ‘slaves’ collection that contains this information.  On the master, you can run this:

> use local
> db.slaves.find()

Which returns something like this:

{    "_id" : ObjectId("4cc9bd23c30b25792eb104bf"),
    "host" : "10.4.1.3",
      "ns" : "local.oplog.rs",
"syncedTo" : { "t" : 1289043070000, "i" : 456 } }

{    "_id" : ObjectId("4cd41b0e00ccdeff9cae7389"),
    "host" : "10.4.1.4",
      "ns" : "local.oplog.rs",
"syncedTo" : { "t" : 1289043070000, "i" : 11575 } }

Let me break this down for you:

  • host - the secondary replica hostname

  • ns (or ‘namespace’) - that replica’s oplog collection name

  • syncedTo - the point in time when this replica last updated

    • t - the time of the last update, shown as an integer timestamp in milliseconds
    • i - a counter for the op number at this timestamp
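
To make a `t` value easier to read, here is a minimal Ruby sketch that turns it into a regular time. It assumes `t` is milliseconds since the Unix epoch, as in the shell output above; `synced_to_time` is a hypothetical helper name, not part of any driver.

```ruby
# Convert a syncedTo "t" value (as printed by the shell) into a Time.
# Assumption: "t" is milliseconds since the Unix epoch.
def synced_to_time(synced_to)
  Time.at(synced_to['t'] / 1000).utc
end

# Sample value copied from the db.slaves.find() output above.
synced_to = { 't' => 1289043070000, 'i' => 456 }
puts synced_to_time(synced_to)
```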

If you look at your oplog, you’ll notice each op has a similar timestamp.  Using the syncedTo value, you can see exactly which operations have and have not been applied to your secondary replica.
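
As a rough sketch of that comparison, the Ruby below splits a list of ops into applied and pending by comparing `(t, i)` pairs against `syncedTo`. The ops here are made-up sample hashes, not real oplog entries, and I'm assuming an op whose timestamp equals `syncedTo` has already been applied. `op_key` and `applied?` are hypothetical helper names.

```ruby
# Order timestamps by time first, then by the op counter within that time.
def op_key(ts)
  [ts['t'], ts['i']]
end

# Assumption: an op at or before syncedTo has been applied to the replica.
def applied?(op_ts, synced_to)
  (op_key(op_ts) <=> op_key(synced_to)) <= 0
end

synced_to = { 't' => 1289043070000, 'i' => 456 }

# Hypothetical ops, shaped like the timestamps shown above.
ops = [
  { 'ts' => { 't' => 1289043070000, 'i' => 455 } },
  { 'ts' => { 't' => 1289043070000, 'i' => 456 } },
  { 'ts' => { 't' => 1289043070000, 'i' => 457 } },
  { 'ts' => { 't' => 1289043071000, 'i' => 1 } }
]

pending = ops.reject { |op| applied?(op['ts'], synced_to) }
puts "#{pending.size} ops not yet applied"
```

Comparing the pair as an array means the counter only matters when the two times are equal, which matches how `i` is described above.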

Now, for the fun part.  I wrote a simple Ruby script that checks the status of all my replicas against the master.  It outputs something like this:

$ ./replica-oplog-status.rb
10.4.1.3 is 0 seconds behind master
     and 21 ops
10.4.1.4 is 0 seconds behind master
     and 44 ops

I was very surprised to discover that my replicas typically were in sync in the sub-second range. In fact, I had to run this script several times in a row before I saw them out of sync at all. Very impressive, and not what I was expecting to see.

If you’re interested, here’s the Ruby script. Run it on the primary’s server:

#!/usr/bin/ruby
require 'rubygems'
require 'mongo'

mongo = Mongo::Connection.new('localhost', 27018)
db = mongo.db('local')
slaves = db.collection('slaves').find().to_a
last_op = db.collection('oplog.rs').find.sort([['$natural',-1]]).limit(1).to_a[0]

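slaves.each do |slave|
  # Index 0 of each timestamp is the op counter; index 1 is the time.
  opdiff = last_op['ts'][0] - slave['syncedTo'][0]
  diff = (last_op['ts'][1] - slave['syncedTo'][1])/1000.0
  puts "#{slave['host']} is #{diff} seconds behind master"
  # The op counter is only comparable when both timestamps share the same time.
  if diff == 0
    puts "     and #{opdiff} ops\n"
  end
end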