MongoDB lets you perform an atomic upsert on a document.  An upsert can do many things: increment a counter, add new fields, remove fields, and so on.  These operations are useful to many applications, but they can carry performance and storage costs that you should take into account when designing an application.

If you modify a document in a way that makes it larger, MongoDB may have to move the document to another place in the data file.  The move has overhead, and Mongo may decide to grow your data files to accommodate it.  To reduce this cost, Mongo keeps a heuristic of how often documents in each collection grow and pads some extra space around each document in the data file, so that the document can grow a bit without having to be moved.  This heuristic is called the padding factor (or paddingFactor).  You can see it in the stats for a collection:

> db.no_padding.stats()
{
        ...
        "paddingFactor" : 1.4099999999940787,
        ...
}

A paddingFactor of 1 means that mongod has not had to move any of your documents around in the data files.
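As a rough illustration of what the padding factor means for record size, here is a minimal sketch in plain JavaScript.  It assumes the factor acts as a simple multiplier on the document size at insert time; the function name paddedRecordSize and the 100-byte document size are hypothetical, and the real allocator may round or apply other rules.

```javascript
// Hypothetical helper: estimate the bytes reserved for a document,
// ASSUMING the allocation is simply docSize * paddingFactor.
function paddedRecordSize(docSizeBytes, paddingFactor) {
    return Math.ceil(docSizeBytes * paddingFactor);
}

// A paddingFactor of 1 reserves no extra room.
var noPad = paddedRecordSize(100, 1);

// With the factor reported above (about 1.41), a 100-byte document
// would get roughly 41 bytes of slack under this assumption.
var padded = paddedRecordSize(100, 1.4099999999940787);
```

Under that assumption, the higher the factor climbs, the more space each new document reserves up front.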

Here’s a simple little test to see how much overhead moves can add when I add new fields to a document.  I create two collections: in one, the upserts add new fields to each document; in the other, the documents are created with all the counter fields initialized to 0, so each upsert modifies the document in place.

var d = db.getSisterDB("padding_test");
d.no_padding.drop();
d.padding.drop();

var no_padding_f = function(count) {
    var start = Date.now();
    for (var i=0; i < count; i++) {
        // Document created with only the _id field
        d.no_padding.insert({_id:i});
        d.no_padding.update({_id:i}, {$inc : {"counter1": 1}}, true);
        d.no_padding.update({_id:i}, {$inc : {"counter2": 1}}, true);
        d.no_padding.update({_id:i}, {$inc : {"counter3": 1}}, true);
    }
    // Declare t locally so it does not leak into the global scope.
    var t = (Date.now() - start)/1000;
    print("no_padding_f runtime: " + t);
    return t;
};

var padding_f = function(count) {
    var start = Date.now();
    for (var i=0; i < count; i++) {
        // Document created with all the counter fields I
        // expect to use, each initialized to 0.
        d.padding.insert({_id:i, counter1: 0, counter2: 0, counter3: 0});
        d.padding.update({_id:i}, {$inc : {"counter1": 1}}, true);
        d.padding.update({_id:i}, {$inc : {"counter2": 1}}, true);
        d.padding.update({_id:i}, {$inc : {"counter3": 1}}, true);
    }
    // Declare t locally so it does not leak into the global scope.
    var t = (Date.now() - start)/1000;
    print("padding_f runtime: " + t);
    return t;
};

var t1 = no_padding_f(200000);
var t2 = padding_f(200000);
var faster = (1-(t2/t1))*100;
print("Padded is " + faster + "% faster\n");

print("storageSize with no padding  : " + d.no_padding.stats().storageSize);
print("paddingFactor with no padding: " + d.no_padding.stats().paddingFactor);
print("storageSize with padding     : " + d.padding.stats().storageSize);
print("paddingFactor with padding   : " + d.padding.stats().paddingFactor);

Here are the results when I run this on my local machine:

{nehresma@frodo:/tmp/mongodb-linux-x86_64-1.7.3/bin}$ ./mongo --quiet /tmp/script.js
no_padding_f runtime: 56.031
padding_f runtime: 42.165
Padded is 24.747015045242815% faster

storageSize with no padding  : 27136256
paddingFactor with no padding: 1.4099999999940787
storageSize with padding     : 17614336
paddingFactor with padding   : 1

Things to note about this simple demonstration:

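  1. The moves can amount to a fair bit of additional storage space.  In this example the collection whose documents grew used about 54% more storage than the preallocated one (27136256 vs. 17614336 bytes); put the other way, preallocating saved about 35%.  Each situation varies and this number bounces around.  This extra storage can be reclaimed if you run mongod with --repair on the database, since a repair will compact the collection.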

  2. The data set was small enough that mongod kept it entirely in RAM.  This means the timings of the two runs do not include the additional overhead of paging in the data files (a.k.a. disk IO).  When disk IO is taken into account, moves become even more expensive.

  3. Be aware that $inc can change the BSON data type of a field from a 32-bit to a 64-bit integer if you increment past 2^31 - 1; see http://jira.mongodb.org/browse/SERVER-2005.
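
The storage and runtime percentages can be checked from the numbers printed by the run above.  This is only an arithmetic sketch in plain JavaScript; the variable names are mine, and the figures are copied from that single run, so they will differ on other machines.

```javascript
// Figures copied from the run shown above.
var storageNoPadding = 27136256;  // bytes, fields added by upsert
var storagePadding   = 17614336;  // bytes, fields preallocated
var timeNoPadding    = 56.031;    // seconds
var timePadding      = 42.165;    // seconds

// Extra storage used by the collection whose documents grew,
// relative to the preallocated collection (roughly 54%).
var extraPct = (storageNoPadding / storagePadding - 1) * 100;

// Storage saved by preallocating, relative to the collection
// whose documents grew (roughly 35%).
var savedPct = (1 - storagePadding / storageNoPadding) * 100;

// Same runtime formula the test script uses.
var fasterPct = (1 - timePadding / timeNoPadding) * 100;
```

The two storage percentages describe the same gap; they differ only in which collection is used as the baseline.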

Planning how your application will do upserts, and then reserving space in your documents accordingly at the initial insert, can give a nice performance boost.  If your application is upsert-heavy, you should consider this.
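
One way to reserve that space is to build the initial document with every counter field already present.  The sketch below is plain JavaScript; the helper name newCounterDoc is hypothetical, and it assumes you know the counter names in advance, as in the test script.

```javascript
// Hypothetical helper: build the initial document with every counter
// field the application expects to $inc later, each set to 0.
function newCounterDoc(id, counterNames) {
    var doc = {_id: id};
    for (var i = 0; i < counterNames.length; i++) {
        doc[counterNames[i]] = 0;
    }
    return doc;
}

var doc = newCounterDoc(42, ["counter1", "counter2", "counter3"]);
// In the mongo shell this could then be inserted as in the test script:
//   d.padding.insert(doc);
```

Because the fields already exist, a later $inc on any of them should update the document in place rather than grow it.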