Do you have a massive number of records you need to dump into MongoDB in a hurry? Try out the bulk insert capabilities of the C++ driver. Depending on the documents, I routinely get insert rates of 50,000 records/sec, and sometimes well over 100,000/sec.
The trick is to batch records into a vector of BSON objects and pass the whole vector to a single insert() call, rather than inserting documents one at a time. Here's a simple example application.
#include "client/dbclient.h"
#include <iostream>
#include <vector>
#include <sys/time.h>
#define RECORDS 5000000
int main(int argc, char **argv) {
std::vector<mongo::BSONObj> bulk_data;
mongo::DBClientConnection mongo;
mongo.connect("localhost");
mongo.dropCollection("insert_test.col1");
struct timeval start;
gettimeofday(&start, NULL);
for (int i=0; i<RECORDS; i++) {
mongo::BSONObj record = BSON (
"_id" << i <<
"mystring" << "hello world" );
bulk_data.push_back(record);
if (i % 10000 == 0) {
mongo.insert("insert_test.col1", bulk_data);
bulk_data.clear();
}
}
struct timeval end;
gettimeofday(&end, NULL);
int now = (end.tv_sec * 1000) + (int)(end.tv_usec/1000);
int elapsed_time = now - ((start.tv_sec * 1000) +
(int)(start.tv_usec/1000));
std::cout << "rate: " << RECORDS/(elapsed_time/1000) <<
"/sec" << std::endl;
return 0;
}
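To build it, link against the C++ driver and the Boost libraries it depends on. The exact include path, library path, and Boost library list vary by installation, so treat this build line as a rough sketch rather than a recipe:

g++ -O2 bulk.cpp -o bulk \
    -I/usr/local/include/mongo -L/usr/local/lib \
    -lmongoclient -lboost_thread -lboost_filesystem -lboost_system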
Here’s a sample run:
{nehresma@frodo:/tmp}$ time ./bulk
rate: 138888/sec
real 0m36.306s
user 0m0.380s
sys 0m1.904s
36 seconds to insert 5 million records. Not too shabby.