Watch out for Cloudant replication errors

I have been able to use Cloudant NoSQL databases for free (within the $50 limit) since early 2015. Checking the usage every month or so seemed sufficient. What a surprise when I saw $467 of usage in one of the environments in the middle of this month!

The first thing I checked was when the usage started to grow. I saw a spike starting on 14 October last month:

[Image: usage_stage_lastmonth]

Then I checked my other environments and the same date marked a spike of GET requests in the production environment:

[Image: prod_last_month_light]

I have no continuous replications set up, but this clearly showed that some replication was going on. The request types (GET on the production system, GET/POST/PUT on the backup system) also pointed to a replication issue.

My usual replications are set to run once a week, so the continuous excessive usage ruled them out. I looked for any obvious signs, but no database seemed to contain an excessive number of records. It does not help that Cloudant gives no information about which database the hits relate to.

I checked all replication records and verified that none was running and that none was set up wrongly. One useful piece of information is the number at the start of _rev (the revision) in each replicator document. It counts how many times the document has been updated, which gives a rough idea of how many times the given replication ran:

"_id":"crs_backup_01","_rev":"27-fe6545d6737643a39aee38e3cf66d4f3"

"_id":"crs_backup_02","_rev":"27-500dc4881e3b491e66e53d745f792ea9"

"_id":"crs_backup_03","_rev":"27-38e430e92d8b14ad77f5040b645e9853"

"_id":"crs_backup_04","_rev":"27-4a7f68c1541b534a195ccaec08e73b94"

This means these replications ran about 27 times, roughly once a month each.
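Reading that number by eye gets tedious with many replication documents. The following is a minimal sketch, not anything Cloudant provides: `rev_generation` is a hypothetical helper name, and it assumes only that a _rev string has the form `<number>-<hash>`, as in the records above.

```python
def rev_generation(rev: str) -> int:
    """Return the numeric prefix of a _rev string such as '27-fe65...'."""
    generation, sep, _digest = rev.partition("-")
    if not sep or not generation.isdigit():
        raise ValueError(f"not a valid _rev: {rev!r}")
    return int(generation)


# Values copied from the replicator documents listed above.
replicator_docs = [
    {"_id": "crs_backup_01", "_rev": "27-fe6545d6737643a39aee38e3cf66d4f3"},
    {"_id": "crs_backup_02", "_rev": "27-500dc4881e3b491e66e53d745f792ea9"},
    {"_id": "crs_backup_03", "_rev": "27-38e430e92d8b14ad77f5040b645e9853"},
    {"_id": "crs_backup_04", "_rev": "27-4a7f68c1541b534a195ccaec08e73b94"},
]

# Print each replication document with its update count.
for doc in replicator_docs:
    print(doc["_id"], rev_generation(doc["_rev"]))
```

A document whose count is far higher than its siblings is worth a closer look.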

Nothing unusual. I was puzzled. What happened on 14 October? I made a list of everything I had done and collected details of the applications that use the database. No luck.

Then I checked active replications again and to my surprise I found a ghost replication running against one of the databases. And in a few minutes another!
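A check like this can be repeated automatically instead of by hand. The sketch below assumes the list of active tasks has already been fetched (for example from the `_active_tasks` endpoint) and parsed from JSON into Python dictionaries. The function name, the sample task entries, and the expected IDs are all hypothetical. It also assumes each replication task carries `type` and `doc_id` fields, so check that against the actual response.

```python
def unexpected_replications(active_tasks, expected_doc_ids):
    """Return replication tasks whose doc_id is not in the expected set."""
    expected = set(expected_doc_ids)
    return [
        task
        for task in active_tasks
        if task.get("type") == "replication"
        and task.get("doc_id") not in expected
    ]


# Hypothetical sample data, shaped like a parsed active-tasks response.
sample_tasks = [
    {"type": "replication", "doc_id": "example_backup_01"},
    {"type": "indexer", "database": "example_db"},
    {"type": "replication", "doc_id": "example_ghost_99"},
]

# Hypothetical IDs of the replications that are supposed to exist.
known_ids = ["example_backup_01", "example_backup_02"]

for task in unexpected_replications(sample_tasks, known_ids):
    print("unexpected replication:", task["doc_id"])
```

Run on a schedule, a filter like this would flag a replication nobody set up well before the monthly bill does.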

I quickly checked the replicator database and now the replications showed up:


"_id":"audittrail_backup_03","_rev":"773460-2d75dc285fb16c240a752659f474ebd4"

"_id":"audittrail_backup_04","_rev":"578107-5dad673feec98019dd36c4f21033d545"

773,460 replication runs??? There was the culprit. Over the month or so since 14 October, that works out to roughly a thousand runs per hour.
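As a quick sanity check on how often that is, here is a back-of-the-envelope calculation. The elapsed time of 32 days is an assumption (14 October to roughly the middle of the following month), and it treats every revision as one replication run.

```python
# Revision count taken from the audittrail_backup_03 replicator document.
revisions = 773_460

# Assumption: about 32 days between 14 October and mid-November.
elapsed_days = 32
elapsed_hours = elapsed_days * 24

runs_per_hour = revisions / elapsed_hours
seconds_between_runs = 3600 / runs_per_hour

print(f"about {runs_per_hour:.0f} runs per hour")
print(f"about one run every {seconds_between_runs:.1f} seconds")
```

Even if the elapsed time is off by a few days, the rate stays in the range of hundreds to a thousand runs per hour, which fits the sustained spike in requests.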

I canceled both replications and they have not come back since. But where did they come from? They were not visible in the replicator database before, and I can’t see any reason why they would have been created a month ago.

Watch out for Cloudant errors. No system is perfect.