Optimizing ElasticSearch MongoDB River to avoid going stale

  26 Nov 2014


The error I ran into recently is:

[2014-11-25 23:52:42,856][WARN ][org.elasticsearch.river.mongodb.Slurper] Exception in slurper
org.elasticsearch.river.mongodb.Slurper$SlurperException: River out of sync with oplog.rs collection
        at org.elasticsearch.river.mongodb.Slurper.isRiverStale(Slurper.java:667)
        at org.elasticsearch.river.mongodb.Slurper.oplogCursor(Slurper.java:652)
        at org.elasticsearch.river.mongodb.Slurper.run(Slurper.java:121)
        at java.lang.Thread.run(Thread.java:745)
[2014-11-25 23:52:42,894][WARN ][org.elasticsearch.river.mongodb.Slurper] Exception in slurper
org.elasticsearch.river.mongodb.Slurper$SlurperException: River out of sync with oplog.rs collection
        at org.elasticsearch.river.mongodb.Slurper.isRiverStale(Slurper.java:667)
        at org.elasticsearch.river.mongodb.Slurper.oplogCursor(Slurper.java:652)
        at org.elasticsearch.river.mongodb.Slurper.run(Slurper.java:121)
        at java.lang.Thread.run(Thread.java:745)
[2014-11-25 23:52:42,897][WARN ][org.elasticsearch.river.mongodb.Slurper] Exception in slurper
org.elasticsearch.river.mongodb.Slurper$SlurperException: River out of sync with oplog.rs collection
        at org.elasticsearch.river.mongodb.Slurper.isRiverStale(Slurper.java:667)
        at org.elasticsearch.river.mongodb.Slurper.oplogCursor(Slurper.java:652)
        at org.elasticsearch.river.mongodb.Slurper.run(Slurper.java:121)
        at java.lang.Thread.run(Thread.java:745)

Elasticsearch raises this error because the river has fallen too far behind MongoDB's oplog.rs collection: the oplog is a capped collection, and once the last operation the river processed has rolled off of it, the river goes into stale mode. (It is the same way a MongoDB replica set member becomes "stale".) One common suggestion is to increase the MongoDB oplog size so the river has more headroom.

The author of the MongoDB river plugin suggests that reindexing all the data into Elasticsearch is the way to go. Honestly, I'm not impressed with this simple workaround, but I don't have enough time to look into it further. If your index is large, rebuilding it may take a long time, so consider using Elasticsearch aliases to point at and manage the index (see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/index-aliases.html and the alias sketch after the document-count check below). But this is really just a personal record.
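
Before dropping anything, it's worth checking how much oplog headroom you have. db.printReplicationInfo() is a standard mongo shell helper that prints the configured oplog size and the time window it currently covers (the output below is only illustrative):

    # in the mongo shell, connected to the primary
    > db.printReplicationInfo()
    configured oplog size:   1024MB
    log length start to end: 7200secs (2hrs)

If that window is shorter than the longest pause or backlog your river might hit, give the oplog more room; with MongoDB 2.x that means restarting mongod with a larger --oplogSize (in megabytes) and, for an existing replica set member, following the oplog-resizing maintenance procedure in the MongoDB docs. With the oplog checked, the recovery steps: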

  1. Delete the index

    curl -X DELETE 'http://localhost:9200/my_index'

  2. Delete the river

    curl -X DELETE 'http://localhost:9200/_river/my_river'

  3. Recreate the river.

The meta config (my_river_config.json):

{
  "type": "mongodb",
  "mongodb": {
    "servers": [ { "host": "localhost" } ],
    "db": "testDb",
    "collection": "message"
  },
  "index": {
    "name": "my_index",
    "type": "message",
    "store": {
      "type": "mmapfs"
    }
  }
}

And create the river with:

 curl -X PUT "http://localhost:9200/_river/my_river/_meta" -d @my_river_config.json

Then verify that the river was created successfully with:

 curl -X GET 'localhost:9200/_river/my_river/_status?pretty'

Check how many documents have been indexed:

 curl -X GET 'localhost:9200/my_index/_count?pretty'
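
On the aliases idea from earlier: rebuilding an index under a new name and then flipping an alias means clients never see a half-built index. A minimal sketch, assuming hypothetical index names my_index_v1 (old) and my_index_v2 (freshly reindexed), with clients querying only the alias my_index:

    curl -X POST 'http://localhost:9200/_aliases' -d '{
      "actions": [
        { "remove": { "index": "my_index_v1", "alias": "my_index" } },
        { "add":    { "index": "my_index_v2", "alias": "my_index" } }
      ]
    }'

Both actions are applied atomically in the single request, so there is no moment when the alias points at nothing.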

Also, the recommended setting for the ES heap size is about 50% of the machine's memory, capped below ~32 GB: if you have ~64 GB, I would recommend giving ES 30 GB so that the JVM can still use compressed 64-bit object pointers. [http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/heap-sizing.html]
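
With the 1.x packages the heap is usually set through the ES_HEAP_SIZE environment variable, which sets -Xms and -Xmx to the same value; for example (the file path below is the Debian/Ubuntu package default, adjust for your install):

    # e.g. in /etc/default/elasticsearch, or exported before running bin/elasticsearch
    ES_HEAP_SIZE=30g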

And don't worry about the rest of the memory: the operating system will use it for the filesystem cache, which speeds up disk access. Furthermore, you can set index.store.type to mmapfs, which tells Elasticsearch to use memory-mapped files for the index files:

curl -X PUT localhost:9200/my_index -d '{
    "settings": {
        "index.store.type": "mmapfs"
    }
}';
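
One thing to keep in mind: index.store.type is a static index setting, so set it when the index is created, as above, before recreating the river; otherwise the river's first bulk request will typically auto-create the index with default settings. To make it the default for every index created on a node, it can also go in the node config (a sketch, assuming ES 1.x):

    # elasticsearch.yml: default store type for indexes created on this node
    index.store.type: mmapfs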

I think the real fix will come when MongoDB gets a trigger feature, or when the river plugin handles falling behind the oplog more gracefully.
