
Handle "LOADING Redis is loading the dataset in memory"

Open shaharmor opened this issue 8 years ago • 20 comments

Hi,

When a slave first connects to a master it needs to load the entire DB, which takes time. Any command sent to that slave during this time will receive a "LOADING Redis is loading the dataset in memory" response.

I think we should handle this and retry the command (maybe even on a different node within the same slot).
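
Just to illustrate the retry idea at the application level (the helper name, retry count, and delay below are made up, and the real fix would of course live inside ioredis rather than in user code):

    // Illustrative only: retry a read when the target node replies with LOADING.
    async function getWithLoadingRetry(redis, key, attempts = 3) {
      for (let i = 0; i < attempts; i++) {
        try {
          return await redis.get(key);
        } catch (err) {
          // A node that is still loading its dataset replies with a LOADING error.
          if (!/^LOADING/.test(err.message) || i === attempts - 1) {
            throw err;
          }
          // Back off briefly before retrying.
          await new Promise(function (resolve) { setTimeout(resolve, 100 * (i + 1)); });
        }
      }
    }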

@luin thoughts?

shaharmor avatar Aug 15 '16 06:08 shaharmor

It's possible that during a failover to a slave, the old master will resync from the new master and return this error, which makes the whole failover mechanism not so failsafe.

shaharmor avatar Aug 15 '16 07:08 shaharmor

ioredis already supports detecting loading in the standalone version: https://github.com/luin/ioredis/blob/master/lib/redis.js#L420-L428. Seems we just need to wait for the "ready" event of the new redis node here: https://github.com/luin/ioredis/blob/master/lib/cluster/connection_pool.js#L58-L63
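
Something along these lines, as a rough sketch only (not an actual patch; _this, node.key, and readOnly refer to the variables already used in connection_pool.js):

    // Sketch: don't register the node with the pool until it emits "ready",
    // i.e. until the ready check confirms it is no longer loading the dataset.
    redis.once('ready', function () {
      _this.nodes.all[node.key] = redis;
      _this.nodes[readOnly ? 'slave' : 'master'][node.key] = redis;
      _this.emit('+node', redis);
    });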

luin avatar Aug 15 '16 07:08 luin

@luin something like this?

    redis = new Redis(_.defaults({
      retryStrategy: null,
      readOnly: readOnly
    }, node, this.redisOptions, { lazyConnect: true }));

    var _this = this;
    // Wait for the ready check (which also covers the LOADING phase) before
    // registering the node with the pool.
    redis._readyCheck(function (err) {
      // TODO: handle error
      _this.nodes.all[node.key] = redis;
      _this.nodes[readOnly ? 'slave' : 'master'][node.key] = redis;

      redis.once('end', function () {
        delete _this.nodes.all[node.key];
        delete _this.nodes.master[node.key];
        delete _this.nodes.slave[node.key];
        _this.emit('-node', redis);
        // No nodes left in the pool.
        if (!Object.keys(_this.nodes.all).length) {
          _this.emit('drain');
        }
      });

      _this.emit('+node', redis);

      redis.on('error', function (error) {
        _this.emit('nodeError', error);
      });
    });

Also, how should we handle an error in the _readyCheck function?

shaharmor avatar Sep 08 '16 19:09 shaharmor

Hmm... I just checked the code, and it seems that when a node has not finished loading data from disk, commands sent to it are added to its offline queue instead of being sent to Redis immediately.
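
For reference, the two options involved are both on by default in standalone mode; a minimal sketch with the defaults spelled out explicitly (nothing here is new behaviour):

    const Redis = require('ioredis');

    // Both options default to true; shown explicitly for clarity.
    const redis = new Redis({
      enableReadyCheck: true,   // don't report "ready" until the node has finished loading
      enableOfflineQueue: true, // queue commands locally until the connection is ready
    });

    // Commands issued before "ready" sit in the offline queue and are only
    // flushed to the server once loading has finished.
    redis.get('foo').then(console.log);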

luin avatar Sep 09 '16 03:09 luin

So that means this should already be fixed? I've seen this happen in production, so it's definitely an issue.

Could it be that it happens only with slaves, or when using scaleReads?

shaharmor avatar Sep 09 '16 11:09 shaharmor

It's also possible that it happens if the slave was once connected but then got restarted for some reason.

shaharmor avatar Sep 09 '16 11:09 shaharmor

That's strange. Whether the node is a slave or a master doesn't affect offline queue support. Are you able to reproduce the issue, or maybe enable the debug log?

luin avatar Sep 09 '16 18:09 luin

I found this issue when I did the following:

  1. I accidentally ran FLUSHALL in redis-cli (I was trying to press Ctrl-D).
  2. Without stopping redis-server, I copied a backed-up RDB over dump.rdb and restarted redis-server. I found the copy did not actually take effect.
  3. I stopped redis-server, then copied the backed-up RDB over dump.rdb and started redis-server. The copy worked.
  4. Started redis-cli.
  5. Ran KEYS * and got the error (error) LOADING Redis is loading the dataset in memory

kishorpawar avatar Oct 25 '16 11:10 kishorpawar

@shaharmor So how did you deal with it in the end?

kaidiren avatar Jul 28 '17 08:07 kaidiren

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] avatar Oct 23 '17 17:10 stale[bot]

Hey @luin , I just encountered this issue again, and I think we should see how we can fix it.

shaharmor avatar Feb 13 '18 12:02 shaharmor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] avatar Mar 15 '18 12:03 stale[bot]

Hello,

Any news on this? I got the same error on ioredis v4.0.10

Eywek avatar Apr 09 '19 14:04 Eywek

@Eywek @shaharmor Do you have any more details on how you reproduce this issue?

Is it possible you're connected to a slave that has begun a resync, e.g. if the master it was pointing to performed a failover? A Redis slave returns -LOADING errors during a resync, which might explain how you encounter them without a connection reset.

What happens if you implement a reconnectOnError that returns 2 (reconnect and resend the failed command) when a LOADING error is encountered?

alavers avatar Apr 05 '20 18:04 alavers

Any update?

xiandong79 avatar Apr 15 '20 03:04 xiandong79

^I have a hypothesis that an error handler like this:

    reconnectOnError: function (err) {
      if (err.message.includes("LOADING")) {
        // Tell ioredis to reconnect and resend the command that failed with LOADING.
        return 2;
      }
    }

might solve this problem and, if so, should perhaps be made default ioredis behavior. But I haven't built a repeatable way to reproduce this issue.
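
For a standalone connection that would look roughly like this (host and port are placeholders):

    const Redis = require('ioredis');

    const redis = new Redis({
      host: '127.0.0.1', // placeholder
      port: 6379,
      reconnectOnError: function (err) {
        // 2 = reconnect and resend the command that failed.
        if (err.message.includes('LOADING')) {
          return 2;
        }
      },
    });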

alavers avatar Apr 15 '20 13:04 alavers

We were able to reproduce this issue by setting up an AWS ElastiCache cluster with the following config:

  • 3 shards, 1 replica per shard
  • Engine: Clustered Redis
  • Engine Version Compatibility: 3.2.10
  • Auto-failover: enabled

We filled this cluster with about 700 MB of data. Then we set up an ioredis application that continuously sent redis.get calls, all with keys belonging to the hash slots of one of our shards.

We deleted the replica node in the chosen shard; no gets failed.

But when we added a node back to this shard, we got multiple errors like:

got error during get key theKey93923, error: ReplyError: LOADING Redis is loading the dataset in memory

We used the following config for ioredis:

const Redis = require('ioredis');

const redis = new Redis.Cluster(
  [
    {
      host: 'bart-test.rmoljo.clustercfg.euw1.cache.amazonaws.com',
      port: 6379,
    },
  ],
  {
    enableReadyCheck: true,
    scaleReads: 'slave',
  }
);

Using @alavers's snippet did indeed solve the issue:

const redis = new Redis.Cluster(
  [
    {
      host: 'bart-test.rmoljo.clustercfg.euw1.cache.amazonaws.com',
      port: 6379,
    },
  ],
  {
    enableReadyCheck: true,
    scaleReads: 'slave',
    redisOptions: {
      reconnectOnError: function (err) {
        if (err.message.includes("LOADING")) {
          console.log('got one of dem loading ones');
          return 2;
        }
      },
    },
  }
);

We see the log message

got one of dem loading ones

and not a single error.

Note that we were only able to reproduce it when using the option scaleReads: 'slave'.

We also tried the exact same scenario with a Redis Cluster on our dev PC and were unable to reproduce it that way. ioredis kept sending requests to the master while the new replica node was loading the dataset into memory. No idea why the behaviour differs between ElastiCache and a non-ElastiCache Redis Cluster.

bartpeeters avatar Sep 24 '21 12:09 bartpeeters

Should we make @alavers's error handler:

    reconnectOnError: function(err) {
      if (err.message.includes("LOADING")) {
        return 2;
      }
    }

the default ioredis behaviour, since we were able to reproduce this issue (see the comment above)?

If yes, we could make a PR for this.

bartpeeters avatar Oct 04 '21 10:10 bartpeeters

Sometimes this means you have too much data in Redis, and on restart Redis has to load all of that data back into memory. This leads to a long loading phase that blocks every query. If the data isn't important, delete the persisted data on the server and restart Redis again. FLUSHALL won't work while the dataset is still loading; you need to delete the data files directly.

michel-el-hajj avatar Oct 05 '21 07:10 michel-el-hajj

@shaharmor On macOS (Homebrew install):

rm -rf /usr/local/var/db/redis/*   # delete the persisted RDB/AOF files
brew services restart redis
redis-cli -n 4 FLUSHDB             # flush DB 4 once Redis is back up

hktalent avatar Dec 19 '21 12:12 hktalent