serverless-plugins

ResourceNotFoundException: Invalid ShardId in ShardIterator

Open mshick opened this issue 4 years ago • 7 comments

I've been having issues with serverless-offline-dynamodb-streams for as long as I've been using it (6 months on the current project). After 1-2 days I get the following error and the stack hangs.

ResourceNotFoundException: Invalid ShardId in ShardIterator
      at Request.extractError (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/protocol/json.js:52:27)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:688:14)
      at Request.transition (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:22:10)
      at AcceptorStateMachine.runTo (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:38:9)
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:690:12)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:688:14)
      at Request.transition (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:22:10)
      at AcceptorStateMachine.runTo (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:38:9)
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/request.js:690:12)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
      at callNextListener (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
      at IncomingMessage.onEnd (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/[email protected]/node_modules/aws-sdk/lib/event_listeners.js:313:13)
      at IncomingMessage.emit (events.js:327:22)
      at IncomingMessage.EventEmitter.emit (domain.js:486:12)
      at endReadableNT (_stream_readable.js:1327:12)
      at processTicksAndRejections (internal/process/task_queues.js:80:21)

It sounds very similar to #43. Resetting the DB gets me a working serverless offline stack again. For a while we were on out-of-date serverless packages, and I saw that PR, so I set about getting everything current. All plugins are now up to date, but I still encounter the problem.

I started debugging the issue today, logging in various places, looking for something I could key in on to catch these invalid ShardIds, but found nothing.

I attempted to short circuit the error emitted from read with the code below, and it seemed to get everything working again:

...
    function gotRecords(err, data) {
      // Original behavior: if (err) return checkpoint.emit('error', err);
      // Workaround: swallow the error instead of emitting it.
      if (err) return null;
      setTimeout(readable.push.bind(readable), options.readInterval || 500, data.Records);
    }
...

I've tested reading and writing from my local dynamo without issues.

My question, then: why might this be working, and what is the appropriate fix? I can help debug and test, though I am a bit out of my league on the actual solution.
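A narrower variant of the workaround above, as a sketch under assumptions rather than a confirmed fix: it swallows only the "Invalid ShardId" error and still surfaces everything else. It assumes AWS SDK v2 style errors, which carry `code` and `message` fields. `isInvalidShardError` and `makeGotRecords` are hypothetical names, and `push` and `emitError` stand in for `readable.push` and `checkpoint.emit('error', ...)` from the snippet above. Whether pushing an empty batch is acceptable to whatever consumes the stream is also an assumption.

```javascript
// Returns true only for the specific error reported in this issue.
// Assumes AWS SDK v2 style errors with `code` and `message` fields.
function isInvalidShardError(err) {
  return Boolean(err) &&
    err.code === 'ResourceNotFoundException' &&
    /Invalid ShardId/.test(String(err.message));
}

// Hypothetical factory, not the plugin's actual code.
// `push` stands in for readable.push.bind(readable),
// `emitError` for (err) => checkpoint.emit('error', err).
function makeGotRecords({ push, emitError, readInterval = 500, schedule = setTimeout }) {
  return function gotRecords(err, data) {
    if (isInvalidShardError(err)) {
      // Keep the stream alive: push an empty batch after the usual interval.
      schedule(push, readInterval, []);
      return;
    }
    // Any other error is still reported, as in the original code.
    if (err) return emitError(err);
    schedule(push, readInterval, data.Records);
  };
}
```

This does not replace the stale shard iterator, so the same error may simply recur on every read. It only avoids hiding unrelated failures the way `if (err) return null` does.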

mshick avatar Dec 03 '20 00:12 mshick

Pasting some relevant sections of my serverless.yml here, in case they offer any clues...

custom:
  streams:
    roles: ${self:custom.resources.dynamoTables.roles.LatestStreamArn, 'arn:aws:dynamodb:ddblocal:000000000000:table/${self:custom.projectName}.dev.roles/stream/2019-08-14T18:57:07.218'}
  serverless-offline:
    noPrependStageInUrl: true
    useWorkerThreads: true
    allowCache: true
  serverless-offline-dynamodb-streams:
    endpoint: http://0.0.0.0:8000
    region: us-east-1
    accessKeyId: root
    secretAccessKey: root
    skipCacheInvalidation: true
    readInterval: 500

plugins:
  - serverless-domain-manager
  - serverless-plugin-warmup
  - serverless-api-compression
  - serverless-webpack
  - serverless-offline-dynamodb-streams
  - serverless-offline-sns
  - serverless-offline
  - serverless-plugin-split-stacks
  - serverless-sentry

functions:
  introspectionCache:
    handler: src/functions/introspection-cache/handler.handler
    timeout: 60
    events:
      - stream:
          type: dynamodb
          batchSize: 10
          arn: ${self:custom.resources.dynamoTables.schema.LatestStreamArn}
          startingPosition: LATEST

mshick avatar Dec 03 '20 15:12 mshick

Facing the same issue.

mdrijwan avatar Jan 14 '21 16:01 mdrijwan

I was having this issue using the official DynamoDB Local docker container. The shard error seems to be related to it (I encountered the same error trying to use streams outside of serverless-offline entirely).
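For anyone who wants to check this outside of serverless-offline, reading a stream directly can be sketched roughly as below. This is not the code used in the test described above. It assumes a client exposing the AWS SDK for JavaScript v2 `DynamoDBStreams` methods `describeStream`, `getShardIterator` and `getRecords`, each returning an object with `.promise()`. The function name `readStreamOnce` is hypothetical, and the client is passed in so that the endpoint and credentials stay outside the sketch.

```javascript
// Sketch: read one batch of records from every shard of a DynamoDB stream.
// `client` is assumed to be an AWS SDK v2 DynamoDBStreams instance, e.g.
//   const AWS = require('aws-sdk');
//   const client = new AWS.DynamoDBStreams({ endpoint: 'http://localhost:8000', region: 'us-east-1' });
// Shard pagination (LastEvaluatedShardId) is ignored for brevity.
async function readStreamOnce(client, streamArn) {
  const { StreamDescription } = await client
    .describeStream({ StreamArn: streamArn })
    .promise();

  const results = [];
  for (const shard of StreamDescription.Shards) {
    // Start from the oldest record still available in the shard.
    const { ShardIterator } = await client
      .getShardIterator({
        StreamArn: streamArn,
        ShardId: shard.ShardId,
        ShardIteratorType: 'TRIM_HORIZON',
      })
      .promise();

    try {
      const { Records } = await client.getRecords({ ShardIterator }).promise();
      results.push({ shardId: shard.ShardId, records: Records });
    } catch (err) {
      // Record the failure per shard instead of aborting the whole read.
      results.push({ shardId: shard.ShardId, error: err.code || err.message });
    }
  }
  return results;
}
```

If the shard error comes from the database itself, a shard that fails here should show up with `error: 'ResourceNotFoundException'` in the result.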

I switched to LocalStack and streams are working now. For anyone using docker-compose, this is my config:

services:
  localstack:
    image: localstack/localstack
    ports:
      - '4566:4566'
      - '4571:4571'
      - '8000:4566' # optional - exposes edge port on 8000 as well since it's the common dynamodb port
    environment:
      - SERVICES=s3,sns,sqs,apigateway,lambda,dynamodb,dynamodbstreams,cloudformation
    volumes:
      - '${TMP_DIR:-/tmp/localstack}:${TMP_DIR:-/tmp/localstack}'
      - '/var/run/docker.sock:/var/run/docker.sock'

mattjennings avatar Jan 25 '21 22:01 mattjennings

Same issue here, but we are not using Docker to start DynamoDB. At the moment we just follow the advice from the docs and use the serverless dynamodb start --migrate command to start it. It looks like we will have to start it in Docker with the LocalStack image instead.

dmitriy-baltak avatar May 05 '21 09:05 dmitriy-baltak

I had the same issue and restarting macOS helped 😂

ondrejrohon avatar Jan 10 '22 08:01 ondrejrohon

Running into the same issue. Unable to resolve it currently.

// Edit: What seemed to help in my case was to remove the docker volume (for the data, not the image) for opensearch/elasticsearch and recreate it.

martinjuhasz avatar Dec 09 '22 09:12 martinjuhasz

Does anyone have an idea?

voccer-pionero avatar Jul 28 '23 06:07 voccer-pionero