resolve icon indicating copy to clipboard operation
resolve copied to clipboard

Improve logging for read model errors

Open oliversturm opened this issue 4 years ago • 8 comments

I found the following behavior the other day:

  • I create a command/projection implementation and a basic read model.

  • I run the application and start testing by sending commands from the command line.

  • My commands work and I receive the events in return. However, the running application complains: SQLITE_ERROR: malformed JSON - something wrong with the queries I wrote for the read models

  • I run my test API call a few more times, trying to track down what exactly caused the issue. Weird - suddenly there are no more error messages from the read model.

  • What happened behind the scenes is that the read model stopped processing after the error it encountered. But this is not logged anywhere!

  • When I eventually stop and restart the app, the same errors come up again - obviously - since the events are processed once more.

This behaviour is somewhat unfortunate for development - stopping and restarting the app is not very convenient for debugging - but the biggest issue is that a new user would have no idea what's happening since the read model does not even log the fact that it has given up.

oliversturm avatar Oct 02 '20 09:10 oliversturm

I had a similar problem again today and I saw output - didn't see this previously, either I missed it or something has changed. Here's what I got:

Event listener "products" can't resume subscription since event bus
      cannot initiate notification for it because of error "Error: Event subscriber products is in error state"
    
Async multiplexed function bound resumeSubscriber failed with error: Notification for event subscriber products cannot be pushed
Error: Notification for event subscriber products cannot be pushed

I just wanted to add this info to the issue for completeness - I don't think that this does very much to solve the problem, since the message is pretty cryptic even if you know what it means.

oliversturm avatar Oct 06 '20 14:10 oliversturm

Hi @oliversturm ! The actual behavior described is unexpected and such errors should not occur.

However, there are several things to consider. First, although all of the included reader adapters have the same compatible interface, the adapters may behave differently. This is subject to improvement and is planned to be done. Second, the read model adapters and the local broker may have incompatible changes between major versions (the middle digit in the version, since the real major is zero). The data directory should be cleared, except for the event store.

The error message provided can't resume subscription since event bus indicates that the read model has crashed for internal reasons. Read-model has crash-once semantics. If any error had been occurred while processing projection, it will bypass any procession until manual reset.

Also, you can always use admin module https://github.com/reimagined/resolve/tree/master/packages/modules/resolve-module-admin to see real error originated from read-model. Maybe it's just malformed code in projection.

IhostVlad avatar Oct 09 '20 11:10 IhostVlad

Read-model has crash-once semantics. If any error had been occurred while processing projection, it will bypass any procession until manual reset.

I understand that, but I didn't think that the messages I saw made this at all clear to a dev who faces this problem for the first time. For instance, the message could clearly say "Please restart your application to resume event processing, or use resolve-module-admin to analyze read model state."

I'll give this another test - the issues I had in my read models were all just my own fault, due to some mistake I made while implementing the projections.

oliversturm avatar Oct 09 '20 11:10 oliversturm

Here's a precise walkthrough that shows the issue I was referring to originally.

  1. Create a new project: npm create-resolve-app read-model-error-test

  2. Add this code to common/aggregates/aggregate-name.commands.js:

export default {
  create: (agg, { payload: { one, two } }) => ({
    type: 'CREATED',
    payload: { one, two },
  }),
};
  1. Add this code to common/read-models/read-model-name.projection.js:
export default {
  Init: (store) =>
    store.defineTable('things', {
      indexes: { id: 'string' },
      fields: ['one', 'two'],
    }),

  CREATED: (store, { aggregateId, payload: { one, two } }) =>
    store.insert('things', { id: aggregateId, one, two }),
};
  1. Run the application

  2. Post a command to the application. Note that the payload only includes the field one, but not two - this will result in an error because of the undefined value.

jsonic "aggregateName:aggregate-name,type:create, aggregateId:'1', payload: { one:42 }" | http POST http://127.0.0.1:3000/api/commands
  1. Now the running application shows these lines of output - there is nothing else.
  resolve:resolve-query:readModel:read-model-name:[CREATED] SQLITE_ERROR: malformed JSON +0ms
  resolve:resolve-query:readModel:wrapConnection applying "CREATED" event to read-model "read-model-name" failed +0ms
  resolve:resolve-query:readModel:wrapConnection SQLITE_ERROR: malformed JSON +0ms
  resolve:resolve-query:sendEvents:read-model-name SQLITE_ERROR: malformed JSON +0ms
  1. Try sending another command - absolutely nothing happens.

  2. If you restart the application at this point, the same error lines come up again - once more, no other info.

  3. With the application running, try to fix the problem with this code:

...
  CREATED: (store, { aggregateId, payload: { one, two } }) =>
    store.insert('things', { id: aggregateId, one, two: two || '' }),
  1. Now you see this output from the application:
      Event listener "read-model-name" can't resume subscription since event bus
      cannot initiate notification for it because of error "Error: Event subscriber read-model-name is in error state"
    
Async multiplexed function bound resumeSubscriber failed with error: Notification for event subscriber read-model-name cannot be pushed
Error: Notification for event subscriber read-model-name cannot be pushed
    at Object.resumeSubscriber (/home/user/temp/resolve-tests/read-model-error-test/node_modules/resolve-local-event-broker/lib/publisher-server/core/resume-subscriber.js:70:11)
    at async /home/user/temp/resolve-tests/read-model-error-test/node_modules/resolve-local-event-broker/lib/multiplex-async.js:18:7

Above you say that the error indicates a crash "for internal reasons" - that's not the case, in fact the read model has just been fixed! I don't really understand what state we are in at this point... if the read model is suspended due to its "crash-once semantics", then shouldn't I see a message to this effect? The error indicates that there is an attempt to resume the read model - this would mean there are no "crash-once semantics".

oliversturm avatar Oct 09 '20 12:10 oliversturm

Yes, you are correct that the current behavior is not particularly obvious, but it is the same in local and cloud applications. The main way to inspect the state of read models and sagas is the corresponding API, which, when launched locally, is provided through the resolve-module-admin module, and in the cloud through the CLI based on resolve-cloud npm package. In fact, to restore the read models building, even restarting the application is not necessary, since after updating the projection code, the fresh projection implementation will be loaded into the application, and the read model can be reset via the appropriate API within reset command.

But the error originated from the read-model is really very unrepresentative: SQLITE_ERROR: malformed JSON. Unifying errors between different read-model adapters and emitting meaningful messages is important feature. This was technically impossible until introducing new event-bus since 0.24.0 version, and now can be implemented as regular feature.

IhostVlad avatar Oct 09 '20 12:10 IhostVlad

provided through the resolve-module-admin module

I didn't even know that thing existed - there's one single mention of it in the troubleshooting doc right now.

Anyway, if that's the mechanism of choice then we can focus on that. The title of this issue says "improve logging" - surely we should display a message that indicates the "stopped/crashed" read model state to the developer and hints at a solution.

And now... I tested your suggestions. Continuing from the description above (with the app still running after the last step), I executed this and got an error:

➜ npx resolve-module-admin read-models list   
npx: installed 47 in 5.121s
invalid json response body at http://localhost:3000/api/event-broker/read-models-list reason: Unexpected token A in JSON at position 0

At the same time, this appears on the console with the running app:

  resolve:resolve-runtime:fail-handler Access error: GET "/api/event-broker/read-models-list" is not addressable by current executor +0ms

The reset command doesn't work either. I suspect that this whole thing doesn't work at all unless I add an API handler for it? Unfortunately I don't even see here how that is supposed to work.

Lastly, you guys must have really hated ttr if you set out to do the same thing yourself!

oliversturm avatar Oct 09 '20 12:10 oliversturm

I tested your suggestions. Continuing from the description above (with the app still running after the last step), I executed this and got an error:

Have you plugged admin-module to your application proper way? Suggestions above are valid since admin-module had been already plugged into application before it have been launched. Module usage is provided here https://github.com/reimagined/resolve/blob/master/examples/hacker-news/run.js#L70

Lastly, you guys must have really hated ttr if you set out to do the same thing yourself!

Module resolve-module-admin had been developed with inspiration of TTR. But every major resolve release introduces breaking changes, so TTR had been stopped work, because of massive internal API usages

IhostVlad avatar Oct 09 '20 12:10 IhostVlad

Right. I found out how to use the module and added it, and now I managed to reset the read model. I did notice that again there is no log output - I would have expected something like "read model reset, resuming".

But every major resolve release introduces breaking changes, so TTR had been stopped work, because of massive internal API usages

Indeed. I intend to look into that - I checked out a part of it and found that some issues were due to pretty minor changes like the renaming of the jwtToken in the templates. But there was some use of internal APIs - perhaps I should make a list while I fix it, so we can see which APIs should be added in a less "internal" way.

oliversturm avatar Oct 09 '20 13:10 oliversturm