linera-protocol

When the frontend sends a bad query (with unexpected parameters) to the service, the service seems to hang and the frontend no longer gets a response

Open kikakkz opened this issue 1 year ago • 10 comments

Description We're porting the ResPeer frontend to Linera SDK v0.10.0. Because the query pattern changed a lot in v0.10.0, our old code sent queries in the old pattern to the node service. After that, when we try to query again, the node service seems to hang and we no longer get any response.

Result

  • The node service and the application page are still reachable, but queries get stuck.
  • Restarting the service does not recover it; we have to deploy a new cluster.

Note So far I have only noticed this when we run the old frontend code, which sends the old query pattern. I don't have any other clue, and we don't get any error log.

Screenshot image

Version v0.10.0

kikakkz, Mar 31 '24 04:03

I'm not sure it's caused by the wrong query pattern. I think I have already corrected all query patterns in the frontend, but it still hangs sooner or later.

kikakkz, Mar 31 '24 07:03

Thanks @kikakkz. We'll definitely have a look (repro steps would help!)

ma2bd, Mar 31 '24 15:03

> Thanks @kikakkz. We'll definitely have a look (repro steps would help!)

I don't have specific reproduction steps yet. I'll test a bit more and try to find some.

kikakkz, Apr 01 '24 12:04

Thanks for the report! Would you be able to run the node service using RUST_LOG=debug linera service ... and collect a log file to send to us?

jvff, Apr 02 '24 17:04

> Thanks for the report! Would you be able to run the node service using RUST_LOG=debug linera service ... and collect a log file to send to us?

With debug logging enabled, once the frontend sends requests to the service, the log output never seems to stop. Some requests succeed, and the rest get stuck without a response.

service_9080.zip

kikakkz, Apr 03 '24 00:04

@jvff Are we missing a processing timeout for Wasm service queries?

ma2bd, Apr 03 '24 01:04

Thanks for the logs! When looking at them, I saw that there seems to be a loop where a chain is subscribed to itself, and sends a message to itself, then receives the message and sends another, and then repeats. I saw this by running grep BlockProposal service_9080.log, where you can see blocks being added to the chain repeatedly.

I think the issue can be solved by changing the application to not subscribe to itself, or maybe not produce a new message if it receives a message from itself 🤔
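The loop described above can be illustrated with a toy model. This is only a sketch, not Linera SDK code: `ChainId`, `simulate`, and `guard_self` are hypothetical names, and message delivery is reduced to a simple in-memory queue. It assumes the publishing chain broadcasts one new message to every subscriber each time it handles a message.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a chain identifier; not the Linera SDK type.
type ChainId = u32;

/// Toy model: `publisher` broadcasts to all `subscribers` every time it
/// handles a message. Returns the number of messages handled once the queue
/// drains, or `None` if more than `max_steps` messages would be handled
/// (the loop does not settle).
fn simulate(
    publisher: ChainId,
    subscribers: &[ChainId],
    guard_self: bool,
    max_steps: usize,
) -> Option<usize> {
    // Each queue entry is (sender, receiver); start with one broadcast.
    let mut queue: VecDeque<(ChainId, ChainId)> =
        subscribers.iter().map(|&s| (publisher, s)).collect();
    let mut handled = 0;

    while let Some((sender, receiver)) = queue.pop_front() {
        if handled == max_steps {
            return None;
        }
        handled += 1;

        // Possible fix: drop a message that a chain sent to itself
        // without producing a new one.
        if guard_self && sender == receiver {
            continue;
        }

        // Only the publisher has subscribers, so only it broadcasts again.
        if receiver == publisher {
            for &s in subscribers {
                queue.push_back((publisher, s));
            }
        }
    }
    Some(handled)
}
```

Under these assumptions, `simulate(1, &[1, 2], false, 1000)` hits the step limit and returns `None`, while the same call with `guard_self` set to `true` drains after two messages.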

jvff, Apr 03 '24 18:04

> Thanks for the logs! When looking at them, I saw that there seems to be a loop where a chain is subscribed to itself, and sends a message to itself, then receives the message and sends another, and then repeats. I saw this by running grep BlockProposal service_9080.log, where you can see blocks being added to the chain repeatedly.
>
> I think the issue can be solved by changing the application to not subscribe to itself, or maybe not produce a new message if it receives a message from itself 🤔

I'll check it; that seems to be the case 😄. This is not allowed in the new SDK, right? It worked with the old SDK (0.6.0). Or is it still a bug in the SDK? (I think it should at least not get stuck, or should not allow such an infinite loop 😄)

kikakkz, Apr 04 '24 05:04

The issue should be reproducible in 0.6.0 as well 🤔

We considered at least warning if a chain subscribes to itself, but that by itself doesn't cause the problem (you need to subscribe to yourself and produce a message every time you handle a message). And another argument brought up is that you could still write the same bug with multiple chains, you just need to create a cycle, and that would be hard to detect.
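To make the multi-chain case concrete, here is a minimal sketch of cycle detection on a subscription graph. It assumes the whole graph is known up front, which is not the case in practice since subscriptions are created dynamically across chains (part of why this is hard to detect). `ChainId`, `Mark`, and `has_cycle` are hypothetical names, not SDK APIs. A cycle alone is also not sufficient for the bug: the application still has to produce a message for every message it handles.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a chain identifier; not the Linera SDK type.
type ChainId = u32;

/// DFS state of a chain while walking the subscription graph.
enum Mark {
    InProgress,
    Done,
}

/// `edges[a]` lists the chains that receive messages published by `a`.
/// Returns true if following subscriptions can lead back to a chain
/// already on the current path (including a chain subscribed to itself).
fn has_cycle(edges: &HashMap<ChainId, Vec<ChainId>>) -> bool {
    fn visit(
        node: ChainId,
        edges: &HashMap<ChainId, Vec<ChainId>>,
        marks: &mut HashMap<ChainId, Mark>,
    ) -> bool {
        match marks.get(&node) {
            // Reached a chain that is still on the current path: cycle.
            Some(Mark::InProgress) => return true,
            // Already fully explored with no cycle found through it.
            Some(Mark::Done) => return false,
            None => {}
        }
        marks.insert(node, Mark::InProgress);
        if let Some(receivers) = edges.get(&node) {
            for &next in receivers {
                if visit(next, edges, marks) {
                    return true;
                }
            }
        }
        marks.insert(node, Mark::Done);
        false
    }

    let mut marks = HashMap::new();
    edges.keys().any(|&start| visit(start, edges, &mut marks))
}
```

With this model, a single chain subscribed to itself (`1 -> 1`) and a three-chain ring (`1 -> 2 -> 3 -> 1`) are both reported as cycles, while a plain fan-out (`1 -> 2`, `1 -> 3`) is not.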

jvff, Apr 09 '24 01:04

This issue is reproducible, and I think it's caused, as @jvff mentioned, by a chain subscribing to itself and sending messages. (It was actually caused by an error I made when porting ResPeer to SDK v0.10.0.)

Here, at https://github.com/respeer-ai/res-peer/blob/ffb5df6ff62889887fdd3e5a682bea5827f32777/review/src/contract.rs#L1270, what I want to do is:

  • Subscribe to the creation chain of this application; then, if any message needs to be broadcast, the creation chain will broadcast it
  • If the RequestSubscribe message is from the creation chain, ignore it

But the check with self.require_message_id()?.chain_id != self.runtime.application_id().creation.chain_id is wrong; it should be == here. So when the message comes from the same chain, the chain subscribes to itself, and the linera log output never stops.

After correcting the check condition, it seems OK.
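A minimal sketch of the corrected condition, written against hypothetical stand-in types rather than the real SDK (`ChainId`, `SubscribeAction`, and `on_request_subscribe` are illustrative names, and the linked contract code is not reproduced here):

```rust
/// Hypothetical stand-in for the SDK chain identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ChainId(u64);

/// What the handler decides to do with a RequestSubscribe message.
#[derive(Debug, PartialEq, Eq)]
enum SubscribeAction {
    Ignore,
    Subscribe,
}

/// Decide how to handle a RequestSubscribe message.
/// `message_origin` plays the role of the chain the message came from,
/// `creation_chain` the role of the application's creation chain.
fn on_request_subscribe(message_origin: ChainId, creation_chain: ChainId) -> SubscribeAction {
    // The buggy version compared with `!=`, so a request coming from the
    // creation chain itself was processed and the chain subscribed to itself.
    if message_origin == creation_chain {
        SubscribeAction::Ignore
    } else {
        SubscribeAction::Subscribe
    }
}
```

With the `==` comparison, a request originating from the creation chain is ignored, so the creation chain never ends up subscribed to its own broadcasts.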

kikakkz, Apr 09 '24 02:04