
Daml-script communication needs to be more robust

Open samuel-williams-da opened this issue 1 year ago • 2 comments

From Curtis, a description of the problem:

  • I was just on a call with GS. They are leery of using Daml Script. One reason is that they encountered the following situation:
    • They were running their Daml / Canton system in the cloud.
    • They were using Daml Script, running it (I assume) either locally or on a server in the cloud.
    • The connection from the host running Daml Script to the PN (participant node) broke.
    • The broken connection left them in an unrecoverable situation.
  • A network (socket) connection breaking is a very normal thing. Does Daml Script have any retry mechanism for this? Are there any other gaps in making that connection robust?
  • Another related situation was a JWT expiring in the middle of a Script execution. What would happen in this case?
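
To make the JWT question concrete, here is a minimal sketch of one possible mitigation: a holder that fetches a fresh token shortly before the current one expires. This is not how Daml Script behaves today; `RefreshingToken`, `Token`, the `fetcher` supplier, and the refresh margin are all hypothetical names introduced for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical holder that re-fetches a token shortly before it expires. */
final class RefreshingToken {
    /** A token value plus its expiry time, as reported by the (assumed) token issuer. */
    static final class Token {
        final String value;
        final Instant expiresAt;

        Token(String value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Supplier<Token> fetcher; // assumption: asks the issuer for a new token
    private final Duration margin;         // refresh this long before expiry
    private final Supplier<Instant> now;   // injected time source, so the logic is testable
    private Token current;

    RefreshingToken(Supplier<Token> fetcher, Duration margin, Supplier<Instant> now) {
        this.fetcher = fetcher;
        this.margin = margin;
        this.now = now;
    }

    /** Returns the cached token, fetching a new one if it expires within the margin. */
    synchronized String get() {
        Instant deadline = now.get().plus(margin);
        if (current == null || !deadline.isBefore(current.expiresAt)) {
            current = fetcher.get();
        }
        return current.value;
    }
}
```

A client would call `get()` before each request rather than capturing the token once at startup. This only helps for requests issued after the refresh; it says nothing about a command that is already in flight when the token expires.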

Netty retry findings

It appears that the Netty channel underlying daml-script's gRPC connection doesn't use any retry logic. The channel builder LedgerClientChannelConfiguration.builderFor sets up TLS and message sizes, but doesn't set up:

.enableRetry()
.maxRetryAttempts(10)

for example. There is also information here on configuring retry directly in gRPC. I've seen a sentiment online that setting maxRetryAttempts alone isn't enough.
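
One possible reading of the "isn't enough" sentiment: as far as I understand grpc-java, calls are only retried when a retryPolicy in the channel's service config covers them, and maxRetryAttempts just caps that policy. The sketch below builds such a service config as the nested Map that grpc-java's `defaultServiceConfig` accepts. The class name `RetryServiceConfig`, the backoff values, and the choice of `UNAVAILABLE` as the only retryable status are illustrative assumptions, not anything taken from the daml-script code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds a gRPC service config containing one retry policy. */
final class RetryServiceConfig {
    /**
     * Returns the service config in Map form.
     * Numeric values are Doubles, which is what grpc-java expects in this structure.
     */
    static Map<String, Object> build(double maxAttempts) {
        Map<String, Object> retryPolicy = Map.of(
            "maxAttempts", maxAttempts,
            "initialBackoff", "0.5s",   // illustrative value
            "maxBackoff", "30s",        // illustrative value
            "backoffMultiplier", 2.0,
            "retryableStatusCodes", List.of("UNAVAILABLE"));

        Map<String, Object> methodConfig = Map.of(
            // An empty name entry is meant to match every method; worth
            // verifying against the grpc-java version actually in use.
            "name", List.of(Map.of()),
            "retryPolicy", retryPolicy);

        return Map.of("methodConfig", List.of(methodConfig));
    }

    // Assumed wiring on the channel builder (needs grpc-java on the classpath):
    //   builder.defaultServiceConfig(RetryServiceConfig.build(5))
    //          .enableRetry()
    //          .maxRetryAttempts(5);
}
```

Even with this in place, gRPC-level retry only re-sends individual calls that fail with a retryable status. It would not by itself make a long-running script resume after a lost connection, and whether re-sending a command submission is safe is a separate question.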

samuel-williams-da, Aug 19 '24 12:08

Will be discussed in grilling; will move then.

dylant-da, Aug 20 '24 12:08

Discussed with Curtis at the last grilling. We explained that making this robust against failure is very difficult, and Curtis said he'd get back to us.

Putting this on the backlog until then.

dylant-da, Sep 17 '24 13:09