
Tooling to retry failed contract renewals

Open • MeijeSibbel opened this issue 3 years ago • 5 comments

Quote: "Attempt renewal again if the transaction ends up being reorged. This doesn't strike me as too difficult, but currently I don't think there's any us tooling for it."

This tooling would help resolve the "no record of that host" errors that appear after renewal.

MeijeSibbel, Oct 07 '20

On second thought, I'm not sure any special tooling is needed here; the old contract is not deleted from the muse server, so if you try to use a renewed contract and it fails, you should be able to simply attempt the renewal again (by calling the /renew endpoint with the same arguments).

Another thing you could try is to always attempt a second renewal 12 blocks after the first one (or 2 hours, if that's easier). If the first renewal succeeded and has not been reorged, you'll get a predictable error, e.g. "contract can not be revised." Otherwise, you'll get a different error, or nil, in which case you know that you should try again in another 2 hours.

lukechampine, Oct 13 '20

If I'm not mistaken, the old contract is already finalized, so we can't renew it anymore, can we? The problem is that the host accepts the renewal but later removes the renewed contract. Does the host keep the old contract and fall back to it after removing the renewed one?

jkawamoto, Oct 15 '20

Hmm. I'll have to look at the host code. This sounds like something the host should handle (either by restoring the old contract or by automatically resubmitting the new one), but it may not.

I'd suggest at least attempting to renew again when this happens, just to see what sort of error you get. That could be helpful.

lukechampine, Oct 27 '20

Guys, @jkawamoto @lukechampine, what is the status of this issue? Looking at Kibana, I can still see endless messages like:

too many hosts did not supply their shard (needed 5, got 3): 
c31b05d3: no record of that host
1c96a10c: no record of that host
713bee98: no record of that host
2cb71a1e: no record of that host
d9cd1249: no record of that host
bd572d72: no record of that host
f8642258: NewUnlockedSession: connect: no route to host
d0ced087: no record of that host
c32db729: no record of that host
3c6952cf: no record of that host
c204cc3e: no record of that host
0dbd913b: NewUnlockedSession: connect: connection refused
a85816e6: no record of that host
d441be7b: Settings: couldn't read LoopSettings response: read tcp 10.244.0.86:54556->50.35.89.213:9982: i/o timeout
9d43f278: no record of that host
c99c8227: no record of that host
4b804458: NewUnlockedSession: connect: connection timed out
too many hosts did not supply their shard (needed 5, got 2): 
713bee98: no record of that host
a40bdf4a: no record of that host
1c96a10c: no record of that host
6f79ed6c: no record of that host
c32db729: no record of that host
9ba38c50: no record of that host
bd572d72: no record of that host
d9cd1249: no record of that host
57dbc08d: no record of that host
6c202b48: no record of that host
c99c8227: no record of that host
f5fe9ca9: no record of that host
b63fb3df: no record of that host
3c6952cf: no record of that host
d0ced087: no record of that host
f1200eea: NewUnlockedSession: lookup madbri.ddns.net on 10.245.0.10:53: no such host
0dbd913b: NewUnlockedSession: connect: connection refused
d441be7b: Lock: couldn't read LoopLock response: read tcp 10.244.0.86:53906->50.35.89.213:9982: i/o timeout

We're basically still permanently losing data because of this.

MeijeSibbel, Mar 30 '21

I'm skeptical that this is caused by renewals being reorged, but I can't say for sure because I don't know how common reorgs are on mainnet. I'll try to get some stats on that. (UPDATE: According to SiaStats, there have been 10 reorgs in the past ~6000 blocks, which is just under 6 weeks at the 10-minute block target, so between one and two reorgs per week. All of those reorgs were 1 block deep.)

If reorgs are the cause, then the impact could be minimized by "staggering" your renewals, i.e. renew one contract every 10 minutes instead of renewing all of the contracts together.

We should evaluate other potential causes as well, though.

lukechampine, Mar 30 '21