Intermittent CI failure tracking issue
We've had a handful of CI failures that seem to disappear when you re-run them. Making an issue to keep track of them.
- `lightning_gateway_pays_invoice` `lightning.amount_sent()` assertion https://github.com/fedimint/minimint/runs/7213579990?check_suite_focus=true#step:6:4320
Basically this occurs when the listfunds command doesn't return the updated balances right away. It started when we added lightningd --dev-fast-gossip.
I added a sleep and FIXME to that line of code. I believe this will make the CI reliable, but hopefully we can create a better long-term fix.
I was debating removing this assertion. Maybe there is a better way of verifying that actual LN funds have moved.
Maybe just wait/poll till it has the expected value and timeout otherwise?
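A minimal sketch of that wait/poll idea, using only the standard library. The helper name `wait_for_value` and its parameters are hypothetical and not part of the fedimint test code; `read` stands in for whatever call fetches the balance (for example, something wrapping `listfunds`).

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `read` repeatedly until it returns `expected`
/// or `timeout` has elapsed. On timeout, returns the last value seen so the
/// test can report what the balance actually was.
fn wait_for_value<T: PartialEq>(
    mut read: impl FnMut() -> T,
    expected: &T,
    timeout: Duration,
    interval: Duration,
) -> Result<(), T> {
    let deadline = Instant::now() + timeout;
    loop {
        let current = read();
        if current == *expected {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(current);
        }
        // Back off briefly before asking again.
        sleep(interval);
    }
}
```

Compared with a fixed sleep, this returns as soon as the value shows up and only costs the full timeout when something is actually wrong.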
Here's another one https://github.com/fedimint/minimint/runs/7290083028?check_suite_focus=true#step:7:2246
I wanted to post the same here. I get this a lot on my local setup, especially on this branch.
https://github.com/fedimint/minimint/runs/7360992558?check_suite_focus=true
https://github.com/fedimint/fedimint/runs/7881383484?check_suite_focus=true
https://github.com/fedimint/fedimint/runs/7881960910?check_suite_focus=true
Port 8080 already bound, again https://github.com/fedimint/fedimint/runs/7881960910?check_suite_focus=true#step:5:6101
https://github.com/fedimint/fedimint/actions/runs/3049065296/jobs/4914765743
@justinmoon Did using port picker resolve these issues?
I don't know because it only happens very infrequently. Do you want to just add it? https://github.com/fedimint/fedimint/pull/427
@jkitman could this maybe be the same issue I had (which got fixed by running the tests with --release)? I also got the port-already-in-use error (for some reason) when that happened.
@NicolaLS I don't think so because we removed the timeouts which was the issue you were hitting. Still, shouldn't hurt to add port-picker to the integration tests as well...
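For reference, a rough sketch of the underlying idea behind picking a free port instead of hardcoding 8080. This uses only `std::net` and is not the port-picker crate's API; `bind_free_port` is a hypothetical helper name.

```rust
use std::net::{SocketAddr, TcpListener};

/// Hypothetical helper: binds to port 0 so the OS assigns an unused port,
/// then reports which address was actually chosen.
fn bind_free_port() -> std::io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    // Returning the listener keeps the port reserved for the caller.
    Ok((listener, addr))
}
```

Holding on to the listener matters: if a test only reads the port number and drops the socket, another process can grab the port before the server under test binds it.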
https://github.com/fedimint/fedimint/actions/runs/3112194306/jobs/5045353221#step:11:6632
Exact same error here https://github.com/fedimint/fedimint/actions/runs/3123610337/jobs/5066377856#step:11:12449
In this case re-running CI fixed it. I'm inclined to believe this is just core-lightning being flaky.
Asked about this in core-lightning discord https://discord.com/channels/899980449231814676/899989729183940629/1023743433543782470
Rusty's response:
Yes, you are mining too fast, and we're freaking out. Generally this means your tests need to make sure everything is fully settled (listpeers channels has an empty htlcs array) before mining more. Or, make sure nodes have digested current blocks before making payment (see getinfo blockheight). Finally, note the dev-bitcoin-poll option for developer builds, which can reduce the 60 second polling interval
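The two checks Rusty describes could be sketched roughly like this. The `Peer` and `Channel` structs are hypothetical, simplified stand-ins for a parsed `listpeers` response, and the heights are assumed to come from each node's `getinfo blockheight`; none of these names are from the fedimint tests or a core-lightning client library.

```rust
/// Hypothetical, simplified view of one channel in a `listpeers`-style response.
struct Channel {
    /// Identifiers of HTLCs still in flight on this channel.
    htlcs: Vec<u64>,
}

/// Hypothetical, simplified view of one peer and its channels.
struct Peer {
    channels: Vec<Channel>,
}

/// True when no channel of any peer has a pending HTLC,
/// i.e. it should be safe to mine more blocks.
fn all_htlcs_settled(peers: &[Peer]) -> bool {
    peers
        .iter()
        .flat_map(|peer| peer.channels.iter())
        .all(|channel| channel.htlcs.is_empty())
}

/// True when every node reports a block height at least as high as the chain tip,
/// i.e. the nodes have digested the current blocks before a payment is made.
fn nodes_caught_up(node_heights: &[u64], chain_height: u64) -> bool {
    node_heights.iter().all(|&height| height >= chain_height)
}
```

A test would poll these until they return true (with a timeout) rather than checking them once.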
https://github.com/fedimint/fedimint/actions/runs/3165116609/jobs/5153897116
Client loses money. https://github.com/fedimint/fedimint/actions/runs/3543886489/jobs/5950840882
@elsirion Any insight into how this happens or whether it's a real bug?
The `spend_ecash` function seems faulty: `let mut tx = TransactionBuilder::default();` should be inside the loop, in my opinion, so we don't issue the same e-cash token twice if the DB transaction fails for some reason (why it would, I don't know):
https://github.com/fedimint/fedimint/blob/13a0e24e902f63fae682c4e4af6e02069621086a/client/client-lib/src/lib.rs#L460-L499
EDIT: on second thought, the tx should fail because of the input-side double spend in that case …
Hitting a non-deterministic issue with lightning_gateway_pays_internal_invoice
https://github.com/fedimint/fedimint/actions/runs/3808388883/jobs/6478925802
Looks like something is timing out
last 10 log lines:
> ---- lightning_gateway_pays_internal_invoice stdout ----
> thread 'lightning_gateway_pays_internal_invoice' panicked at 'called Result::unwrap() on an Err value: ClientError(MintApiError(Timeout))', integrationtests/tests/tests.rs:456:14
>
>
> failures:
> lightning_gateway_pays_internal_invoice
>
> test result: FAILED. 27 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 42.64s
Just saw ^^ in CI https://github.com/fedimint/fedimint/actions/runs/3870489521/jobs/6597452580
lightning_gateway_pays_internal_invoice just hung and caused timeout https://github.com/fedimint/fedimint/actions/runs/3934580638/jobs/6729458243
thread 'lightning_gateway_claims_refund_for_internal_invoice' panicked at 'called Result::unwrap() on an Err value: NoGateways'
https://github.com/fedimint/fedimint/actions/runs/3941469567/jobs/6743919007
this is very common, happened to me twice today.
https://github.com/fedimint/fedimint/actions/runs/3953265674/attempts/1
https://github.com/fedimint/fedimint/actions/runs/3998730908/jobs/6861744016
https://github.com/fedimint/fedimint/actions/runs/4087609870/jobs/7048326190#step:5:7789
https://github.com/fedimint/fedimint/actions/runs/4165714395/jobs/7209097454
https://github.com/fedimint/fedimint/actions/runs/4304534238/jobs/7505698626