Intermittent CI failure tracking issue

Open justinmoon opened this issue 3 years ago • 17 comments

We've had a handful of CI failures that seem to disappear when you re-run them. Making an issue to keep track of them.

  • lightning_gateway_pays_invoice lightning.amount_sent() assertion https://github.com/fedimint/minimint/runs/7213579990?check_suite_focus=true#step:6:4320

justinmoon avatar Jul 06 '22 13:07 justinmoon

Basically this occurs when the listfunds command doesn't return the updated balances right away. It started when we added lightningd --dev-fast-gossip.

I added a sleep and FIXME to that line of code. I believe this will make the CI reliable, but hopefully we can create a better long-term fix.

I was debating removing this assertion; maybe there is a better way of verifying that actual LN funds have moved.

jkitman avatar Jul 06 '22 14:07 jkitman

Maybe just wait/poll till it has the expected value and timeout otherwise?
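
A minimal sketch of the wait/poll idea, using only the Rust standard library. The name `wait_for_value` and the closure-based `probe` are hypothetical and not taken from the fedimint test suite; in the real test, `probe` would be whatever reads the balance (for example via `listfunds`).

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `probe` repeatedly until it returns `expected` or `timeout` elapses.
/// Returns `Ok(())` on a match, or `Err(last_seen)` with the last observed
/// value if the deadline passes first. (Hypothetical helper, for illustration.)
fn wait_for_value<T, F>(
    mut probe: F,
    expected: &T,
    timeout: Duration,
    interval: Duration,
) -> Result<(), T>
where
    T: PartialEq,
    F: FnMut() -> T,
{
    let deadline = Instant::now() + timeout;
    loop {
        let seen = probe();
        if seen == *expected {
            return Ok(());
        }
        if Instant::now() >= deadline {
            // Give the caller the stale value so the failure message is useful.
            return Err(seen);
        }
        sleep(interval);
    }
}
```

Compared with a fixed sleep, this returns as soon as the value shows up and fails with the last observed value when it never does, which should make a flaky run easier to diagnose.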

elsirion avatar Jul 07 '22 07:07 elsirion

Here's another one https://github.com/fedimint/minimint/runs/7290083028?check_suite_focus=true#step:7:2246

justinmoon avatar Jul 11 '22 20:07 justinmoon

> Here's another one https://github.com/fedimint/minimint/runs/7290083028?check_suite_focus=true#step:7:2246

Wanted to post the same here. I get this a lot on my local setup, especially on this branch.

NicolaLS avatar Jul 12 '22 08:07 NicolaLS

https://github.com/fedimint/minimint/runs/7360992558?check_suite_focus=true

elsirion avatar Jul 15 '22 16:07 elsirion

https://github.com/fedimint/fedimint/runs/7881383484?check_suite_focus=true

NicolaLS avatar Aug 17 '22 15:08 NicolaLS

https://github.com/fedimint/fedimint/runs/7881960910?check_suite_focus=true

NicolaLS avatar Aug 17 '22 15:08 NicolaLS

> https://github.com/fedimint/fedimint/runs/7881960910?check_suite_focus=true

Port 8080 already bound, again https://github.com/fedimint/fedimint/runs/7881960910?check_suite_focus=true#step:5:6101

justinmoon avatar Aug 18 '22 14:08 justinmoon

https://github.com/fedimint/fedimint/actions/runs/3049065296/jobs/4914765743

justinmoon avatar Sep 14 '22 00:09 justinmoon

@justinmoon Did using port picker resolve these issues?

jkitman avatar Sep 14 '22 12:09 jkitman

I don't know because it only happens very infrequently. Do you want to just add it? https://github.com/fedimint/fedimint/pull/427

justinmoon avatar Sep 14 '22 16:09 justinmoon

@jkitman could this maybe be the same issue I had (which got fixed by running the tests with --release)? I also got the port already in use error (for some reason?) when that happened.

NicolaLS avatar Sep 16 '22 01:09 NicolaLS

@NicolaLS I don't think so because we removed the timeouts which was the issue you were hitting. Still, shouldn't hurt to add port-picker to the integration tests as well...
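
For context, a rough sketch of how a test can avoid a hard-coded port such as 8080, using only the standard library. This is not necessarily what the linked PR does; the function names `pick_free_port` and `bind_ephemeral` are made up for this example.

```rust
use std::net::TcpListener;

/// Asks the OS for a currently unused TCP port by binding to port 0.
/// The listener is dropped on return, so another process could grab the
/// port before the caller binds it again (a small race window).
fn pick_free_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}

/// Binds an OS-assigned port and hands back the live listener together
/// with the port number, which avoids the race described above.
fn bind_ephemeral() -> std::io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}
```

Keeping the listener alive, as in `bind_ephemeral`, is the safer variant when the test itself owns the server socket. Picking a port number and passing it to a child process leaves a short window in which the port can be taken.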

jkitman avatar Sep 16 '22 12:09 jkitman

https://github.com/fedimint/fedimint/actions/runs/3112194306/jobs/5045353221#step:11:6632

elsirion avatar Sep 23 '22 19:09 elsirion

> https://github.com/fedimint/fedimint/actions/runs/3112194306/jobs/5045353221#step:11:6632

Exact same error here https://github.com/fedimint/fedimint/actions/runs/3123610337/jobs/5066377856#step:11:12449

In this case re-running CI fixed it. I'm inclined to believe this is just core-lightning being flaky.

justinmoon avatar Sep 25 '22 23:09 justinmoon

Asked about this in core-lightning discord https://discord.com/channels/899980449231814676/899989729183940629/1023743433543782470

Rusty's response:

Yes, you are mining too fast, and we're freaking out. Generally this means your tests need to make sure everything is fully settled (listpeers channels has an empty htlcs array) before mining more. Or, make sure nodes have digested current blocks before making payment (see getinfo blockheight). Finally, note the dev-bitcoin-poll option for developer builds, which can reduce the 60 second polling interval
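
The two checks Rusty describes could be expressed roughly as below. The `Peer` and `Channel` structs are hypothetical, heavily simplified stand-ins for parsed `listpeers` output, and the block heights are assumed to come from `getinfo`; none of these names are from the fedimint code base.

```rust
/// Simplified stand-in for one channel in `listpeers` output (assumed shape).
struct Channel {
    htlcs: Vec<String>,
}

/// Simplified stand-in for one peer in `listpeers` output (assumed shape).
struct Peer {
    channels: Vec<Channel>,
}

/// Check before mining more blocks: every channel's `htlcs` array is empty,
/// meaning no payment is still in flight.
fn safe_to_mine(peers: &[Peer]) -> bool {
    peers
        .iter()
        .flat_map(|peer| peer.channels.iter())
        .all(|channel| channel.htlcs.is_empty())
}

/// Check before making a payment: every node's reported `blockheight`
/// has reached the current chain tip.
fn safe_to_pay(node_heights: &[u64], chain_height: u64) -> bool {
    node_heights.iter().all(|&height| height >= chain_height)
}
```

A test fixture would poll these conditions until they hold, or time out, rather than mining or paying immediately.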

justinmoon avatar Sep 25 '22 23:09 justinmoon

https://github.com/fedimint/fedimint/actions/runs/3165116609/jobs/5153897116

justinmoon avatar Oct 01 '22 17:10 justinmoon

Client loses money. https://github.com/fedimint/fedimint/actions/runs/3543886489/jobs/5950840882

elsirion avatar Nov 24 '22 22:11 elsirion

> Client loses money. https://github.com/fedimint/fedimint/actions/runs/3543886489/jobs/5950840882

@elsirion Any insight into how this happens or whether it's a real bug?

jkitman avatar Nov 26 '22 16:11 jkitman

> > Client loses money. https://github.com/fedimint/fedimint/actions/runs/3543886489/jobs/5950840882
>
> @elsirion Any insight into how this happens or whether it's a real bug?

The spend_ecash function seems faulty, let mut tx = TransactionBuilder::default(); should be inside the loop imo so we don't issue the same e-cash token twice if the DB tx fails for some reason (why it would, idk):

https://github.com/fedimint/fedimint/blob/13a0e24e902f63fae682c4e4af6e02069621086a/client/client-lib/src/lib.rs#L460-L499

EDIT: on second thought, the tx should fail because of the input side double spend in that case …
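
To illustrate the general concern only (state carried over between retry attempts), here is a toy sketch. `ToyTxBuilder` and both functions are invented for this example and do not reproduce the real `TransactionBuilder` or `spend_ecash`. As the EDIT notes, the real transaction may be rejected anyway in that case.

```rust
/// Toy builder that only collects input note amounts (hypothetical type).
#[derive(Default)]
struct ToyTxBuilder {
    inputs: Vec<u64>,
}

impl ToyTxBuilder {
    fn add_input(&mut self, note: u64) {
        self.inputs.push(note);
    }
}

/// Builder created once, outside the retry loop: every attempt appends
/// the same notes again, so stale inputs accumulate.
fn build_with_shared_builder(notes: &[u64], attempts: usize) -> Vec<u64> {
    let mut tx = ToyTxBuilder::default();
    for _ in 0..attempts {
        for &note in notes {
            tx.add_input(note);
        }
        // Pretend the DB transaction failed here and the loop retries.
    }
    tx.inputs
}

/// Builder created inside the loop: each attempt starts from a clean state.
fn build_with_fresh_builder(notes: &[u64], attempts: usize) -> Vec<u64> {
    let mut last = Vec::new();
    for _ in 0..attempts {
        let mut tx = ToyTxBuilder::default();
        for &note in notes {
            tx.add_input(note);
        }
        last = tx.inputs;
    }
    last
}
```

With two attempts, the shared builder ends up holding each note twice, while the fresh builder holds each note once.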

elsirion avatar Nov 26 '22 23:11 elsirion

Hitting a non-deterministic issue with lightning_gateway_pays_internal_invoice

https://github.com/fedimint/fedimint/actions/runs/3808388883/jobs/6478925802

Looks like something is timing out

last 10 log lines:
> ---- lightning_gateway_pays_internal_invoice stdout ----
> thread 'lightning_gateway_pays_internal_invoice' panicked at 'called Result::unwrap() on an Err value: ClientError(MintApiError(Timeout))', integrationtests/tests/tests.rs:456:14
>
>
> failures:
>     lightning_gateway_pays_internal_invoice
>
> test result: FAILED. 27 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 42.64s

m1sterc001guy avatar Dec 30 '22 19:12 m1sterc001guy

Just saw ^^ in CI https://github.com/fedimint/fedimint/actions/runs/3870489521/jobs/6597452580

justinmoon avatar Jan 09 '23 12:01 justinmoon

lightning_gateway_pays_internal_invoice just hung and caused timeout https://github.com/fedimint/fedimint/actions/runs/3934580638/jobs/6729458243

justinmoon avatar Jan 17 '23 02:01 justinmoon

thread 'lightning_gateway_claims_refund_for_internal_invoice' panicked at 'called Result::unwrap() on an Err value: NoGateways' https://github.com/fedimint/fedimint/actions/runs/3941469567/jobs/6743919007

maan2003 avatar Jan 17 '23 17:01 maan2003

> lightning_gateway_pays_internal_invoice just hung and caused timeout https://github.com/fedimint/fedimint/actions/runs/3934580638/jobs/6729458243

this is very common, happened to me twice today.

maan2003 avatar Jan 17 '23 18:01 maan2003

https://github.com/fedimint/fedimint/actions/runs/3953265674/attempts/1

elsirion avatar Jan 18 '23 22:01 elsirion

https://github.com/fedimint/fedimint/actions/runs/3998730908/jobs/6861744016

maan2003 avatar Jan 24 '23 18:01 maan2003

https://github.com/fedimint/fedimint/actions/runs/4087609870/jobs/7048326190#step:5:7789

elsirion avatar Feb 03 '23 20:02 elsirion

https://github.com/fedimint/fedimint/actions/runs/4165714395/jobs/7209097454

justinmoon avatar Feb 13 '23 17:02 justinmoon

https://github.com/fedimint/fedimint/actions/runs/4304534238/jobs/7505698626

maan2003 avatar Mar 01 '23 14:03 maan2003