
[bug]: Failure to run `TestTaprootAssetsDaemon/multi_signature_on_all_levels`

Open Nsandomeno opened this issue 1 year ago • 12 comments

Background

I consistently fail when debugging `TestTaprootAssetsDaemon/multi_signature_on_all_levels` on its own, but see broad passes when debugging the whole `TestTaprootAssetsDaemon` suite with the same debugging configuration, and also with `make itest`/`make itest-only`. Despite those passes, the failure I describe below does occur in other tests too, at least `TestTaprootAssetsDaemon/mint_fund_seal_assets`.

The logs from all scenarios point to something wrong with the HTLC interceptor. I made sure the `routerrpc` build tag is present in the case where I'm using VS Code's debugger. Another suspect is a context deadline being exceeded, but I'm not sure why that would be happening.

The error logs from running with the debug configuration are in the attached file TaprootAssetsDaemon_multi_signature_on_all_levels_debug_logs.txt; they align with the snippets below from the other itest runs.

The error logs when running with make itest or make build-itest look like this:

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x3fb36fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x2a54aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 08:55:37.934 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:37.934 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 08:55:37.934 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:37.934 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.300 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 08:55:40.300 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 08:55:40.300 [INF] RPCS: Stopping RPC Server
2024-06-19 08:55:40.300 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x3fb36fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x2a54aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 08:55:40.379 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.379 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 08:55:40.379 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 08:55:40.379 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (287.09s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets (52.10s)
    --- PASS: TestTaprootAssetsDaemon/mint_batch_resume (14.96s)
    --- PASS: TestTaprootAssetsDaemon/asset_meta_validation (14.85s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/opaque_asset (0.17s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/json_asset (0.10s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/invalid_json (0.00s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/custom_meta_type (0.06s)
    --- PASS: TestTaprootAssetsDaemon/asset_name_collision_raises_mint_error (21.72s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets_with_tap_sibling (20.93s)
    --- FAIL: TestTaprootAssetsDaemon/mint_fund_seal_assets (50.83s)
FAIL
FAIL    github.com/lightninglabs/taproot-assets/itest   288.532s
FAIL
make: *** [itest-only] Error 1

The same from make itest-only:

2024-06-19 09:04:22.902 [INF] LNDC: Using network regtest
    mint_fund_seal_test.go:511: 
                Error Trace:    /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/mint_fund_seal_test.go:511
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/mint_fund_seal_test.go:89
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/test_harness.go:159
                                                        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/integration_test.go:94
                Error:          Received unexpected error:
                                unable to derive key ring: rpc error: code = DeadlineExceeded desc = context deadline exceeded
                Test:           TestTaprootAssetsDaemon/mint_fund_seal_assets
2024-06-19 09:04:52.904 [INF] SRVR: Stopping Main Server
2024-06-19 09:04:52.904 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:52.904 [INF] GRDN: Stopping ChainPlanter
2024-06-19 09:04:52.904 [INF] GRDN: Stopping re-org watcher
2024-06-19 09:04:52.904 [INF] RFQS: Stopping RFQ system
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: order handler
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: stream handler
2024-06-19 09:04:52.904 [INF] RFQS: Stopping subsystem: quote negotiator
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux leaf creator
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux leaf signer
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux traffic shaper
2024-06-19 09:04:52.904 [INF] TCHN: Stopping aux invoice manager
2024-06-19 09:04:52.904 [ERR] TCHN: Error setting up invoice acceptor: context canceled
2024-06-19 09:04:54.362 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:04:54.362 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:04:54.362 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:54.362 [INF] SRVR: Shutdown complete

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0xa6c16fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x9162aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:04:54.425 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:54.425 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:04:54.425 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:54.426 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.502 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:04:56.502 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:04:56.502 [INF] RPCS: Stopping RPC Server
2024-06-19 09:04:56.503 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0xa6c16fa)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x9162aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:04:56.602 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Error received from stream handler wire message channel: <nil>
2024-06-19 09:04:56.602 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:04:56.602 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (311.43s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets (43.24s)
    --- PASS: TestTaprootAssetsDaemon/mint_batch_resume (17.79s)
    --- PASS: TestTaprootAssetsDaemon/asset_meta_validation (13.49s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/opaque_asset (0.44s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/json_asset (0.26s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/invalid_json (0.00s)
        --- PASS: TestTaprootAssetsDaemon/asset_meta_validation/custom_meta_type (0.23s)
    --- PASS: TestTaprootAssetsDaemon/asset_name_collision_raises_mint_error (22.49s)
    --- PASS: TestTaprootAssetsDaemon/mint_assets_with_tap_sibling (28.21s)
    --- FAIL: TestTaprootAssetsDaemon/mint_fund_seal_assets (49.01s)
FAIL
FAIL    github.com/lightninglabs/taproot-assets/itest   312.936s
FAIL
make: *** [itest-only] Error 1

And lastly, the error logs when running with make itest-only and commenting out all other test cases in test_list_on_test.go:

=== NAME  TestTaprootAssetsDaemon
    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x5e6685a)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x4915aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:17:39.695 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:39.695 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:17:39.695 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:39.695 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.721 [ERR] GRDN: Aborting main custodian event loop: receive failed: EOF
2024-06-19 09:17:41.721 [ERR] SRVR: Shutting down because error in main method: received critical error from subsystem: receive failed: EOF
2024-06-19 09:17:41.721 [INF] RPCS: Stopping RPC Server
2024-06-19 09:17:41.722 [INF] SRVR: Shutdown complete

    test_harness.go:139: Error outside of test: *errors.errorString Error running server: received critical error from subsystem: receive failed: EOF
        /Users/nicholassandomeno/lightning/dev/taproot-assets/itest/tapd_harness.go:322 (0x5e6685a)
                (*tapdHarness).start.func2: hs.ht.Fatalf("Error running server: %v", err)
        /Users/nicholassandomeno/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1695 (0x4915aa1)
                goexit: BYTE    $0x90   // NOP
2024-06-19 09:17:41.897 [ERR] TCHN: Error setting up invoice acceptor: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.897 [WRN] RFQS: Error received from stream handler wire message channel: receive failed: rpc error: code = Unavailable desc = error reading from server: EOF
2024-06-19 09:17:41.897 [WRN] RFQS: Raw peer messages channel closed unexpectedly
2024-06-19 09:17:41.897 [ERR] RFQS: Error setting up HTLC interception: unable to setup incoming HTLC interceptor: rpc error: code = Unavailable desc = error reading from server: EOF
--- FAIL: TestTaprootAssetsDaemon (181.91s)
    --- FAIL: TestTaprootAssetsDaemon/multi_signature_on_all_levels (29.68s)

The debugging config is specific to VS Code; however, it is intended to capture the same flags that the Makefile and make/testing_flags.mk use:

{
    "configurations": [
    {
        "name": "Launch test function",
        "type": "go",
        "request": "launch",
        "mode": "test",
        "env": {
            "CGO_ENABLED": "0",
        },
        "buildFlags": [
            "-v",
            "-tags=dev,monitoring,autopilotrpc,chainrpc,invoicesrpc,peersrpc,routerrpc,signrpc,verrpc,walletrpc,watchtowerrpc,wtclientrpc,tapdevrpc,mintrpc,assetwalletrpc,taprpc,universerpc,integration,itest,btcd"
        ],
        "program": "${workspaceFolder}/itest",
        "args": [
            "-btcdexec=./btcd-itest",
            "-logdir=regtest",
            "-test.run=TestTaprootAssetsDaemon/multi_signature_on_all_levels",
            "-test.timeout=20m",
            "-logoutput",
            "-logdir=regtest",
            "-optional=false"
        ]
    }
    ]
}

Environment

The results above are from the latest commit of main. Please see the debug config above as well.

nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % git pull
Already up to date.
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % date
Wed Jun 19 08:41:23 EDT 2024
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        itest/__debug_bin3991762036

nothing added to commit but untracked files present (use "git add" to track)
nicholassandomeno@Nicholass-MacBook-Pro taproot-assets % 

Steps to reproduce

  • make install
  • make build-itest
  • Setup the config in VS Code or proceed with the make steps only
  • Start the VS Code debugger
  • Alternatively, run make itest and examine output for TestTaprootAssetsDaemon/
  • Alternatively, run make itest-only and examine output for TestTaprootAssetsDaemon/

Expected behavior

Passing test.

Actual behavior

Failing test.

Nsandomeno avatar Jun 19 '24 13:06 Nsandomeno

I do see a similar error in the logs of the "run itests" CI job on PR #933 here

Nsandomeno avatar Jun 19 '24 14:06 Nsandomeno

The actual error you're getting seems to be: unable to derive key ring: rpc error: code = DeadlineExceeded desc = context deadline exceeded, all the other (EOF) related ones are follow-up errors. So it sounds like your lnd backend node isn't responding in time. Can you upload all the logs of a failed run please? They should be in itest/regtest.

guggero avatar Jun 20 '24 09:06 guggero

This includes 0--Alice, 1--Bob, 2--uni-server-lnd, output_btcd_chainbackend, and output_btcd_miner - is that everything you would be looking for? regtest.zip

After quickly sifting through them now, the only thing that jumps out immediately is the many "failed to connect" messages in both of the btcd-related files.

Nsandomeno avatar Jun 20 '24 10:06 Nsandomeno

Thanks! Yeah, there's nothing that jumps out. Can you please upload the full output of the integration test itself when you run only the multi signature test? You can trigger that alone by just running make itest icase=multi_signature_on_all_levels btw, no need to comment out things.

guggero avatar Jun 20 '24 11:06 guggero

Nice to finally get that single run down here - thanks! Attached is that output dumped in a file.

If something is taking too long to respond, it seems to be the second taprpc.NewAddrRequest, the one that withdraws back to Alice. This feels odd since the same call is made earlier in the itest, but it looks like this call may have to collect additional data, since auth material is not passed via the request.

I'm happy to help, but would need a push in the right direction. I can also provide additional details about my machine if you think it could be OS-specific.

The original file wasn't saved; here is the dump: output-from-make-icase.txt

Nsandomeno avatar Jun 20 '24 11:06 Nsandomeno

Hmm, are you running things on a very old or otherwise slow machine? It looks like it's just timing out because things take too long. It looks like just starting up the node almost takes 10 seconds...

guggero avatar Jun 20 '24 13:06 guggero

2020 MacBook Pro, 16 GB memory, I don't typically experience issues like that... and, currently have much less running than is typical. I also have plenty of disk space.

If it is simply a matter of a deadline, do you see any way for me to configure it to be longer?

Nsandomeno avatar Jun 20 '24 13:06 Nsandomeno

You can bump this value, which is causing the new address creation to time out: https://github.com/lightninglabs/taproot-assets/blob/c972893214346bf4167e85e50ffbf7db4c8498d6/itest/test_harness.go#L66

But I think something's quite wrong here. Looking at this:

2024-06-20 07:29:59.409 [INF] TADB: 1/u macaroons (184.185699ms)
2024-06-20 07:29:59.696 [INF] TADB: 2/u assets (470.972884ms)
2024-06-20 07:29:59.997 [INF] TADB: 3/u addrs (771.805208ms)
2024-06-20 07:30:00.278 [INF] TADB: 4/u mssmt (1.053011165s)
2024-06-20 07:30:00.660 [INF] TADB: 5/u transfers (1.434461231s)
2024-06-20 07:30:01.055 [INF] TADB: 6/u addr_event (1.830356951s)
2024-06-20 07:30:01.292 [INF] TADB: 7/u universe (2.06696322s)
2024-06-20 07:30:01.448 [INF] TADB: 8/u universe_events_timestamp (2.222675209s)
2024-06-20 07:30:01.721 [INF] TADB: 9/u universe_configs (2.496170662s)
2024-06-20 07:30:01.962 [INF] TADB: 10/u universe_stats (2.736820453s)
2024-06-20 07:30:02.226 [INF] TADB: 11/u transfers (3.001343023s)
2024-06-20 07:30:02.505 [INF] TADB: 12/u anchor_txid (3.095945268s)
2024-06-20 07:30:02.770 [INF] TADB: 13/u universe_fed_proof_sync_log (3.073679374s)
2024-06-20 07:30:03.224 [INF] TADB: 14/u multiverse_tree (3.226987341s)
2024-06-20 07:30:03.466 [INF] TADB: 15/u asset_witnesses (3.187647965s)
2024-06-20 07:30:03.942 [INF] TADB: 16/u tapscript_trees (3.281604757s)
2024-06-20 07:30:04.223 [INF] TADB: 17/u seedling_script_group_keys (3.167564985s)
2024-06-20 07:30:04.478 [INF] TADB: 18/u script_key_declared_known (3.185782426s)
2024-06-20 07:30:04.829 [INF] TADB: 19/u managed_utxo_commitment_version (3.38130933s)

These migrations on an empty database taking multiple seconds each is very abnormal. They each take between 2 and 30 milliseconds on my machine. So this could be the MacOS fsync issue (see https://bonsaidb.io/blog/acid-on-apple/ for more details, the tl;dr is that MacOS lies about syncing the file system and data isn't actually safe until you do a full fsync which takes muuuch longer).

Just out of curiosity, if you set this to false (and the above synchronous to normal), do you see the problem go away?

guggero avatar Jun 20 '24 14:06 guggero

It did not, no, unfortunately. It looks to be the same output, which is attached. And here is the pragmaOptions content for your confirmation:

	pragmaOptions := []struct {
		name  string
		value string
	}{
		{
			name:  "foreign_keys",
			value: "on",
		},
		{
			name:  "journal_mode",
			value: "WAL",
		},
		{
			name:  "busy_timeout",
			value: "5000",
		},
		{
			// With the WAL mode, this ensures that we also do an
			// extra WAL sync after each transaction. The normal
			// sync mode skips this and gives better performance,
			// but risks durability.
			name:  "synchronous",
			value: "normal",
		},
		{
			// This is used to ensure proper durability for users
			// running on Mac OS. It uses the correct fsync system
			// call to ensure items are fully flushed to disk.
			name:  "fullfsync",
			value: "false",
		},
	}

icase-output-sqlite-settings-change.txt

For our purposes we can move past this for now, unless fixing it would have a sprawling effect of changes elsewhere. If it is something you think is very important, I'm happy to look into this more intensely.

Nsandomeno avatar Jun 20 '24 17:06 Nsandomeno

Hmm, okay, so things are slightly faster but not really fast enough. To be honest, I believe this could be an issue with your hardware. We have other team members that are also running on MacOS without any issues, so it can't be a general problem. Do you have another machine you could try this on, to rule out something specific to your machine?

guggero avatar Jun 20 '24 17:06 guggero

I'm going to work with someone eventually to do that, but I appreciate your time looking in.

Nsandomeno avatar Jun 20 '24 17:06 Nsandomeno

One thing I'm still struggling a bit to understand is why there is so much success on the other tests.

In the event we are running just this one itest case, shouldn't the timeout behave the same as it does for the first few tests when running the entire suite? Or is there anything about this test specifically that strains this MacOS weakness?

Any thoughts here would be much appreciated @guggero !

Nsandomeno avatar Jun 21 '24 10:06 Nsandomeno

Have you been able to figure out what the issue here is? Can we close this?

guggero avatar Feb 12 '25 16:02 guggero

We can close it


Nsandomeno avatar Feb 12 '25 16:02 Nsandomeno