
linux/arm64 compatibility seems broken

Open WinterNis opened this issue 3 years ago • 6 comments

Hello 👋

I recently tried to run the akash binary (version v0.14.1) on the linux/arm64 architecture and hit a panic when running it. Below are the steps to reproduce and my humble investigation.

How to reproduce?

docker run -ti --rm --platform linux/arm64 ubuntu

# In container
apt-get update && apt-get install curl wget unzip
wget https://github.com/ovrclk/akash/releases/download/v0.14.1/akash_0.14.1_linux_arm64.zip && unzip akash_0.14.1_linux_arm64.zip
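# The config directory does not exist in a fresh container, so create it first
mkdir -p "$HOME/.akash/config"
curl -s "https://raw.githubusercontent.com/ovrclk/net/master/mainnet/genesis.json" > "$HOME/.akash/config/genesis.json"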

./akash_0.14.1_linux_arm64/akash start

Note: I ran the docker command from a 2020 M1 MacBook on Monterey. With Docker Desktop you can use multi-architecture support with --platform, which uses qemu in the background to run an architecture other than your host's (linux/arm64 instead of darwin/arm64 here). I first hit the panic on an AWS Graviton instance, which is linux/arm64, and then reproduced it locally thanks to docker multi-arch support.

Output:

7:47AM INF starting ABCI with Tendermint
7:47AM INF Starting multiAppConn service impl=multiAppConn module=proxy
7:47AM INF Starting localClient service connection=query impl=localClient module=abci-client
7:47AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
7:47AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
7:47AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
7:47AM INF Starting EventBus service impl=EventBus module=events
7:47AM INF Starting PubSub service impl=PubSub module=pubsub
unexpected fault address 0x5f6c61697486b4
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x5f6c61697486b4 pc=0xb70550]

goroutine 82 [running]:
runtime.throw(0x24de26a, 0x5)
        runtime/panic.go:1117 +0x54 fp=0x4000f49590 sp=0x4000f49560 pc=0x434c14
runtime.sigpanic()
        runtime/signal_unix.go:741 +0x230 fp=0x4000f495d0 sp=0x4000f49590 pc=0x44c7c0
github.com/golang/snappy.encodeBlock(0x4007af8004, 0xcba544, 0xcba544, 0x40048e2000, 0x10000, 0xae8dec, 0x4000078701)
        github.com/golang/[email protected]/encode_arm64.s:666 +0x360 fp=0x4000f51670 sp=0x4000f495e0 pc=0xb70550
github.com/golang/snappy.Encode(0x4007af8000, 0xcba548, 0xcba548, 0x40048f2000, 0xad8d8c, 0xad8dec, 0x4000667ce0, 0x4000667ce0, 0x4000078788)
        github.com/golang/[email protected]/encode.go:39 +0x17c fp=0x4000f516c0 sp=0x4000f51670 pc=0xb6fa8c
github.com/syndtr/goleveldb/leveldb/table.(*Writer).writeBlock(0x4000f1a6c0, 0x4000f1a718, 0x2, 0x0, 0x12, 0x18, 0x4000667ce0)
        github.com/syndtr/[email protected]/leveldb/table/writer.go:171 +0xb0 fp=0x4000f51740 sp=0x4000f516c0 pc=0xb78270
github.com/syndtr/goleveldb/leveldb/table.(*Writer).finishBlock(0x4000f1a6c0, 0x4003df8000, 0x12)
        github.com/syndtr/[email protected]/leveldb/table/writer.go:222 +0x4c fp=0x4000f51790 sp=0x4000f51740 pc=0xb7870c
github.com/syndtr/goleveldb/leveldb/table.(*Writer).Append(0x4000f1a6c0, 0x4003df8000, 0x12, 0xaea000, 0x4003df8012, 0xae8d6c, 0xae9fee, 0x18, 0x4000f1a6c0)
        github.com/syndtr/[email protected]/leveldb/table/writer.go:255 +0x1e4 fp=0x4000f51800 sp=0x4000f51790 pc=0xb78974
github.com/syndtr/goleveldb/leveldb.(*tWriter).append(0x40001a33e0, 0x4003df8000, 0x12, 0xaea000, 0x4003df8012, 0xae8d6c, 0xae9fee, 0x4000cfc700, 0x400006e800)
        github.com/syndtr/[email protected]/leveldb/table.go:559 +0xbc fp=0x4000f51870 sp=0x4000f51800 pc=0xb9a4dc
github.com/syndtr/goleveldb/leveldb.(*tOps).createFrom(0x4001161290, 0x2ae2fb8, 0x4000cfc700, 0x0, 0x0, 0x0, 0x0)
        github.com/syndtr/[email protected]/leveldb/table.go:397 +0x11c fp=0x4000f51910 sp=0x4000f51870 pc=0xb9985c
github.com/syndtr/goleveldb/leveldb.(*session).flushMemdb(0x4000c96ff0, 0x4000d2edc0, 0x4000168a80, 0x0, 0x0, 0x0, 0x0)
        github.com/syndtr/[email protected]/leveldb/session_compaction.go:35 +0xa4 fp=0x4000f51a70 sp=0x4000f51910 pc=0xb91ad4
github.com/syndtr/goleveldb/leveldb.(*DB).memCompaction.func1(0x4000124358, 0x20d5b80, 0x40001ba101)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:305 +0x64 fp=0x4000f51af0 sp=0x4000f51a70 pc=0xb9e9a4
github.com/syndtr/goleveldb/leveldb.(*compactionTransactFunc).run(0x4000d68e80, 0x4000124358, 0x0, 0xbff)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:242 +0x34 fp=0x4000f51b20 sp=0x4000f51af0 pc=0xb82724
github.com/syndtr/goleveldb/leveldb.(*DB).compactionTransact(0x4000c34540, 0x24e7ca0, 0xb, 0x2a80ed8, 0x4000d68e80)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:186 +0x1d0 fp=0x4000f51d50 sp=0x4000f51b20 pc=0xb82040
github.com/syndtr/goleveldb/leveldb.(*DB).compactionTransactFunc(...)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:253
github.com/syndtr/goleveldb/leveldb.(*DB).memCompaction(0x4000c34540)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:303 +0x324 fp=0x4000f51f10 sp=0x4000f51d50 pc=0xb82ce4
github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(0x4000c34540)
        github.com/syndtr/[email protected]/leveldb/db_compaction.go:777 +0x64 fp=0x4000f51fd0 sp=0x4000f51f10 pc=0xb85e14
runtime.goexit()
        runtime/asm_arm64.s:1130 +0x4 fp=0x4000f51fd0 sp=0x4000f51fd0 pc=0x46c0c4
created by github.com/syndtr/goleveldb/leveldb.openDB
        github.com/syndtr/[email protected]/leveldb/db.go:156 +0x464

goroutine 1 [runnable]:
syscall.Syscall6(0x4f, 0xffffffffffffff9c, 0x400056a810, 0x4000f0a038, 0x100, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x2)
        syscall/asm_linux_arm64.s:35 +0x10
syscall.Fstatat(0xffffffffffffff9c, 0x4000f72540, 0x25, 0x4000f0a038, 0x100, 0x0, 0x0)
        syscall/zsyscall_linux_arm64.go:1093 +0xa8
syscall.Lstat(...)
        syscall/syscall_linux_arm64.go:58
os.lstatNolog.func1(...)
        os/stat_unix.go:45
os.ignoringEINTR(...)
        os/file_posix.go:245
os.lstatNolog(0x4000f72540, 0x25, 0x0, 0x0, 0x0, 0x1a4)
        os/stat_unix.go:44 +0x70
os.Lstat(0x4000f72540, 0x25, 0x4000699dd8, 0xb59cbc, 0x40001b2e40, 0x0)
        os/stat.go:22 +0x44
os.rename(0x4000f72630, 0x27, 0x4000f72540, 0x25, 0x20, 0x1a4)
        os/file_unix.go:22 +0x30
os.Rename(...)
        os/file.go:348
github.com/syndtr/goleveldb/leveldb/storage.rename(...)
        github.com/syndtr/[email protected]/leveldb/storage/file_storage_unix.go:63
github.com/syndtr/goleveldb/leveldb/storage.(*fileStorage).setMeta(0x40001b6380, 0x1, 0x0, 0xb97550, 0x4000699fc8)
        github.com/syndtr/[email protected]/leveldb/storage/file_storage.go:267 +0x33c
github.com/syndtr/goleveldb/leveldb/storage.(*fileStorage).SetMeta(0x40001b6380, 0x1, 0x0, 0x0, 0x0)
        github.com/syndtr/[email protected]/leveldb/storage/file_storage.go:292 +0xf4
github.com/syndtr/goleveldb/leveldb.(*session).newManifest(0x4000c973b0, 0x40001bc140, 0x0, 0x0, 0x0)
        github.com/syndtr/[email protected]/leveldb/session_util.go:456 +0x4d0
github.com/syndtr/goleveldb/leveldb.(*session).create(...)
        github.com/syndtr/[email protected]/leveldb/session.go:125
github.com/syndtr/goleveldb/leveldb.Open(0x2adfb20, 0x40001b6380, 0x0, 0x0, 0x2a5a4a8, 0x4000118030)
        github.com/syndtr/[email protected]/leveldb/db.go:194 +0x1d8
github.com/syndtr/goleveldb/leveldb.OpenFile(0x40001993e0, 0x1d, 0x0, 0x40001993e0, 0x1d, 0x400114d420)
        github.com/syndtr/[email protected]/leveldb/db.go:225 +0x7c
github.com/tendermint/tm-db.NewGoLevelDBWithOpts(0x24e359c, 0x8, 0x400114e1c8, 0x11, 0x0, 0x10, 0x4000092b60, 0xd)
        github.com/tendermint/[email protected]/goleveldb.go:32 +0xa4
github.com/tendermint/tm-db.NewGoLevelDB(...)
        github.com/tendermint/[email protected]/goleveldb.go:27
github.com/tendermint/tm-db.init.0.func1(0x24e359c, 0x8, 0x400114e1c8, 0x11, 0x4000092be8, 0x400114e101, 0x11, 0x0)
        github.com/tendermint/[email protected]/goleveldb.go:15 +0x44
github.com/tendermint/tm-db.NewDB(0x24e359c, 0x8, 0x4000cc25d0, 0x9, 0x400114e1c8, 0x11, 0x4000000180, 0xa219fc, 0x400069a458, 0xa1f9ec)
        github.com/tendermint/[email protected]/db.go:64 +0x2b4
github.com/tendermint/tendermint/node.DefaultDBProvider(0x4000eed0c8, 0x4000eed0c8, 0x2, 0x2, 0x19)
        github.com/tendermint/[email protected]/node/node.go:69 +0xb0
github.com/tendermint/tendermint/node.createAndStartIndexerService(0x400068b180, 0x27e2fa8, 0x4000f0c230, 0x2abe390, 0x4001131080, 0x4000674b60, 0x0, 0x0, 0xa, 0x1, ...)
        github.com/tendermint/[email protected]/node/node.go:259 +0x30c
github.com/tendermint/tendermint/node.NewNode(0x400068b180, 0x2aaaa30, 0x4000d2f4a0, 0x4000540720, 0x2a5be08, 0x4000eecb10, 0x40005407b0, 0x27e2fa8, 0x40005408c0, 0x2abe390, ...)
        github.com/tendermint/[email protected]/node/node.go:669 +0x1d8
github.com/cosmos/cosmos-sdk/server.startInProcess(0x4000e4b720, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2accaf0, 0x4000edc350, ...)
        github.com/cosmos/[email protected]/server/start.go:244 +0x3f4
github.com/cosmos/cosmos-sdk/server.StartCmd.func2(0x4000d2bb80, 0x3da6ae0, 0x0, 0x0, 0x0, 0x0)
        github.com/cosmos/[email protected]/server/start.go:120 +0x144
github.com/spf13/cobra.(*Command).execute(0x4000d2bb80, 0x3da6ae0, 0x0, 0x0, 0x4000d2bb80, 0x3da6ae0)
        github.com/spf13/[email protected]/command.go:850 +0x320
github.com/spf13/cobra.(*Command).ExecuteC(0x40001e62c0, 0x24ddab8, 0x5, 0x4000e7f470)
        github.com/spf13/[email protected]/command.go:958 +0x268
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/[email protected]/command.go:895
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/[email protected]/command.go:888
github.com/ovrclk/akash/cmd/akash/cmd.Execute(0x40001e62c0, 0x2ad2348, 0x400000e420)
        github.com/ovrclk/akash/cmd/akash/cmd/root.go:117 +0x230
main.main()
        github.com/ovrclk/akash/cmd/akash/main.go:14 +0x24

[REDACTED]

In the exact same container on the amd64 architecture (--platform=amd64), the issue does not occur.

This is highly similar to #1206, which was reported against version v0.12.1.

Investigation

The issue seems to arise from the golang/snappy dependency. After digging a bit, I stumbled upon this assembly issue on snappy. The issue is known and should be fixed by using snappy > v0.0.3.

Snappy is actually used by the tendermint db package. Dependency graph: akash v0.14.1 -> github.com/tendermint/tm-db v0.6.4 -> github.com/syndtr/goleveldb v1.0.1-0.20200815110645-5c35d600f0ca -> github.com/golang/snappy v0.0.1

Note that github.com/tendermint/tendermint also uses github.com/tendermint/tm-db

Conclusion

I hope that the investigation is correct.

I will probably raise an issue on the github.com/tendermint/tm-db side to see if it’s possible to update the snappy dependency, since master is still using v0.0.1.

But I am also raising it here because the akash README.md states that the binary is compatible with linux/arm64, which is not true at the moment.

WinterNis • Jan 20 '22 09:01

we encountered this with gaia and the fix was upstream:

https://github.com/cosmos/gaia/issues/862

faddat • Jan 20 '22 14:01

I can't see it in your post, can you please tell us the hardware you are running on? ARM is a very large family.

hydrogen18 • Jan 21 '22 17:01

we encountered this with gaia and the fix was upstream:

cosmos/gaia#862

Thanks for the insight! I did not know about that. That makes it relevant to have it fixed on the tm-db side directly, though.

WinterNis • Jan 24 '22 07:01

I can't see it in your post, can you please tell us the hardware you are running on? ARM is a very large family.

My bad. I have added the following note to the issue description:

Note: I ran the docker command from a 2020 M1 MacBook on Monterey. With Docker Desktop you can use multi-architecture support with --platform, which uses qemu in the background to run an architecture other than your host's (linux/arm64 instead of darwin/arm64 here). I first hit the panic on an AWS Graviton instance, which is linux/arm64, and then reproduced it locally thanks to docker multi-arch support.

WinterNis • Jan 24 '22 07:01

I had the same issue on an AWS t4g.large, which is a linux/arm64 Graviton instance, using v0.14.0.

mofhusseini • Jan 26 '22 21:01

Appreciate all the feedback on this. We'll have to block off some time to perform our own tests on different ARM platforms in the future.

hydrogen18 • Jan 26 '22 22:01