Error with docker image

Open breezytm opened this issue 4 years ago • 13 comments

Error when deploying full node with docker image

1. Describe

The following error occurs:

Connecting to raw.githubusercontent.com (185.199.111.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11: 10 Illegal instruction (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb

Docker parameters:

docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

For Admin Use

  • [ ] Not duplicate issue
  • [ ] Appropriate labels applied
  • [ ] Appropriate contributors tagged
  • [ ] Contributor assigned/self-assigned

breezytm avatar Feb 15 '22 16:02 breezytm

Same here, I'm not able to run the Docker image.

suyog-bhat avatar Jul 22 '22 13:07 suyog-bhat

docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind:/root/.exchaind -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

Make sure you have config and data directories in your exchaind directory.
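A minimal sketch of preparing that host directory before starting the container. The function name `prepare_exchaind_home` is hypothetical, and the default path simply mirrors the `-v ~/.exchaind:/root/.exchaind` mount in the command above.

```shell
#!/bin/sh
# prepare_exchaind_home [DIR]  (hypothetical helper, not part of exchain)
# Creates the config and data subdirectories under DIR if they are missing.
# DIR defaults to ~/.exchaind, the host path mounted into the container above.
prepare_exchaind_home() {
    home_dir=${1:-"$HOME/.exchaind"}
    mkdir -p "$home_dir/config" "$home_dir/data"
}

# Example call:
#   prepare_exchaind_home "$HOME/.exchaind"
```

This only guarantees that the mounted directory has the expected layout; it does not populate genesis.json or any chain data.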

cwbhhjl avatar Sep 06 '22 11:09 cwbhhjl

I have tried initialising both directories and populating genesis.json and priv_validator_state.json, but it still crashes with Illegal instruction (core dumped).

neoromantique avatar Oct 17 '22 07:10 neoromantique

@neoromantique

Can you redo the deployment following this thread? If the deployment fails, please tell me which step went wrong and what the specific error is.

https://forum.okt.club/d/299-how-to-start-a-mainnet-node

cwbhhjl avatar Oct 17 '22 11:10 cwbhhjl

I cannot even execute exchaind init from within Docker. Building it for my host defeats the point of a Docker image in the first place (and I don't think it would help anyway).

neoromantique avatar Oct 18 '22 15:10 neoromantique

@neoromantique try this

  1. mkdir ~/okc
  2. cd ~/okc
  3. curl -O https://okg-pub-hk.oss-cn-hongkong.aliyuncs.com/cdn/oec/snapshot/mainnet-s0-20221018-14723313-rocksdb.tar.gz
  4. tar zxvf mainnet-s0-20221018-14723313-rocksdb.tar.gz
  5. docker run -d --name exchain-mainnet-fullnode -v ~/okc/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

cwbhhjl avatar Oct 19 '22 01:10 cwbhhjl

Same exact output.

root@hostname ~/okc # ls -la
total 31175916
drwxr-xr-x 3 root root          81 Oct 19 20:20 .
drwx------ 8 root root         269 Oct 19 16:02 ..
drwx------ 7 root root         161 Oct 17 20:33 data
-rw-r--r-- 1 root root 31924135195 Oct 19 17:19 mainnet-s0-20221018-14723313-rocksdb.tar.gz
root@hostname ~/okc # docker logs --tail 100 -f 25d
/root/start.sh: line 6:     7 Illegal instruction     (core dumped) exchaind init fullnode --chain-id exchain-66
Connecting to raw.githubusercontent.com (185.199.109.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11:    10 Illegal instruction     (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb
root@hostname ~/okc # 

neoromantique avatar Oct 19 '22 18:10 neoromantique

@neoromantique https://stackoverflow.com/questions/54698812/illegal-instruction-core-dumped-when-trying-to-execute-elf-file

It means the compiled binary contains an instruction (possibly more than one) that's not valid on the architecture where you're running it.

Based on this post and other related posts on Stack Overflow, I'm guessing it might be a hardware issue.

You can run your binary under gdb to find the specific instruction:

gdb ./precompiled
(gdb) run
(gdb) bt
(gdb) disassemble

Type run, and when it fails, run bt (backtrace) to see where it fails. Use disassemble to see the specific instruction that's causing the failure.

Can you try this or try running okc on another machine?
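The gdb session described above can also be run non-interactively. This is a rough sketch, assuming gdb is installed wherever the binary runs; the wrapper name `sigill_report` is hypothetical. It uses gdb's `-batch`, `-ex`, and `--args` options, and `x/i $pc` prints the instruction at the program counter where the program stopped.

```shell
#!/bin/sh
# sigill_report BINARY [ARGS...]  (hypothetical helper name)
# Runs the program under gdb without an interactive prompt, then prints
# the backtrace and the single instruction at the point where it stopped.
sigill_report() {
    gdb -batch \
        -ex run \
        -ex bt \
        -ex 'x/i $pc' \
        --args "$@"
}

# Example call:
#   sigill_report ./precompiled
```

If the program dies with SIGILL, the last line of the output should show the offending instruction, which can then be checked against the CPU's supported instruction sets.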

cwbhhjl avatar Oct 20 '22 03:10 cwbhhjl

I'm running it on an AMD Ryzen 9 5950X; it's fairly standard, modern hardware.

https://gist.github.com/neoromantique/ab52f80e31a4a4df70bd0b744f870275

neoromantique avatar Oct 20 '22 16:10 neoromantique

@neoromantique

Program received signal SIGILL, Illegal instruction.
0x0000000001dedb56 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> const*> (this=0x453b760 <rocksdb::(anonymous namespace)::sc_wrapper_type_info>, __f=0x7fffffffe460, __l=0x7fffffffe4f8, __bkt_count_hint=0, __h1=..., __h2=..., 
    __h=..., __eq=..., __exk=..., __a=...) at /usr/include/c++/10.3.1/bits/stl_iterator_base_funcs.h:138
   0x0000000001dedb49 <+121>:   movq   $0x0,0x10(%rdi)
   0x0000000001dedb51 <+129>:   vmovq  %rax,%xmm0
=> 0x0000000001dedb56 <+134>:   vpmaxuq %xmm1,%xmm0,%xmm0
   0x0000000001dedb5c <+140>:   vmovq  %xmm0,%rsi

We can see that the instruction causing the error is vpmaxuq.

https://www.officedaytime.com/simd512e/

https://en.wikipedia.org/wiki/AVX-512

It looks like vpmaxuq is an AVX-512 instruction (the 128-bit xmm form shown above needs AVX512F and AVX512VL), and the Ryzen 9 5950X doesn't support AVX-512.

https://www.quora.com/Does-Ryzen-support-AVX
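One way to confirm this on a Linux host is to look at the CPU feature flags the kernel reports. The sketch below assumes an x86 Linux machine where /proc/cpuinfo has a `flags` line; the function name `cpu_has_flag` is hypothetical, and `avx512f` / `avx512vl` are the flag names Linux uses for those extensions.

```shell
#!/bin/sh
# cpu_has_flag FLAG [CPUINFO_FILE]  (hypothetical helper name)
# Succeeds if FLAG appears as a whole word on a "flags" line.
# CPUINFO_FILE defaults to /proc/cpuinfo (x86 Linux assumed).
cpu_has_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

# Report the extensions relevant to the vpmaxuq instruction above.
for f in avx2 avx512f avx512vl; do
    if cpu_has_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not supported"
    fi
done
```

If `avx512f` is reported as not supported, a binary compiled with AVX-512 enabled can be expected to fail with an illegal instruction on that machine.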

The error comes from rocksdb, so I think it's worth trying to recompile rocksdb on your machine.

  1. cd ~
  2. git clone -b v1.6.3 https://github.com/okex/exchain.git
  3. cd exchain
  4. make rocksdb
  5. make mainnet
  6. exchaind init okc-mainnet-node --chain-id exchain-66 --home ~/.exchaind

If the make rocksdb step fails, please compile rocksdb version 6.27.3 following the official documentation: https://github.com/facebook/rocksdb

cwbhhjl avatar Oct 21 '22 01:10 cwbhhjl

@neoromantique Has your problem been resolved?

cwbhhjl avatar Oct 25 '22 01:10 cwbhhjl

Well, kinda. I used my own Ubuntu-based Dockerfile to build rocksdb and exchain; after that it works fine, even with rocksdb.

neoromantique avatar Oct 26 '22 00:10 neoromantique