Mina daemon crashes inside mina-rosetta container

Open remzaspecial opened this issue 2 years ago • 1 comments

Preliminary Checks

  • [X] This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/MinaProtocol/mina/issues
  • [X] This issue is not a question, feature request, RFC, or anything other than a bug report. Please post those things in GitHub Discussions: https://github.com/MinaProtocol/mina/discussions

Description

Screenshot 2022-08-26 at 13 19 10

I use this docker-compose file:

    version: '2'
    services:
      rpc-node:
        image: minaprotocol/mina-rosetta:1.3.1.1-f361ba1-focal
        volumes:
          - /mnt/HC_Volume_2803401/.mina-config:/root/.mina-config
        logging:
          driver: json-file
          options:
            max-size: 300m
        ports:
          - "10101:10101"
          - "3085:3085"
          - "3086:3086"
          - "3087:3087"
        entrypoint: ./docker-start.sh
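Since the compose file bind-mounts a host directory into the container, a quick pre-flight check of that directory can rule out the simplest failure modes. The following is a minimal sketch, not part of Mina or Docker: the function name `check_bind_mount_source` is hypothetical, and it only checks access for the user running the script, which may differ from the user inside the container.

```python
import os
from typing import List


def check_bind_mount_source(path: str) -> List[str]:
    """Return human-readable problems found with a bind-mount source directory."""
    problems: List[str] = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    # Read + execute are needed to list and traverse the directory.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} is not readable/traversable by the current user")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    # Host side of the volume mapping taken from the compose file.
    host_dir = "/mnt/HC_Volume_2803401/.mina-config"
    for problem in check_bind_mount_source(host_dir) or ["no obvious problems found"]:
        print(problem)
```

An empty result only means the directory looks usable from the host side; it does not prove the daemon inside the container can use it.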

Steps to Reproduce

  1. git clone mina-rosetta (https://github.com/MinaProtocol/mina/tree/develop/src/app/rosetta)
  2. docker-compose up -d ...

Expected Result

Fully sync rosetta

Actual Result

Daemon doesn't start

How frequently do you see this issue?

Always

What is the impact of this issue on your ability to run a node?

Blocker

Status

mina client status
Error: Unable to connect to Mina daemon.
- The daemon might not be running. See logs (in `~/.mina-config/mina.log`) for details under the host:127.0.0.1.
  Run `mina daemon -help` to see how to start daemon.
- If you just started the daemon, wait a minute for the RPC server to start.
- Alternatively, the daemon may not be running the RPC server on (127.0.0.1 8301).
  If so, add flag `---daemon-port` with correct port when running this command.
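The error message names 127.0.0.1 and port 8301 as the address the client tried. A small probe can tell "nothing is listening" apart from other client problems. This is a sketch under assumptions: `rpc_port_open` is a hypothetical helper, it has to run in the same network namespace as the daemon (for example inside the container), and Python may not be available in the image.

```python
import socket


def rpc_port_open(host: str = "127.0.0.1", port: int = 8301, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if rpc_port_open() else "closed or unreachable"
    print(f"127.0.0.1:8301 is {state}")
```

A closed port is consistent with a daemon that has crashed or has not finished starting; an open port only shows that some process is listening there.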

Additional information

The complete log is attached as log.txt.

remzaspecial avatar Aug 26 '22 10:08 remzaspecial

Screenshot 2022-08-26 at 13 55 35

I also found this in the logs.

remzaspecial avatar Aug 26 '22 10:08 remzaspecial

@lk86 Are you actively working on this, or can the folks at Mina Foundation take this on?

@shimkiv You set the priority to 'low', but it seems that this problem, according to the reporter, happens always. This means that the Rosetta container is useless at present. Is the priority low because the container/image is less important than the main distribution of Mina, or for some other reason?

robinbb avatar Oct 12 '22 01:10 robinbb

No, I have not been actively looking at this; you're welcome to dig in further. My guess is that this is either insufficient resources or bad permissions on the mounted directory, as the most distinctive error I see in the logs is a write error:

{"commit_id":"f361ba19d78ba930b5b551f58d1a82942c0f724b","sexp":["monitor.ml.Error",["Writer error from inner_monitor",["Unix.Unix_error","Broken pipe","writev_assume_fd_is_nonblocking",""],["writer",[["id","4"],["fd",[["file_descr","25"],["info",["child process",["stdin",["pid","386"],["prog","/proc/212/exe"],["args",["parallel-worker"]]],"src/process.ml:62:17"]],["kind","Fifo"],["supports_nonblock","true"],["have_set_nonblock","true"],["state",["Open","Empty"]],["watching",[["read","Not_watching"],["write","Not_watching"]]],["watching_has_changed","false"],["num_active_syscalls","0"],["close_finished","Empty"]]],["monitor",[[["name",["id","117"]],["here",[]],["id","117"],["has_seen_error","false"],["is_detached","false"]],[["name",""],["here",["src/lib/mina_lib/mina_lib.ml:1281:35"]],["id","102"],["has_seen_error","false"],["is_detached","true"]]]],["inner_monitor",[[["name",["id","118"]],["here",[]],["id","118"],["has_seen_error","true"],["is_detached","true"]],[["name",""],["here",["src/lib/mina_lib/mina_lib.ml:1281:35"]],["id","102"],["has_seen_error","false"],["is_detached","true"]]]],["background_writer_state","Stopped_permanently"],["background_writer_stopped",["Full",[]]],["syscall","Per_cycle"],["bytes_received","349"],["bytes_written","0"],["scheduled_bytes","0"],["scheduled_back","0"],["back","0"],["close_state","Open"],["close_finished","Empty"],["close_started","Empty"],["num_producers_to_flush_at_close","0"],["flush_at_shutdown_elt",["<opaque>"]],["check_buffer_age",[[["writer","<opaque>"],["maximum_age","2m"],["bytes_received_at_now_minus_maximum_age","0"],["bytes_received_queue",[]],["times_received_queue",[]],["bytes_seen","0"],["too_old","Empty"],["for_this_time_source",[["active_checks",["<opaque>"]],["closed","Empty"]]]]]],["consumer_left",["Full",[]]],["raise_when_consumer_leaves","true"],["open_flags",["Full",["Ok",["wronly"]]]],["line_ending","Unix"]]]],["Caught by monitor (id 117)"]],"backtrace":["Raised at Base__Result.ok_exn in file \"src/result.ml\", line 187, characters 17-26","Called from Async_kernel__Deferred1.M.map.(fun) in file \"src/deferred1.ml\", line 17, characters 40-45","Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5","Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 168, characters 6-47"]},"pid":212}}

lk86 avatar Oct 25 '22 06:10 lk86
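For context on the write error quoted above: the log describes a writer attached to the stdin of a child process (args `parallel-worker`) failing with "Broken pipe" after writing 0 bytes. On POSIX systems that error typically means the process on the reading end of the pipe has gone away. The sketch below reproduces the same class of error in isolation; the child is a hypothetical stand-in that exits immediately, not the Mina worker, and the sketch says nothing about why the real worker exited.

```python
import subprocess
import sys


def write_to_exited_child(payload: bytes) -> str:
    """Write to the stdin of a child that has already exited and report the outcome."""
    # Hypothetical stand-in for a worker process: it exits at once
    # without ever reading its stdin.
    child = subprocess.Popen(
        [sys.executable, "-c", "pass"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write reaches the OS immediately
    )
    child.wait()  # after this, no process holds the read end of the pipe
    try:
        child.stdin.write(payload)
    except BrokenPipeError:
        # EPIPE: the same errno that OCaml reports as Unix_error "Broken pipe".
        return "broken pipe"
    finally:
        child.stdin.close()
    return "write succeeded"


if __name__ == "__main__":
    print(write_to_exited_child(b"hello"))
```

Seen this way, the broken pipe is a symptom in the parent process; the cause to look for is whatever made the child exit first.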

Okay. I will assign to @kaozenn with @kantp being made aware in case he wants to re-assign.

To be clear - the reason for this being higher priority than "low" is the (claimed) reproducibility of the error. Please investigate.

robinbb avatar Oct 25 '22 21:10 robinbb

I can also reproduce the error using the image 1.3.1.1-f361ba1-focal mentioned in the description, as well as 1.3.1.2-25388a0-focal. I doubt this is related to resource or permission issues. For my tests, I mounted a local volume to store psql data, and I can see the local folder populated correctly with psql data. The issue occurs on both devnet and mainnet, and the Mina daemon crashes at the same time with the following log:

rosetta-rpc-node-1  | {"timestamp":"2022-10-27 05:17:16.960771Z","level":"Fatal","source":{"module":"Init__Coda_run","location":"File \"src/app/cli/src/init/coda_run.ml\", line 615, characters 2-26"},"message":"Unhandled top-level exception: $exn\nGenerating crash report","metadata":{"exn":{"commit_id":"f361ba19d78ba930b5b551f58d1a82942c0f724b","sexp":["monitor.ml.Error",["Writer error from inner_monitor",["Unix.Unix_error","Broken pipe","writev_assume_fd_is_nonblocking",""],["writer",[["id","4"],["fd",[["file_descr","25"],["info",["child process",["stdin",["pid","451"],["prog","/proc/207/exe"],["args",["parallel-worker"]]],"src/process.ml:62:17"]],["kind","Fifo"],["supports_nonblock","true"],["have_set_nonblock","true"],["state",["Open","Empty"]],["watching",[["read","Not_watching"],["write","Not_watching"]]],["watching_has_changed","false"],["num_active_syscalls","0"],["close_finished","Empty"]]],["monitor",[[["name",["id","164"]],["here",[]],["id","164"],["has_seen_error","false"],["is_detached","false"]],[["name",""],["here",["src/lib/mina_lib/mina_lib.ml:1281:35"]],["id","149"],["has_seen_error","false"],["is_detached","true"]]]],["inner_monitor",[[["name",["id","165"]],["here",[]],["id","165"],["has_seen_error","true"],["is_detached","true"]],[["name",""],["here",["src/lib/mina_lib/mina_lib.ml:1281:35"]],["id","149"],["has_seen_error","false"],["is_detached","true"]]]],["background_writer_state","Stopped_permanently"],["background_writer_stopped",["Full",[]]],["syscall","Per_cycle"],["bytes_received","349"],["bytes_written","0"],["scheduled_bytes","0"],["scheduled_back","0"],["back","0"],["close_state","Open"],["close_finished","Empty"],["close_started","Empty"],["num_producers_to_flush_at_close","0"],["flush_at_shutdown_elt",["<opaque>"]],["check_buffer_age",[[["writer","<opaque>"],["maximum_age","2m"],["bytes_received_at_now_minus_maximum_age","0"],["bytes_received_queue",[]],["times_received_queue",[]],["bytes_seen","0"],["too_old","Empty"],["for_this_time_source",[["active_checks",["<opaque>"]],["closed","Empty"]]]]]],["consumer_left",["Full",[]]],["raise_when_consumer_leaves","true"],["open_flags",["Full",["Ok",["wronly"]]]],["line_ending","Unix"]]]],["Caught by monitor (id 164)"]],"backtrace":["Raised at Base__Result.ok_exn in file \"src/result.ml\", line 187, characters 17-26","Called from Async_kernel__Deferred1.M.map.(fun) in file \"src/deferred1.ml\", line 17, characters 40-45","Called from Async_kernel__Job_queue.run_job in file \"src/job_queue.ml\" (inlined), line 128, characters 2-5","Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 168, characters 6-47"]},"pid":207}}
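Log lines like the one above are single JSON objects prefixed by the docker-compose container name, so the interesting fields can be pulled out mechanically when comparing crashes across runs. The following is a rough sketch: `summarize` and `find_unix_errors` are hypothetical helpers, and `SAMPLE` is a heavily trimmed copy of the line above that keeps only the fields the sketch reads.

```python
import json
from typing import Any, Dict, List

# Trimmed version of the crash line quoted above (most sexp fields omitted).
SAMPLE = (
    'rosetta-rpc-node-1  | {"timestamp":"2022-10-27 05:17:16.960771Z","level":"Fatal",'
    '"message":"Unhandled top-level exception: $exn\\nGenerating crash report",'
    '"metadata":{"exn":{"sexp":["monitor.ml.Error",["Writer error from inner_monitor",'
    '["Unix.Unix_error","Broken pipe","writev_assume_fd_is_nonblocking",""]]]},"pid":207}}'
)


def find_unix_errors(node: Any) -> List[List[str]]:
    """Collect the arguments of every ["Unix.Unix_error", ...] list nested in node."""
    found: List[List[str]] = []
    if isinstance(node, list):
        if node and node[0] == "Unix.Unix_error":
            found.append(node[1:])
        for child in node:
            found.extend(find_unix_errors(child))
    return found


def summarize(line: str) -> Dict[str, Any]:
    """Extract level, pid, and Unix errors from one docker-compose log line."""
    # Drop the "<container> | " prefix by starting at the first brace.
    record = json.loads(line[line.index("{"):])
    metadata = record.get("metadata", {})
    return {
        "level": record.get("level"),
        "pid": metadata.get("pid"),
        "unix_errors": find_unix_errors(metadata.get("exn", {}).get("sexp", [])),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same summary over logs from different images or networks makes it easy to confirm that each crash carries the same error and system call.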

kaozenn avatar Oct 27 '22 05:10 kaozenn

The issue seems to be linked to running the container on an aarch64-darwin platform. I am continuing to investigate a resolution.
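One way to check whether a host falls into this category is to compare its CPU architecture with the architecture the image was built for. The sketch below is illustrative only: the helper names are hypothetical, and the image architecture (`amd64`) is an assumption, since this thread does not state which platforms the mina-rosetta images are published for.

```python
import platform

# Common spellings of the two architectures relevant here.
_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map platform.machine()-style names onto Docker-style architecture names."""
    name = machine.lower()
    return _ALIASES.get(name, name)


def runs_emulated(host_machine: str, image_arch: str) -> bool:
    """Return True if the image architecture differs from the host's."""
    return normalize_arch(host_machine) != normalize_arch(image_arch)


if __name__ == "__main__":
    image_arch = "amd64"  # ASSUMPTION: not confirmed anywhere in this thread
    host = platform.machine()
    print(f"host={platform.system()}/{host} image={image_arch} "
          f"emulated={runs_emulated(host, image_arch)}")
```

Run on the host rather than inside the container, a mismatch would suggest the container is executing under emulation, which is worth noting when reporting crashes.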

kaozenn avatar Oct 31 '22 22:10 kaozenn

Closing this issue as it will be addressed in https://github.com/MinaProtocol/mina/issues/12089

kaozenn avatar Nov 01 '22 23:11 kaozenn