
[Bug] Narwhal benchmark run failed due to wrong port assignment mechanism

Open · FullyRobert opened this issue 2 years ago · 1 comment

Steps to Reproduce Issue

When I try to assign 2 workers to each primary in the narwhal benchmark, the benchmark fails to run.

  1. Set the 'workers' parameter to 2 in https://github.com/MystenLabs/sui/blob/f0a2936c1dc68c1d450834813f59ae074edf7582/narwhal/benchmark/fabfile.py#L22
  2. Run fab local.

Expected Result

The benchmark runs normally and outputs results.

Actual Result

The benchmark fails to run. In the log of the second worker of each primary, the admin server address is set to 127.0.0.1:1, a privileged port (below 1024) that an unprivileged process cannot bind:

2022-12-12T02:10:00.479401229Z INFO narwhal_network::admin: starting admin server address=127.0.0.1:1

This happens because each worker's admin port is derived by offsetting base_port (set to 0 in the benchmark) by the worker id with checked_add. Worker 0 therefore gets port 0, which the OS replaces with a free port, while worker 1 gets port 1. The relevant code:

https://github.com/MystenLabs/sui/blob/f0a2936c1dc68c1d450834813f59ae074edf7582/narwhal/worker/src/worker.rs#L267-L271
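A minimal sketch of the derivation described above, assuming the offset added to base_port is the worker id. The helpers admin_port_for_worker and admin_ports are hypothetical and only mirror the arithmetic; they are not the code in worker.rs.

```rust
/// Hypothetical helper: a worker's admin port is `base_port + worker_id`,
/// computed with `u16::checked_add` (assumed offset: the worker id).
fn admin_port_for_worker(base_port: u16, worker_id: u16) -> Option<u16> {
    // `checked_add` returns `None` on u16 overflow instead of wrapping around.
    base_port.checked_add(worker_id)
}

/// Admin ports requested by the workers of one primary. Because every
/// primary uses the same `base_port`, every primary requests this same list.
fn admin_ports(base_port: u16, workers_per_primary: u16) -> Vec<Option<u16>> {
    (0..workers_per_primary)
        .map(|id| admin_port_for_worker(base_port, id))
        .collect()
}
```

With base_port = 0 and 2 workers, admin_ports(0, 2) yields [Some(0), Some(1)]: port 0 is fine because the OS picks a real port, but port 1 is the privileged port seen in the log.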

Changing base_port to a non-privileged port (above 1024) does not solve the problem either. The workers of different primaries all share the same base_port, so workers with the same id on different primaries would try to bind the same port. Port 0 only avoids this because it can be reused: the actual port is assigned by the OS at bind time.
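The conflict can be reproduced with the standard library alone. This sketch uses only std::net::TcpListener; the function names are hypothetical. It shows that a second listener on an already-bound fixed port fails, while two listeners that both ask for port 0 each receive a distinct port from the OS.

```rust
use std::io;
use std::net::TcpListener;

/// Simulates two workers (same worker id, different primaries) that derive
/// the same fixed port. The first bind succeeds; the second fails while the
/// first listener is still open.
fn fixed_port_conflict() -> io::Result<io::ErrorKind> {
    // Let the OS pick a free port so the demo does not depend on a specific number.
    let first = TcpListener::bind("127.0.0.1:0")?;
    let port = first.local_addr()?.port();
    // `first` is still alive here, so this bind targets an occupied port.
    match TcpListener::bind(("127.0.0.1", port)) {
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::Other,
            "second bind unexpectedly succeeded",
        )),
        Err(e) => Ok(e.kind()),
    }
}

/// Two listeners that both request port 0 get different OS-assigned ports.
fn port_zero_is_reusable() -> io::Result<(u16, u16)> {
    let a = TcpListener::bind("127.0.0.1:0")?;
    let b = TcpListener::bind("127.0.0.1:0")?;
    Ok((a.local_addr()?.port(), b.local_addr()?.port()))
}
```

On Linux the second bind in fixed_port_conflict reports AddrInUse, which is what a shared non-zero base_port would cause between primaries.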

The checked_add port allocation scheme may also occasionally cause conflicts in production. A more robust approach might be to request an available port from the OS when generating the port configuration.
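One way to request a port from the OS, sketched under the assumption that the configuration is generated on the same host that later runs the workers: bind to port 0, read the assigned port, then release the socket. Both helpers are hypothetical and not part of Narwhal. Note the time-of-check/time-of-use race: a released port can be taken by another process before the worker binds it.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical helper: ask the OS for a currently free port on `host`.
/// The port is only guaranteed free at the moment of the check.
fn request_available_port(host: &str) -> io::Result<u16> {
    let listener = TcpListener::bind((host, 0))?;
    let port = listener.local_addr()?.port();
    // Dropping the listener closes the socket so a worker can bind the port later.
    drop(listener);
    Ok(port)
}

/// Hypothetical replacement for `base_port + worker_id`: one OS-assigned port
/// per worker. All probe listeners stay open until every port has been read,
/// so the OS cannot hand out the same port twice within one batch.
fn assign_admin_ports(host: &str, workers: usize) -> io::Result<Vec<u16>> {
    let listeners = (0..workers)
        .map(|_| TcpListener::bind((host, 0)))
        .collect::<io::Result<Vec<_>>>()?;
    listeners
        .iter()
        .map(|l| l.local_addr().map(|addr| addr.port()))
        .collect()
}
```

Holding all probe sockets open until the whole batch is read avoids duplicates between workers and primaries on the same machine, though it does not remove the race with unrelated processes.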

System Information

  • OS: Ubuntu 20.04
  • Compiler: rustc 1.65.0 (897e37553 2022-11-02)

FullyRobert · Dec 12 '22 02:12

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] · Feb 11 '23 02:02