
Error: can't register atexit after shutdown

Open rajeshitshoulders opened this issue 1 year ago • 6 comments

Hi, could you please help resolve the issue below with the IPython.core.display module?

Setup: a mamba virtual env at /home/idps/vidur/vidur-venv. I configured wandb and set the WANDB_BASE_URL variable to point to a local web server, with an API key.

Please let me know if you need any additional information

(/home/idps/vidur/vidur-venv) idps@smc-gpu-03:~/vidur$ python -m vidur.main \
  --replica_config_device a100 \
  --replica_config_model_name meta-llama/Llama-2-7b-hf \
  --cluster_config_num_replicas 1 \
  --replica_config_tensor_parallel_size 1 \
  --replica_config_num_pipeline_stages 1 \
  --request_generator_config_type synthetic \
  --length_generator_config_type trace \
  --interval_generator_config_type static \
  --trace_request_length_generator_config_max_tokens 4096 \
  --trace_request_length_generator_config_trace_file ./data/processed_traces/arxiv_summarization_stats_llama2_tokenizer_filtered_v2.csv \
  --synthetic_request_generator_config_num_requests 128 \
  --replica_scheduler_config_type vllm \
  --vllm_scheduler_config_batch_size_cap 256 \
  --vllm_scheduler_config_max_tokens_in_batch 4096 \
  --metrics_config_wandb_project idps-wandb
INFO 09-25 12:47:13 trace_request_length_generator.py:78] Loaded request length trace file ./data/processed_traces/arxiv_summarization_stats_llama2_tokenizer_filtered_v2.csv with 28257 requests
INFO 09-25 12:47:15 simulator.py:60] Starting simulation with cluster: Cluster({'id': 0, 'num_replicas': 1}) and 128 requests
INFO 09-25 12:47:15 simulator.py:80] Simulation ended at: 92.29293720617318s
INFO 09-25 12:47:15 simulator.py:83] Writing output
Error importing optional module IPython.core.display
Traceback (most recent call last):
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/_plotly_utils/optional_imports.py", line 28, in get_module
    return import_module(name)
  File "/home/idps/vidur/vidur-venv/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 992, in _find_and_load_unlocked
  File "", line 241, in _call_with_frames_removed
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 992, in _find_and_load_unlocked
  File "", line 241, in _call_with_frames_removed
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/__init__.py", line 55, in <module>
    from .terminal.embed import embed
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.terminal.interactiveshell import TerminalInteractiveShell
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/interactiveshell.py", line 48, in <module>
    from .debugger import TerminalPdb, Pdb
  File "/home/idps/vidur/vidur-venv/lib/python3.10/site-packages/IPython/terminal/debugger.py", line 18, in <module>
    from concurrent.futures import ThreadPoolExecutor
  File "", line 1075, in _handle_fromlist
  File "/home/idps/vidur/vidur-venv/lib/python3.10/concurrent/futures/__init__.py", line 49, in __getattr__
    from .thread import ThreadPoolExecutor as te
  File "/home/idps/vidur/vidur-venv/lib/python3.10/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/home/idps/vidur/vidur-venv/lib/python3.10/threading.py", line 1504, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown
INFO 09-25 12:47:18 simulator.py:86] Metrics written
INFO 09-25 12:47:18 simulator.py:95] Chrome event trace written
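
The traceback shows concurrent.futures.thread being imported for the first time after the threading module has already begun shutting down, which is when threading._register_atexit refuses new registrations. The following is a minimal sketch of that mechanism only, assuming CPython behaviour similar to the 3.10 in the traceback; it does not show what triggers the late import in Vidur, and the names CHILD_CODE, late_import and run_late_import are made up for the illustration.

```python
import subprocess
import sys
import textwrap

# Code run in a child interpreter. It registers an atexit handler that
# imports ThreadPoolExecutor for the first time while the interpreter
# is exiting. (Hypothetical illustration, not Vidur code.)
CHILD_CODE = textwrap.dedent(
    """
    import atexit

    def late_import():
        try:
            from concurrent.futures import ThreadPoolExecutor
        except RuntimeError as exc:
            print("RuntimeError:", exc)
        else:
            print("import succeeded")

    atexit.register(late_import)
    """
)


def run_late_import():
    """Run the child interpreter and return what its exit handler printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_late_import())
```

If the child reports a RuntimeError, the interpreter in use rejects the late import in the same way as the log above; if it reports success, that interpreter orders its shutdown steps differently and the sketch does not apply.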

rajeshitshoulders avatar Sep 25 '24 19:09 rajeshitshoulders

I run into the exact same issue!

ozcanmiraay avatar Sep 29 '24 10:09 ozcanmiraay

Also running into the same issue

akaashrp avatar Oct 09 '24 18:10 akaashrp

Hi @rajeshitshoulders, @akaashrp, and @ozcanmiraay, can you please try removing the jupyterlab dependency from environment.yml? You'll need to create a new mamba environment from the changed environment.yml file. IPython is used by jupyterlab. The latter is only used in the experiment notebooks, not in the actual simulator code, so we can remove jupyterlab, and that might remove the error.

nitinkedia7 avatar Oct 10 '24 07:10 nitinkedia7

Got the same error, and it still failed without jupyterlab installed @nitinkedia7

Yogaht avatar Oct 18 '24 09:10 Yogaht

I manually uninstalled IPython and the error no longer appears, but now I only get request_metrics.csv under the output dir, with no errors reported, and the run log just ends with "Writing output". I also found that every fig.write_image call hangs, so for now I can only comment out that code.

Yogaht avatar Oct 18 '24 10:10 Yogaht

Hi, this seems to be caused by kaleido; downgrading to 0.2.1 fixed the problem for me.

pip install -U kaleido==0.2.1
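
To confirm which kaleido version an environment actually has before and after pinning, a minimal sketch using the standard library's importlib.metadata could look like the following. The helper names installed_version and kaleido_status and the KNOWN_GOOD constant are made up for the illustration; 0.2.1 is simply the version reported to work in this thread, not an officially supported pin.

```python
from importlib import metadata

# Version reported to work in this thread (an assumption taken from the
# comment above, not an official recommendation).
KNOWN_GOOD = "0.2.1"


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def kaleido_status(version):
    """Describe how an installed kaleido version compares to KNOWN_GOOD."""
    if version is None:
        return "kaleido is not installed"
    if version == KNOWN_GOOD:
        return f"kaleido {version} matches the version reported to work"
    return f"kaleido {version} differs from {KNOWN_GOOD}; consider pinning it"


if __name__ == "__main__":
    print(kaleido_status(installed_version("kaleido")))
```

Run it inside the same mamba/conda environment that runs vidur.main, since the result depends on which environment's site-packages is active.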

vladandrew avatar Nov 22 '24 22:11 vladandrew

Hi @rajeshitshoulders @vladandrew @Yogaht @akaashrp, Vidur now uses seaborn (based on matplotlib) instead of plotly (which uses kaleido). This removes the dependency on kaleido, so the original error should not resurface. Also, please check out the recent major release of Vidur: https://github.com/microsoft/vidur/pull/56.

nitinkedia7 avatar Jul 07 '25 21:07 nitinkedia7

Hi @nitinkedia7, I also encountered this issue with the main branch.

conda env create -f environment-dev.yml
pip install -r requirement.txt
pip install -r requirement_dev.txt

python -m vidur.main

Downgrading kaleido works for me.

galeselee avatar Sep 17 '25 14:09 galeselee