leiwen83
For multi-node parallel runs, does DeepSpeed currently use pdsh or MPI as the launcher? Have you run into instability caused by network problems?
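For reference, a minimal sketch of how the launcher can be selected explicitly; the hostnames and the `train.py` entry point here are placeholders, not anything from this thread:

```bash
# Hostfile listing worker nodes and GPU slots (node1/node2 are placeholders).
cat > hostfile <<EOF
node1 slots=8
node2 slots=8
EOF

# DeepSpeed defaults to pdsh for multi-node launches; --launcher switches to
# an MPI-based launcher instead. train.py is a placeholder entry point.
deepspeed --hostfile hostfile --launcher pdsh train.py
deepspeed --hostfile hostfile --launcher openmpi train.py
```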
The tests/ folder also suffers from this vllm_ops-not-defined issue. I created this PR for pytest, https://github.com/vllm-project/vllm/pull/4231, which forces pytest to pick up the module from the installed location.
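For reference, a quick way to check which copy of vllm is being imported, plus one common workaround for the shadowing problem; this is only a sketch and not necessarily what the PR does, and the paths are placeholders:

```bash
# If this prints a path inside the source checkout instead of site-packages,
# the in-tree vllm package is shadowing the installed one, and the compiled
# vllm_ops extension will not be found.
python3 -c "import vllm; print(vllm.__file__)"

# One workaround: run pytest from outside the repo root so the current
# directory does not shadow the installed package (paths are placeholders).
cd /tmp && python3 -m pytest /path/to/vllm/tests
```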
@cadedaniel I submitted a rebased PR, which keeps the concat logic as before. num_spec is made to aggregate the "k" numbers.
Tried the qwen2 multi-GPU support patch https://github.com/mlc-ai/mlc-llm/pull/1985 with the latest code (https://github.com/mlc-ai/mlc-llm/commit/ae97b8d3763cd9ef9179140027d206622d185d21), but got the error below when compiling the model.

```
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py",...
```
After reverting to a0484bd53854a508283be47d62b704b2c737259d, with https://github.com/mlc-ai/mlc-llm/pull/1985 applied, I still get CUDA OOM for the 72B int4 model on 4 GPUs.
@tlopex any idea?
> @leiwen83 Are you running without quantization? The 72B model doesn't seem to fit on four 3090s, each with 24GB vRAM.

I am running with quantization, converted from the HF model. Here is...
I found there is a setting named "MLC_INTERNAL_PRESHARD_NUM". Do I need to set this when converting the model, before serving?
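For reference, a minimal sketch of how I would try it, under the assumption that MLC_INTERNAL_PRESHARD_NUM is read at weight-conversion time and should match the tensor-parallel degree (4 GPUs here); the model and output paths simply mirror the convert_weight command quoted in these comments:

```bash
# Assumption: pre-shard the converted weights for 4 GPUs; whether this env var
# is honored by convert_weight is exactly the question being asked above.
MLC_INTERNAL_PRESHARD_NUM=4 python3 -m mlc_llm convert_weight \
    --quantization q4f16_1 Qwen1.5-72B-Chat --output Qwen1.5-72B-Chat_tvm
```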
Still getting an error...

```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/tmp/test/llm/mlc-llm/python/mlc_llm/serve/server/__main__.py", line...
```
After pulling the latest commit, converting the model reports an error.

```
# python3 -m mlc_llm convert_weight --quantization q4f16_1 Qwen1.5-72B-Chat --output Qwen1.5-72B-Chat_tvm
[2024-03-28 14:47:03] INFO auto_config.py:115: Found model configuration:...
```