su-park
I encountered the same OOM error message, and I guess there is still no other solution..

- model: Llama-2-7b
- CUDA version: 12.2
- vLLM version: 0.3.0
- multi GPUs...
I resolved my case by setting `enforce_eager=True`, at the cost of slower generation. Thank you all.
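For reference, a minimal sketch of what worked for me, assuming the standard `vllm.LLM` constructor (the model id below is just a placeholder):

```python
from vllm import LLM, SamplingParams

# Falling back to eager mode skips CUDA graph capture, which was what
# pushed my setup over the memory limit; generation is slower but stable.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model id
    enforce_eager=True,
)

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```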
I replaced `adapter_model.bin` with a checkpoint binary as @kuan-cresta mentioned; there has been some improvement, but the same issue persists. Do you have any more suggestions?
Hello. This seems to be a question related to the issue above, so I'm asking it here as well. We are currently running inference with the Mistral 7B model on a V100 16GB...