Jianyu Wang

Results 7 comments of Jianyu Wang

I have a question about my servers. It seems that when cuda:0 is almost full, the process still fails to avoid it, even though I pass the CUDA_VISIBLE_DEVICES environment variable?

Oh, I find that the ray::worker processes are still taking the first two GPUs even when I specify two other ones.
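A minimal sketch of the usual way to restrict the visible GPUs, assuming a single-process setup. The key detail is that the variable must be set before any CUDA-initializing library (torch, vllm, ray) is imported; Ray workers launched by a separately started Ray cluster may not inherit it, which would explain workers still landing on the first two GPUs.

```python
import os

# Restrict this process to physical GPUs 2 and 3. This must happen
# before torch / vllm / ray are imported, otherwise the CUDA runtime
# has already enumerated all devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

# Inside this process the visible devices are renumbered from zero:
# physical GPU 2 becomes cuda:0 and physical GPU 3 becomes cuda:1.
```

If Ray was started independently (e.g. `ray start` on the node), the workers take their environment from the Ray daemon, not from this script, so the variable needs to be set when the cluster itself is launched.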

I suddenly noticed that you have some updates in that file (vllm/core/scheduler.py, around lines 384–395):

```python
if preemption_mode is None:
    seqs = seq_group.get_seqs(status=SequenceStatus.RUNNING)
    if len(seqs) == 1:
        preemption_mode = PreemptionMode.RECOMPUTE
    else:
        ...
```

Does this mean my process is overflowing CPU memory? I am running with 8 GPUs (A5000s), but 4 of them are occupied by other large-scale inference jobs. So I am considering...

BTW, since I need fine-grained control over CUDA_VISIBLE_DEVICES when running your vLLM API, how do I assign another specific GPU (e.g., cuda:6) to auxiliary models in my program, assuming...
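One hedged sketch of how this can work, assuming the process exposes all needed GPUs at once: CUDA renumbers the visible devices from zero in listing order, so the auxiliary model must address physical GPU 6 by its in-process index, not by `cuda:6`.

```python
# Expose four GPUs for vLLM plus physical GPU 6 for an auxiliary model.
visible = "0,1,2,3,6"  # value for CUDA_VISIBLE_DEVICES (physical IDs)

# Map physical GPU id -> in-process device index (renumbered from 0).
phys_to_local = {int(p): i for i, p in enumerate(visible.split(","))}

# vLLM with tensor_parallel_size=4 would occupy local devices 0-3;
# the auxiliary model goes on the local index of physical GPU 6:
aux_device = f"cuda:{phys_to_local[6]}"
# e.g. aux_model.to(aux_device) with PyTorch (not executed here)
```

So in this setup the auxiliary model is placed on `cuda:4` inside the process, which corresponds to physical GPU 6.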

Hello, this one was trained from scratch. For an existing model, we just use the weights available online.

For that, just look at the tokenization-related code and generate your own vocab.txt. I recall that transformers also has a corresponding API for this.
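A deliberately naive, stdlib-only sketch of building a vocab.txt (the `build_vocab` helper and the corpus are made up for illustration): it counts whitespace tokens and prepends BERT-style special tokens. A real vocabulary would instead be learned with a subword trainer such as the HuggingFace `tokenizers` WordPiece trainer.

```python
from collections import Counter

def build_vocab(corpus_lines, vocab_size=10):
    """Return a vocab list: special tokens + most frequent tokens."""
    counts = Counter(tok for line in corpus_lines for tok in line.split())
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
    words = [w for w, _ in counts.most_common(vocab_size)]
    return specials + words

# One vocab entry per line is the vocab.txt convention:
vocab = build_vocab(["hello world", "hello again"], vocab_size=3)
text = "\n".join(vocab)  # write this to vocab.txt
```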