Jianyu Wang
I have one question about my servers. It seems that when cuda:0 is almost full, vLLM still fails to avoid it, even when I pass the "CUDA_VISIBLE_DEVICES" environment variable?
Oh, I find that the ray::worker processes are still taking the first two GPUs even when I specify two other ones.
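A minimal sketch of the usual workaround, assuming the problem is that the Ray workers inherit GPU visibility from the driver at startup: CUDA_VISIBLE_DEVICES has to be set before vLLM (and hence Ray/CUDA) is first imported. The model name and GPU indices below are illustrative.

```python
# Must run before vLLM/Ray/CUDA are imported: ray::worker processes
# inherit this variable from the driver when they are launched.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # illustrative physical GPUs

from vllm import LLM

# With visibility restricted, the two Ray workers land on the two
# visible GPUs instead of the (busy) first two physical devices.
llm = LLM(model="facebook/opt-6.7b", tensor_parallel_size=2)
```

Setting the variable in the shell (before launching the Python process) works the same way and avoids import-order pitfalls entirely.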
I suddenly noticed that you have some updates in that file (vllm/core/scheduler.py, around lines 384-395):

```python
if preemption_mode is None:
    seqs = seq_group.get_seqs(status=SequenceStatus.RUNNING)
    if len(seqs) == 1:
        preemption_mode = PreemptionMode.RECOMPUTE
    else:
        ...
```
Does this mean my process is overflowing its CPU memory? I am running with 8 GPUs (A5000), but 4 of them are occupied by other large-scale inference jobs. So I am considering...
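If the concern is that SWAP-mode preemption spills KV-cache blocks into CPU RAM, one knob is vLLM's swap_space engine argument, which caps that host-memory reservation. A sketch, with the model name and value purely illustrative:

```python
from vllm import LLM

# swap_space is the CPU RAM (GiB per GPU) reserved for swapped-out
# KV-cache blocks during preemption; a smaller value bounds host-memory
# pressure at the cost of more recomputation when sequences are preempted.
llm = LLM(
    model="facebook/opt-6.7b",  # illustrative model
    tensor_parallel_size=2,
    swap_space=2,               # default is 4 GiB; reduce to limit CPU usage
)
```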
BTW, since I need fine-grained control over CUDA_VISIBLE_DEVICES when running your vLLM API, how do I assign another specific GPU (e.g., cuda:6) to auxiliary models in my program, assuming...
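One sketch of how this can work, assuming vLLM's workers claim the first visible devices: once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered from zero inside the process, so physical GPU 6 is addressed by a remapped index. The auxiliary model here is a hypothetical placeholder.

```python
import os
# Set before any CUDA initialization; physical GPUs 4, 5, 6
# become cuda:0, cuda:1, cuda:2 inside this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6"

import torch
from vllm import LLM

# vLLM's two workers occupy the first two visible devices...
llm = LLM(model="facebook/opt-6.7b", tensor_parallel_size=2)

# ...and the auxiliary model goes on the remaining visible device:
# physical GPU 6, addressed as cuda:2 after the remapping.
aux_model = torch.nn.Linear(10, 10).to("cuda:2")  # placeholder auxiliary model
```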
Hello, this one was trained from scratch. If it's an existing model, you can just use the weights available online.
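For the existing-model case, a minimal sketch of pulling published weights with the transformers API (the checkpoint name is illustrative):

```python
from transformers import AutoModel, AutoTokenizer

# Downloads the published weights and the matching vocab from the Hub.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
```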
For this, just look at the tokenization-related code and generate your own vocab.txt. I remember transformers also has a corresponding API for it.
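A sketch of generating a vocab.txt from your own corpus with the Hugging Face tokenizers library; the corpus path, vocab size, and output directory are all illustrative:

```python
import os
from tokenizers import BertWordPieceTokenizer

# Train a WordPiece vocabulary from scratch on a raw-text corpus.
tokenizer = BertWordPieceTokenizer()
tokenizer.train(files=["corpus.txt"], vocab_size=30000)

# save_model writes vocab.txt into the target directory,
# ready to be loaded by transformers' BERT-style tokenizers.
os.makedirs("my_tokenizer", exist_ok=True)
tokenizer.save_model("my_tokenizer")
```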