Jianyu Wang
I have one question about my servers. It seems that when cuda:0 is almost full, vLLM still fails to avoid it, even when I pass the "CUDA_VISIBLE_DEVICES" environment variable?
Oh, I find that the ray::worker processes are still taking the first two GPUs even when I specify two other ones.
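A minimal sketch of the usual workaround, assuming the problem is that the Ray workers inherit GPU visibility from the driver at startup: CUDA_VISIBLE_DEVICES has to be set before vLLM (and hence Ray/CUDA) is first imported. The model name and GPU indices below are illustrative.

```python
# Must run before vLLM/Ray/CUDA are imported: ray::worker processes
# inherit this variable from the driver when they are launched.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # illustrative physical GPUs

from vllm import LLM

# With visibility restricted, the two Ray workers land on the two
# visible GPUs instead of the (busy) first two physical devices.
llm = LLM(model="facebook/opt-6.7b", tensor_parallel_size=2)
```

Setting the variable in the shell (before launching the Python process) works the same way and avoids import-order pitfalls entirely.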
I suddenly noticed that you have some updates in that file (vllm/core/scheduler.py, around lines 384-395):

```python
if preemption_mode is None:
    seqs = seq_group.get_seqs(status=SequenceStatus.RUNNING)
    if len(seqs) == 1:
        preemption_mode = PreemptionMode.RECOMPUTE
    else:
        ...
```
Does this mean my process is overflowing its CPU memory? I am running with 8 GPUs (A5000), but 4 of them are occupied by other large-scale inference jobs. So I am considering...
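If the concern is that SWAP-mode preemption spills KV-cache blocks into CPU RAM, one knob is vLLM's swap_space engine argument, which caps that host-memory reservation. A sketch, with the model name and value purely illustrative:

```python
from vllm import LLM

# swap_space is the CPU RAM (GiB per GPU) reserved for swapped-out
# KV-cache blocks during preemption; a smaller value bounds host-memory
# pressure at the cost of more recomputation when sequences are preempted.
llm = LLM(
    model="facebook/opt-6.7b",  # illustrative model
    tensor_parallel_size=2,
    swap_space=2,               # default is 4 GiB; reduce to limit CPU usage
)
```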
BTW, since I need fine-grained control over CUDA_VISIBLE_DEVICES when running your vLLM API, how do I assign another specific GPU (e.g., cuda:6) to auxiliary models in my program, assuming...
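One sketch of how this can work, assuming vLLM's workers claim the first visible devices: once CUDA_VISIBLE_DEVICES is set, the visible GPUs are renumbered from zero inside the process, so physical GPU 6 is addressed by a remapped index. The auxiliary model here is a hypothetical placeholder.

```python
import os
# Set before any CUDA initialization; physical GPUs 4, 5, 6
# become cuda:0, cuda:1, cuda:2 inside this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6"

import torch
from vllm import LLM

# vLLM's two workers occupy the first two visible devices...
llm = LLM(model="facebook/opt-6.7b", tensor_parallel_size=2)

# ...and the auxiliary model goes on the remaining visible device:
# physical GPU 6, addressed as cuda:2 after the remapping.
aux_model = torch.nn.Linear(10, 10).to("cuda:2")  # placeholder auxiliary model
```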
Hello, this one was trained from scratch. If it's an existing model, you can just use the weights available online.
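For the existing-model case, a minimal sketch of pulling published weights with the transformers API (the checkpoint name is illustrative):

```python
from transformers import AutoModel, AutoTokenizer

# Downloads the published weights and the matching vocab from the Hub.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")
```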
For this, just look at the tokenization-related code and generate your own vocab.txt. I remember transformers also has a corresponding API for it.
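A sketch of generating a vocab.txt from your own corpus with the Hugging Face tokenizers library; the corpus path, vocab size, and output directory are all illustrative:

```python
import os
from tokenizers import BertWordPieceTokenizer

# Train a WordPiece vocabulary from scratch on a raw-text corpus.
tokenizer = BertWordPieceTokenizer()
tokenizer.train(files=["corpus.txt"], vocab_size=30000)

# save_model writes vocab.txt into the target directory,
# ready to be loaded by transformers' BERT-style tokenizers.
os.makedirs("my_tokenizer", exist_ok=True)
tokenizer.save_model("my_tokenizer")
```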