QUESTION: How to load Yi-200k 34b with 4 A10 cards
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check
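The error above asks for a custom device_map. As a minimal sketch, one can build such a map by hand, assuming the Llama-style "model.layers.N" module naming that Yi checkpoints use (the helper name and the 60-layer count are illustrative, not taken from this thread):

```python
# Sketch: spread a Llama-style model's decoder layers evenly across
# the given GPUs. Assumes "model.layers.N" module names (Yi/Llama style).
def build_device_map(num_layers, gpu_ids):
    device_map = {
        "model.embed_tokens": gpu_ids[0],   # embeddings on the first GPU
        "model.norm": gpu_ids[-1],          # final norm next to the head
        "lm_head": gpu_ids[-1],
    }
    per_gpu = -(-num_layers // len(gpu_ids))  # ceiling division
    for layer in range(num_layers):
        device_map[f"model.layers.{layer}"] = gpu_ids[layer // per_gpu]
    return device_map

# E.g. a 60-layer model on four cards -> 15 layers per GPU.
device_map = build_device_map(60, [0, 1, 2, 3])
```

The resulting dict could then be passed as device_map= to from_pretrained, alongside the offload flag the error message mentions.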
Hi @tfal-yan, please try the n_gpu=4 option when launching Yi-200k 34b.
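For illustration, a sketch of launching through the xinference CLI rather than the UI (the exact model name and format flags registered in your installation may differ; check `xinference launch --help`):

```shell
# Launch Yi-200k 34B across 4 GPUs via the xinference CLI.
# Flags shown are illustrative assumptions, not confirmed by this thread.
xinference launch --model-name Yi-200k \
    --size-in-billions 34 \
    --model-format pytorch \
    --n-gpu 4
```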
I loaded the model via the UI. Is there a way to set this in the UI, or must it be done via the CLI? Also, can the number of GPUs be detected automatically? Thanks.
Launching a model with the n_gpu parameter via the UI is currently not supported; this must be done in code.
Can it be configured to detect the number of GPUs automatically, using whatever idle resources are sufficient? Thanks.
This feature is not currently available and will be supported in the future.
One more question: if the GPU indices are not contiguous — say there are 5 GPUs (0–4) and GPU 3 is occupied by another model, leaving 0, 1, 2 and 4 free — can n_gpu=4 still be used?
With n_gpu=4, it is best to ensure there are four idle GPUs.
In that case, it is best to use CUDA_VISIBLE_DEVICES before starting xinference to control which GPUs xinference has exclusive use of. Currently, xinference's default behavior is to treat every visible GPU as available, without considering whether a GPU already carries other workloads.
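A sketch of that approach for the non-contiguous case above (the launch command and port are illustrative):

```shell
# Expose only the idle GPUs (0, 1, 2 and 4; GPU 3 is busy) to xinference,
# then start the server. Inside the process the four visible devices are
# renumbered 0-3, so n_gpu=4 would address exactly these cards.
export CUDA_VISIBLE_DEVICES=0,1,2,4
xinference-local --host 0.0.0.0 --port 9997
```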
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.