
QUESTION: How to load Yi-200k 34b with 4 A10 cards

tfal-yan opened this issue 1 year ago • 9 comments

Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check
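For reference, this warning comes from transformers' 8-bit loading path. A minimal sketch of the CPU-offload option it mentions is below; the module names in the `device_map` are illustrative assumptions rather than Yi's actual layer layout, and in current transformers the flag the warning names is set via `BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)`:

```python
# Illustrative custom device_map: keep the bulk of the model on GPUs,
# push modules that don't fit to the CPU. Module names are assumptions,
# not the real Yi-34B layout -- inspect your model to get the true names.
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    # ... remaining decoder layers spread across GPUs 1-3 ...
    "model.norm": "cpu",
    "lm_head": "cpu",
}

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_8bit=True,
        # Keep the CPU-offloaded modules in fp32, as the warning suggests.
        llm_int8_enable_fp32_cpu_offload=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "01-ai/Yi-34B-200K",  # assumed Hugging Face model id
        quantization_config=quant,
        device_map=device_map,
    )
```

This is a sketch of the fallback the warning describes; the replies below point at the simpler fix of giving xinference all four GPUs.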

tfal-yan avatar Nov 13 '23 09:11 tfal-yan

Hi, @tfal-yan. Please try the n_gpu=4 option when launching Yi-200k 34b.

ChengjieLi28 avatar Nov 13 '23 09:11 ChengjieLi28

I'm loading the model through the UI. Is there a way to set this in the UI, or does it have to be launched via the CLI? Also, can it be configured to detect the number of GPUs automatically? Thanks.

tfal-yan avatar Nov 13 '23 09:11 tfal-yan

I'm loading the model through the UI. Is there a way to set this in the UI, or does it have to be launched via the CLI?

Setting the n_gpu parameter when launching a model from the UI is not currently supported; the model must be launched programmatically.
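As a sketch, launching programmatically with the xinference Python client might look like the following. The endpoint, model name, and format string are assumptions; check `xinference registrations` and your server address for the exact values:

```python
n_gpu = 4  # shard one replica of the model across four GPUs

def launch_yi(endpoint="http://127.0.0.1:9997"):
    # Imported inside the function so the sketch can be read without
    # a running xinference server or the package installed.
    from xinference.client import Client

    client = Client(endpoint)
    return client.launch_model(
        model_name="Yi-200k",        # assumed registered model name
        model_size_in_billions=34,
        model_format="pytorch",      # assumed format string
        n_gpu=n_gpu,
    )

if __name__ == "__main__":
    print("launched model uid:", launch_yi())
```

The returned uid can then be used with `client.get_model(uid)` to run inference against the launched model.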

ChengjieLi28 avatar Nov 13 '23 09:11 ChengjieLi28

Can it be configured to detect the number of GPUs automatically, and use whichever GPUs have enough idle resources? Thanks.

tfal-yan avatar Nov 13 '23 09:11 tfal-yan

Can it be configured to detect the number of GPUs automatically, and use whichever GPUs have enough idle resources? Thanks.

This feature is not currently available and will be supported in the future.

ChengjieLi28 avatar Nov 13 '23 09:11 ChengjieLi28

One more question: if the GPU indices are not contiguous, say there are 5 GPUs (0-4) and GPU 3 is occupied by another model, leaving the four GPUs 0, 1, 2 and 4 free, can n_gpu=4 still be used?

tfal-yan avatar Nov 13 '23 12:11 tfal-yan

One more question: if the GPU indices are not contiguous, say there are 5 GPUs (0-4) and GPU 3 is occupied by another model, leaving the four GPUs 0, 1, 2 and 4 free, can n_gpu=4 still be used?

With n_gpu=4, it is best to make sure four GPUs are idle.

aresnow1 avatar Nov 14 '23 02:11 aresnow1

One more question: if the GPU indices are not contiguous, say there are 5 GPUs (0-4) and GPU 3 is occupied by another model, leaving the four GPUs 0, 1, 2 and 4 free, can n_gpu=4 still be used?

In that case, it is best to set CUDA_VISIBLE_DEVICES before starting xinference, to control which GPUs xinference has exclusive use of.

Currently, xinference's default behavior is to treat every visible GPU as available to it; it does not check whether a GPU is carrying other workloads.
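A sketch for the non-contiguous case described above (the host/port values are the common defaults, not something from this thread):

```shell
# Hypothetical box with five GPUs (0-4) where GPU 3 is busy:
# expose only the four idle GPUs to xinference before starting it.
export CUDA_VISIBLE_DEVICES=0,1,2,4

# xinference now sees exactly four GPUs, so launching a model with
# n_gpu=4 will map onto physical GPUs 0, 1, 2 and 4.
# (Guarded so the snippet is a no-op where xinference is not installed.)
if command -v xinference-local >/dev/null; then
  xinference-local --host 0.0.0.0 --port 9997
fi
```

Inside the xinference process, the exposed GPUs are renumbered 0-3, so n_gpu=4 works even though the physical indices are not contiguous.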

UranusSeven avatar Nov 14 '23 03:11 UranusSeven

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Aug 08 '24 19:08 github-actions[bot]

This issue was closed because it has been inactive for 5 days since being marked as stale.

github-actions[bot] avatar Aug 13 '24 19:08 github-actions[bot]