
GPU usage problem with BGEM3FlagModel

Open Anthony-Sun-S opened this issue 1 year ago • 1 comments

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,5"
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel('BAAI_bge-m3', use_fp16=True)

When I call the GPUs with the code above, I noticed that with two GPUs the program runs normally, but the memory usage is extremely uneven: the first card can exceed 40 GB while the second card uses only about 4 GB. As soon as I configure more than two GPUs, the program fails with this error:

Traceback (most recent call last):
  File "inference_m3.py", line 44, in <module>
    score = model.compute_score(batch, max_passage_length=1024, weights_for_different_modes=[0.4, 0.2, 0.4])
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/FlagEmbedding/bge_m3.py", line 235, in compute_score
    queries_output = self.model(queries_inputs, return_dense=True, return_sparse=True, return_colbert=True,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/FlagEmbedding/BGE_M3/modeling.py", line 350, in forward
    last_hidden_state = self.model(**text_input, return_dict=True).last_hidden_state
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/data_parallel.py", line 184, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/data_parallel.py", line 189, in replicate
    return replicate(module, device_ids, not torch.is_grad_enabled())
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/replicate.py", line 110, in replicate
    param_copies = _broadcast_coalesced_reshape(params, devices, detach)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/replicate.py", line 79, in _broadcast_coalesced_reshape
    return comm.broadcast_coalesced(tensors, devices)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/comm.py", line 57, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: NCCL Error 2: unhandled system error (run with NCCL_DEBUG=INFO for details)

How can this be solved? Is there a correct way to use FlagEmbedding with multiple GPUs?
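For reference, a minimal sketch of acting on the hint in the traceback, i.e. turning NCCL's own logging on before the model is created (the device selection is copied from the snippet above; nothing else is assumed):

```python
# Minimal sketch: enable NCCL's diagnostic output, as the error message suggests,
# before any CUDA/NCCL work happens, then reproduce the failing call.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1,5"   # same device selection as above
os.environ["NCCL_DEBUG"] = "INFO"            # make "unhandled system error" verbose

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI_bge-m3', use_fp16=True)
```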

Anthony-Sun-S avatar Feb 05 '24 07:02 Anthony-Sun-S

Hi, compute_score uses DataParallel, so the outputs from all GPUs are gathered back onto the first card. Since the ColBERT vectors and sparse vectors take up a lot of space, the first card ends up with very high memory usage, especially when the inputs are long. The current compute_score implementation is only an example and still needs to be optimized for real use. If you need compute_score for reranking, you can try bge-reranker instead.
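For completeness, a short sketch of the bge-reranker route mentioned above (the checkpoint name and example texts are placeholders, not something from this thread):

```python
# Sketch of reranking with bge-reranker instead of BGEM3FlagModel.compute_score.
# The checkpoint name and texts below are only examples.
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=True)

# A single [query, passage] pair ...
score = reranker.compute_score(['what is BGE-M3?', 'BGE-M3 is a multilingual embedding model.'])

# ... or a list of pairs scored in one call.
scores = reranker.compute_score([
    ['what is BGE-M3?', 'BGE-M3 is a multilingual embedding model.'],
    ['what is BGE-M3?', 'Paris is the capital of France.'],
])
print(score, scores)
```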

staoxiao avatar Feb 05 '24 14:02 staoxiao

I'm still evaluating the retrieval stage and haven't reached reranking yet. With sentence-transformers, m3 performs much better than v1.5-large, and switching to FlagEmbedding's compute_score improves the metrics further. I did try using the reranker directly before, but there is still a noticeable gap.

Anthony-Sun-S avatar Feb 06 '24 02:02 Anthony-Sun-S
Thanks for the feedback! compute_score mainly relies on the ColBERT computation; we will consider compressing it later to reduce memory usage.
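Until then, one possible workaround is to combine the scores yourself from encode() outputs, which keeps everything on a single card. A rough sketch (the model path, example texts, and the dense/sparse/ColBERT weight order are assumptions; the 0.4/0.2/0.4 weights are taken from the original call):

```python
# Rough sketch: combine dense, sparse and ColBERT similarities manually instead of
# calling compute_score, so no DataParallel gather onto the first GPU is needed.
# Model path, example texts and the weight order are assumptions for illustration.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

queries = ['what is BGE-M3?']
passages = ['BGE-M3 is a multilingual embedding model.']

q = model.encode(queries, return_dense=True, return_sparse=True, return_colbert_vecs=True)
p = model.encode(passages, return_dense=True, return_sparse=True, return_colbert_vecs=True)

dense = float(q['dense_vecs'][0] @ p['dense_vecs'][0])
sparse = model.compute_lexical_matching_score(q['lexical_weights'][0], p['lexical_weights'][0])
colbert = float(model.colbert_score(q['colbert_vecs'][0], p['colbert_vecs'][0]))

# 0.4 / 0.2 / 0.4 as in the original compute_score call
final = 0.4 * dense + 0.2 * sparse + 0.4 * colbert
print(dense, sparse, colbert, final)
```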

staoxiao avatar Feb 06 '24 07:02 staoxiao

OK, looking forward to the update. Thanks for your work!

Anthony-Sun-S avatar Feb 06 '24 09:02 Anthony-Sun-S