FlagEmbedding

Exception raised during cleanup when the process exits

Open patricksuo opened this issue 11 months ago • 9 comments

Environment:

  • Mac silicon
  • Python 3.12.8
  • FlagEmbedding==1.3.3

Exception ignored in: <function AbsEmbedder.__del__ at 0x11bb863e0>
Traceback (most recent call last):
  File "/xxx/.venv/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 270, in __del__
  File "/xxx/.venv/lib/python3.12/site-packages/FlagEmbedding/abc/inference/AbsEmbedder.py", line 89, in stop_self_pool
TypeError: 'NoneType' object is not callable

patricksuo, Feb 07 '25 12:02

__del__() can be executed during interpreter shutdown. As a consequence, the global variables it needs to access (including other modules) may already have been deleted or set to None. Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the __del__() method is called.

https://github.com/FlagOpen/FlagEmbedding/blob/0306f566aee5356f8a53cbcffacbd6cc0d83177a/FlagEmbedding/abc/inference/AbsEmbedder.py#L99-L108

When the process exits, this cleanup code runs, but by then gc.collect has already been set to None, so an exception is raised.
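Another common way to make cleanup code independent of module globals is to bind what it needs when the function is defined, for example as a default argument. CPython's own `subprocess.Popen.__del__` uses this pattern for `warnings.warn`. The sketch below assumes a hypothetical `Embedder` class; it is not FlagEmbedding's actual `AbsEmbedder`.

```python
import gc


class Embedder:
    """Hypothetical stand-in for an embedder that owns a worker pool."""

    def __init__(self) -> None:
        self.pool = object()  # placeholder for a real multi-process pool

    def stop_self_pool(self, _collect=gc.collect) -> None:
        # `_collect` is evaluated once, when the `def` statement runs, so it
        # keeps a reference to the real gc.collect even if the module global
        # `gc` (or its `collect` attribute) is replaced by None at shutdown.
        if self.pool is not None:
            self.pool = None  # a real implementation would stop the workers here
        _collect()

    def __del__(self) -> None:
        self.stop_self_pool()
```

The cost is a slightly unusual signature; the leading underscore signals that callers are not meant to pass `_collect` themselves.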

patricksuo, Feb 07 '25 13:02

Hi @patricksuo, thanks for reporting this! We haven't run into this problem in our tests yet. Could you share a code snippet that helps us reproduce it? Thanks!

hanhainebula, Feb 13 '25 14:02

@hanhainebula Here is how I reproduce it locally: 1) load the model, 2) store the model in a global variable, 3) call encode_queries, 4) let the process exit.

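# Repro: load the model, keep it in a module-level global, call
# encode_queries once, then let the interpreter exit.
from langchain_text_splitters import CharacterTextSplitter  # unused in this snippet
from FlagEmbedding import BGEM3FlagModel


# Stored in a global, so the model is only finalized during interpreter shutdown
model = BGEM3FlagModel('BAAI/bge-m3', return_dense=True, return_sparse=True)


def search(query: str):
    # Returns the dense query embedding(s) as a NumPy array
    query_vector = model.encode_queries([query], batch_size=8, return_dense=True, return_sparse=True, convert_to_numpy=True)['dense_vecs']
    return query_vector


search("xxx")
# On exit, AbsEmbedder.__del__ -> stop_self_pool runs during interpreter shutdown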

When the process exits, gc.collect has been set to None.

patricksuo, Feb 14 '25 12:02

I tried again today. Oddly enough, this may not be a deterministically reproducible issue.

patricksuo, Feb 15 '25 07:02

Regarding this issue, I edited AbsEmbedder.py directly and the error went away. Could you please try this?

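    def stop_self_pool(self):
        # Shut down the multi-process pool, if one was started
        if self.pool is not None:
            self.stop_multi_process_pool(self.pool)
            self.pool = None
        # Move the model off the GPU and release cached CUDA memory; failures
        # here (e.g. no CUDA, or torch already torn down) are ignored
        try:
            self.model.to('cpu')
            torch.cuda.empty_cache()
        except Exception:
            pass
        # Previously an unconditional gc.collect(). During interpreter shutdown
        # the `gc` global or gc.collect may already be None, so check first.
        if gc is not None and callable(gc.collect):
            gc.collect()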

Erena-Kim, Mar 13 '25 04:03

@Erena-Kim Thank you for your response and for sharing the workaround!

Yes, your workaround does mitigate the problem.

That said, I still lean toward the view that the library's use of __del__ for resource cleanup may be a design misstep.

According to Python's documentation, __del__() can be executed during interpreter shutdown, and by then global variables (including other modules) may already have been deleted or set to None. Python does guarantee that globals starting with a single underscore are deleted before other globals, but this doesn't fully ensure that dependent modules are still available when __del__() is called.

This leads to the exceptions we're seeing at process exit. I think a more robust approach would be to manage resource cleanup explicitly rather than relying on the implicit behavior of __del__.
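One way to do that explicit management with only the standard library is `weakref.finalize` combined with a `close()` method and a context manager. The `weakref.finalize` documentation states that a finalizer never invokes its callback during the later part of interpreter shutdown, when module globals may have been replaced by None, and that with the default `atexit=True` any still-alive finalizer is called when the program exits. A sketch, assuming a hypothetical `ManagedEmbedder` class and `_release` helper (FlagEmbedding does not provide either):

```python
import gc
import weakref
from typing import Any


def _release(state: dict) -> None:
    # Module-level callback. It receives only the state dict, never the owner,
    # because a reference to the owner would keep it alive forever.
    state["pool"] = None  # a real implementation would stop worker processes here
    gc.collect()


class ManagedEmbedder:
    """Hypothetical embedder with explicit, idempotent cleanup."""

    def __init__(self) -> None:
        self._state = {"pool": object()}  # placeholder for a worker pool
        # Runs _release at most once: on close(), when the object is garbage
        # collected, or at interpreter exit (atexit=True by default).
        self._finalizer = weakref.finalize(self, _release, self._state)

    def close(self) -> None:
        self._finalizer()  # no-op if the finalizer has already run

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive

    def __enter__(self) -> "ManagedEmbedder":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self.close()
```

With this shape, `with ManagedEmbedder() as emb: ...` releases the pool when the block ends, a second `emb.close()` does nothing, and callers who forget to close still get cleanup at exit without depending on __del__.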

patricksuo, Mar 14 '25 11:03

Hello, @patricksuo. Thank you for pointing out this issue. I just fixed this bug using the modification from @Erena-Kim (PR #1407). If you have any other questions, feel free to open an issue :)

hanhainebula, Mar 20 '25 10:03


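Hi, even after applying the change above, the cleanup problem was not resolved for me. I tried modifying stop_multi_process_pool as follows, and that solved it:

    import signal
    from typing import Any, Dict, Literal


    def stop_multi_process_pool(pool: Dict[Literal["input", "output", "processes"], Any]) -> None:
        """
        Stops all processes started with start_multi_process_pool.

        Args:
            pool (Dict[str, object]): A dictionary containing the input queue, output queue, and process list.

        Returns:
            None
        """
        for p in pool["processes"]:
            # Was an unconditional p.terminate(). Terminate only live processes;
            # if the `signal` global was already set to None during shutdown,
            # hasattr() returns False and terminate() is skipped.
            if p.is_alive() and hasattr(signal, "SIGTERM"):
                p.terminate()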

yuaf369, Mar 20 '25 14:03

> Hi, even after applying the change above, the cleanup problem was not resolved.

Hi @yuaf369, could you provide a code sample that helps us reproduce the error?

hanhainebula, Mar 20 '25 14:03