
GPU memory runs out when running emotion recognition continuously

Open xing-shuyin opened this issue 1 month ago • 0 comments

Notice: To resolve issues more efficiently, please follow the template when raising an issue and include the relevant details.

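❓ I want to batch-recognize the emotion of the sentences in a set of videos. Each sentence is short, but GPU memory usage keeps growing as the loop runs.

GPU memory usage over successive calls: 2.2 2.2 2.6 2.2 2.3 2.4 2.7 2.4 2.5 2.4 2.6 7.7, then out of memory.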

Before asking:

  1. search the issues.
  2. search the docs.

Expected behavior: GPU memory usage stays stable.

Code

`
# Imports were not part of the posted snippet; loguru is an assumption based on the log format.
import json
import os
import subprocess
import time
from pathlib import Path

from funasr import AutoModel
from loguru import logger


def get_audio_slice(
    video_path: str | Path,
    start_time: float,
    end_time: float,
    sample_rate: int = 44100,  # can be lowered to speed things up
    stereo: bool = True,  # stereo output
) -> bytes:
    duration = end_time - start_time

    cmd = [
        "ffmpeg",
        "-ss",
        str(start_time),  # accurate seek (key optimization)
        "-i",
        video_path,
        "-t",
        str(duration),
        "-acodec",
        "pcm_s16le",  # 16-bit PCM (fastest lossless format)
        "-ar",
        str(sample_rate),  # the sample rate can be lowered
        "-ac",
        "2" if stereo else "1",  # mono is faster
        "-f",
        "wav",
        "-y",  # overwrite output (no prompt)
        "pipe:1",  # write to stdout, i.e. keep the audio in memory
    ]

    # Run ffmpeg and capture its output
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def emotion_video(
    video_dir="/home/c/Videos/video4",
    data_suffix=".data2.json",
):
    video_dir_path = Path(video_dir)
    if not video_dir_path.exists() or not video_dir_path.is_dir():
        # Log: the video directory does not exist or is not a directory
        logger.error(f"❌ 视频目录不存在或不是目录: {video_dir}")
        return
    data_files = list(video_dir_path.rglob(f"*{data_suffix}"))

    emotion_model = AutoModel(
        model=r"media\model\iic\emotion2vec_plus_large",
        device="cuda",
        disable_update=True,
        disable_pbar=True,
    )
    for data_file in data_files:
        with open(data_file, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Log: processing this data file
        logger.info(f"🎬 正在处理: {data_file}")
        for i in data["sentence"]:
            # Skip sentences that already have an emotion label
            if i.get("emotion", None):
                continue
            # Log: processing this sentence (text, start, end)
            logger.info(f"🎬 正在处理文本: {i['text']} {i['start']} {i['end']}")

            video_path = str(data_file).replace(data_suffix, ".mp4")
            if not os.path.exists(video_path):
                # Log: the video file does not exist
                logger.error(f"❌ 视频文件不存在: {video_path}")
                continue
            audio_slice = get_audio_slice(
                video_path,
                i["start"],
                i["end"],
            )
            res = emotion_model.generate(
                audio_slice,
                granularity="utterance",
                extract_embedding=False,
                device="cuda",
            )
            # Find the index of the highest score
            item = res[0]
            max_score_index = item["scores"].index(max(item["scores"]))

            # Look up the corresponding label
            max_score_label = item["labels"][max_score_index]
            i["emotion"] = max_score_label
            time.sleep(2)
        with open(data_file, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
`
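The amount of audio handed to `emotion_model.generate` grows linearly with the segment duration, the sample rate, and the channel count. As a rough sketch, the PCM payload returned by `get_audio_slice` can be estimated from the parameters used in the code above (`pcm_s16le` means 2 bytes per sample). The helper name `pcm_payload_bytes` is hypothetical, and the estimate ignores the WAV header that ffmpeg prepends.

```python
def pcm_payload_bytes(
    start_time: float,
    end_time: float,
    sample_rate: int = 44100,
    stereo: bool = True,
) -> int:
    """Estimate the raw 16-bit PCM size in bytes of a slice, excluding the WAV header."""
    if end_time < start_time:
        raise ValueError("end_time must not be earlier than start_time")
    channels = 2 if stereo else 1
    bytes_per_sample = 2  # pcm_s16le is 16-bit little-endian PCM
    frames = round((end_time - start_time) * sample_rate)
    return frames * channels * bytes_per_sample


if __name__ == "__main__":
    # One second with the defaults from the issue (44.1 kHz, stereo)
    print(pcm_payload_bytes(0.0, 1.0))
    # One second at 16 kHz mono, for comparison
    print(pcm_payload_bytes(0.0, 1.0, sample_rate=16000, stereo=False))
```

Logging this estimate next to each sentence would show whether the jump in memory lines up with an unusually large input rather than with the number of calls made so far.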

What have you tried?

`(vam) PS C:\project\vam\back> uv run init.py
C:\project\vam\back\api\tv.py:445: SyntaxWarning: invalid escape sequence '{'
  "question": [{"name": "问题1", "range": [[1,5],[20, 30]]}],
C:\project\vam\back\api\tv.py:1037: SyntaxWarning: invalid escape sequence '{'
  "question": [{"name": "问题1", "range": [[1,5],[20, 30]]}],
C:\project\vam\back

jionlp - 微信公众号: JioNLP Github: https://github.com/dongrixinyu/JioNLP.

C:\project\vam.venv\Lib\site-packages\jieba_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
funasr version: 1.2.7.
WARNING:root:trust_remote_code: False
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.weight, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.bias, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.weight, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.bias, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.weight, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.bias, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.weight, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.bias, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.weight, media\model\iic\emotion2vec_plus_large\model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.bias, media\model\iic\emotion2vec_plus_large\model.pt
2025-10-15 22:00:51.270 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video10\尼山滑雪场.data2.json
2025-10-15 22:00:51.272 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video10\雪具大厅.data2.json
2025-10-15 22:00:51.276 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\2025雅思备考.data2.json
2025-10-15 22:00:51.289 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\《现在就出发》综艺片段.data2.json
2025-10-15 22:00:51.299 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\【掐丝珐琅画】百合摆件教程.data2.json
2025-10-15 22:00:51.303 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\京剧《武家坡》教学视频.data2.json
2025-10-15 22:00:51.313 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\初会_高效备考.data2.json
2025-10-15 22:00:51.321 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\南昌&景德镇&上饶旅游vlog.data2.json
2025-10-15 22:00:51.332 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\反刍动物饲料原料的选择与使用.data2.json
2025-10-15 22:00:51.338 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\可乐鸡翅教程.data2.json
2025-10-15 22:00:51.344 | INFO | api.tv:emotion_video:1303 - 🎬 正在处理: C:\project\video\video4\周城旅游日记.data2.json
2025-10-15 22:00:51.344 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 回头一步, 13.04 15.47
inference-----------------
device cuda:0 cuda
2025-10-15 22:00:55.015 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 清辰一扑, 15.47 18.73
inference-----------------
device cuda:0 cuda
2025-10-15 22:00:57.361 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 彩虹, 18.81 19.615
inference-----------------
device cuda:0 cuda
2025-10-15 22:00:59.683 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 错过相拥时间, 22.73 26.66
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:02.189 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 几处有微风去处, 28.01 31.61
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:04.569 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 藏着心动。 31.65 33.225
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:06.987 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 天水的梦躲着渔火, 36.01 42.14
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:09.614 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 阳光些渐行渐远的时光才高。 42.22 49.53
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:12.131 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 想躺在你 you first ball 路的谁的路上, 53.07 63.545
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:14.747 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 还是热爱伤开缘分还风雨浪相伴, 64.43 74.11
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:17.329 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 或许也能是于花开是丰盛的欢喜人物, 75.44 97.26
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:20.217 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 唤星繁星美好的光阴。 97.42 103.605
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:22.671 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 我着雨晃样方的街渐行渐远, 106.68 112.335
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:25.185 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 时光在 thank you。 113.23 121.08
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:27.785 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: first, 121.26 121.505
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:30.045 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 特别好看, 122.82 123.765
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:32.418 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 我的喜待留吧。 124.6 126.85
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:34.997 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 多少电视如爱们在翻抱紧的 shpray, 128.93 140.055
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:37.623 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 或是跟谁细数美好的一次初见期待着台下段情节。 141.25 154.705
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:40.252 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 想在你好的温习的谁可白路上掩是热爱。 158.23 170.625
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:42.844 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 嗯嗯嗯嗯嗯, 197.38 224.91
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:45.983 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 那可以天嗯嗯你啦, 239.64 265.78
inference-----------------
device cuda:0 cuda
2025-10-15 22:01:48.986 | INFO | api.tv:emotion_video:1307 - 🎬 正在处理文本: 我嗯这个的嗯有嗯不果嗯嗯知的。 270.87 379.97
inference-----------------
Traceback (most recent call last):
  File "C:\project\vam\back\init.py", line 263, in
    emotion_video(r"C:\project\video")
  File "C:\project\vam\back\api\tv.py", line 1318, in emotion_video
    res = emotion_model.generate(
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\auto\auto_model.py", line 307, in generate
    return self.inference(
           ^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\auto\auto_model.py", line 364, in inference
    res = model.inference(**batch, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\model.py", line 236, in inference
    feats = self.extract_features(source, padding_mask=None)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\model.py", line 183, in extract_features
    res = self.forward(
          ^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\model.py", line 123, in forward
    extractor_out = feature_extractor(
                    ^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\base.py", line 294, in forward
    return self.contextualized_features(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\base.py", line 233, in contextualized_features
    alibi_bias = self.get_alibi_bias(
                 ^^^^^^^^^^^^^^^^^^^^
  File "C:\project\vam.venv\Lib\site-packages\funasr\models\emotion2vec\base.py", line 572, in get_alibi_bias
    .to(dtype=dtype, device=device)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.09 GiB. GPU 0 has a total capacity of 8.00 GiB of which 3.60 GiB is free. Of the allocated memory 1.21 GiB is allocated by PyTorch, and 1.23 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
(vam) PS C:\project\vam\back> `
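In the log above, every segment processed before the failure is shorter than 30 s, while the segment that triggers the out-of-memory error spans 270.87 to 379.97, roughly 109 s. The allocation fails inside `get_alibi_bias`, and an attention bias of this kind typically grows with the square of the sequence length, so a single long segment may matter more than the number of calls. One possible mitigation, offered only as an untested assumption, is to cap the segment length by splitting long ranges into shorter windows before extracting audio. The helper `split_segment` and its 30-second default are hypothetical, and how to merge the per-window results into one label is left open.

```python
def split_segment(
    start: float,
    end: float,
    max_seconds: float = 30.0,
) -> list[tuple[float, float]]:
    """Split [start, end] into consecutive windows no longer than max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    if end <= start:
        return []
    windows = []
    cursor = start
    while cursor < end:
        # Clamp the final window so it never runs past the segment end
        window_end = min(cursor + max_seconds, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    # The segment from the log that preceded the CUDA out-of-memory error
    for window in split_segment(270.87, 379.97):
        print(window)
```

Short segments pass through unchanged as a single window, so the helper could sit in front of every `get_audio_slice` call without special-casing.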

What's your environment?

  • OS (e.g., Linux): Windows 11
  • FunASR Version (e.g., 1.0.0): 1.2.7
  • ModelScope Version (e.g., 1.11.0): 1.30.0
  • PyTorch Version (e.g., 2.0.0): 2.8.0
  • How you installed funasr (pip, source): uv
  • Python version: 3.12.11
  • GPU (e.g., V100M32): 4060
  • CUDA/cuDNN version (e.g., cuda11.7): none
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1):
  • Any other relevant information:

xing-shuyin, Oct 15 '25 14:10