
pandas.arrays_to_mgr raises "All arrays must be of the same length"

Open Brzjomo opened this issue 1 year ago • 3 comments

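For some videos, translation fails in step5 at pd.DataFrame({'Source': src, 'Translation': remerged}).to_excel(OUTPUT_REMERGED_FILE, index=False), where arrays_to_mgr raises an error saying all arrays must be of the same length. As far as I can tell, the file "output/log/translation_results_remerged.xlsx" is only relevant when translating pure audio, so for now, after commenting out the step5- and step6-related code, the video translation task finishes correctly.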

Brzjomo avatar Nov 18 '24 14:11 Brzjomo
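
For context, pandas raises this ValueError when the dict passed to pd.DataFrame holds lists of different lengths, which here means the source and translated line lists no longer match one-to-one. A minimal sketch of checking the lengths before building the DataFrame, so the message says which column came up short; the helper name `describe_length_mismatch` and the sample lines are hypothetical and not part of VideoLingo:

```python
def describe_length_mismatch(columns):
    """Return None if every list has the same length, else a readable message.

    `columns` maps a column name to its list of values, i.e. the same kind of
    dict that would be passed to pd.DataFrame. Hypothetical helper.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) <= 1:
        return None
    details = ", ".join(f"{name}={n}" for name, n in lengths.items())
    return f"column lengths differ: {details}"


# Illustrative data: the translation is missing one line.
src = ["Hello.", "How are you?", "Goodbye."]
remerged = ["Bonjour.", "Au revoir."]

message = describe_length_mismatch({"Source": src, "Translation": remerged})
print(message)
```

Running a check like this just before the to_excel call would not fix the underlying mismatch, but it would show how many lines were lost.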

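Which LLM are you using? With the recommended models I almost never see the translation come back with fewer lines than the source. This check exists to keep the final subtitles stable.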

Huanshere avatar Nov 18 '24 16:11 Huanshere
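
One way to picture a line-count check of this kind is a wrapper that retries the translation when the number of returned lines differs from the number of source lines. This is only a sketch under assumptions: `translate_with_line_check` and the `translate` callable are hypothetical names, and VideoLingo's actual check may work differently.

```python
def translate_with_line_check(lines, translate, max_attempts=3):
    """Call `translate(lines)` until it returns one output line per input line.

    `translate` is a hypothetical callable standing in for an LLM request.
    Raises ValueError if no attempt returns the expected number of lines.
    """
    result = []
    for _ in range(max_attempts):
        result = translate(lines)
        if len(result) == len(lines):
            return result
    raise ValueError(
        f"got {len(result)} translated lines for {len(lines)} source lines "
        f"after {max_attempts} attempts"
    )


# Fake translator that drops the last line on its first call only.
calls = {"count": 0}

def flaky_translate(lines):
    calls["count"] += 1
    if calls["count"] == 1:
        return [line.upper() for line in lines[:-1]]
    return [line.upper() for line in lines]


translated = translate_with_line_check(["a", "b", "c"], flaky_translate)
print(translated, calls["count"])
```

Failing loudly after the last attempt keeps a short translation from reaching the subtitle files unnoticed.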

Qwen2-72B-Instruct runs into this problem.

asu-gkg avatar Nov 19 '24 05:11 asu-gkg

😂 Claude is still the default recommendation for now; Qwen still has a long way to go to catch up.

Huanshere avatar Nov 19 '24 05:11 Huanshere

I get this error with OpenAI's gpt-4o as well. Strangely, for the same YouTube video, 360p works fine, but 720 (which I added myself) and 1080 both fail.

2024-11-25 10:47:17.855 Uncaught app exception
Traceback (most recent call last):
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
    result = func()
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
    exec(code, module.__dict__)
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 116, in <module>
    main()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 112, in main
    text_processing_section()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 30, in text_processing_section
    process_text()
  File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 54, in process_text
    step5_splitforsub.split_for_sub_main()
  File "/Users/dinochan/bs/ai/VideoLingo/core/step5_splitforsub.py", line 104, in split_for_sub_main
    pd.DataFrame({'Source': src_lines, 'Translation': tr_lines}).to_excel("output/log/translation_results_for_subtitles.xlsx", index=False)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/frame.py", line 778, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
    raise ValueError("All arrays must be of the same length")

lanxichan avatar Nov 25 '24 02:11 lanxichan

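Haha, this has nothing to do with resolution. It is probably a probabilistic failure: gpt4o may not have returned the complete response, or may have dropped a sentence.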

Huanshere avatar Nov 25 '24 16:11 Huanshere

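> Haha, this has nothing to do with resolution. It is probably a probabilistic failure: gpt4o may not have returned the complete response, or may have dropped a sentence.

That's what I thought too. I only found it odd because it reproduced consistently throughout my testing at the time. 😂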

lanxichan avatar Nov 25 '24 16:11 lanxichan

gpt_log records every response, and on a rerun the history is read back from it, so if you rerun without deleting the log you will still get the same error~

Huanshere avatar Nov 26 '24 04:11 Huanshere
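
The effect described above, where a logged response is replayed on every rerun, can be illustrated with a small prompt-keyed cache. This is a generic sketch, not VideoLingo's gpt_log implementation: the function `ask_with_cache`, the JSON file layout, and the fake model are all assumptions made for illustration.

```python
import json
import os
import tempfile


def ask_with_cache(prompt, call_model, cache_path):
    """Return the cached response for `prompt`, calling the model only on a miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    if prompt in cache:
        return cache[prompt]  # a bad response stored earlier is replayed as-is
    response = call_model(prompt)
    cache[prompt] = response
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False)
    return response


# Fake model: the first answer is truncated, the second is complete.
responses = iter(["line 1", "line 1\nline 2"])

def fake_model(prompt):
    return next(responses)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.json")
    first = ask_with_cache("translate", fake_model, path)   # truncated, gets cached
    second = ask_with_cache("translate", fake_model, path)  # same truncated answer
    os.remove(path)                                         # delete the log
    third = ask_with_cache("translate", fake_model, path)   # fresh, complete answer

print(repr(first), repr(second), repr(third))
```

In this sketch the second call never reaches the model, which is why a consistently reproduced error does not rule out a one-off bad response.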

The "All arrays must be of the same length" error has been largely eliminated as of f86bc14, provided you don't use a small model.

Huanshere avatar Dec 04 '24 14:12 Huanshere