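pandas.arrays_to_mgr raises "All arrays must be of the same length"
For some videos, translation fails at step5 on pd.DataFrame({'Source': src, 'Translation': remerged}).to_excel(OUTPUT_REMERGED_FILE, index=False) with the arrays_to_mgr error saying all arrays must be of the same length. As far as I can tell, the file "output/log/translation_results_remerged.xlsx" is only relevant when translating audio-only input, so for now I have commented out the step5 and step6 code, and with that the video translation task finishes correctly.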
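To see which rows are left unmatched before the DataFrame is built, a small check like the one below may help. It is only a sketch: describe_length_mismatch is a hypothetical helper, not part of VideoLingo, and the sample lists are made up for illustration.

```python
from itertools import zip_longest


def describe_length_mismatch(src, translation):
    """Return None if both lists have the same length, else a short report."""
    if len(src) == len(translation):
        return None
    shorter = min(len(src), len(translation))
    lines = [f"Source has {len(src)} lines, Translation has {len(translation)} lines"]
    # Pair rows so the unmatched tail is visible; absent cells show as '<missing>'.
    pairs = zip_longest(src, translation, fillvalue="<missing>")
    for i, (s, t) in enumerate(pairs):
        if i >= shorter:
            lines.append(f"row {i}: {s!r} | {t!r}")
    return "\n".join(lines)


# Made-up sample data: one translated line was dropped.
src = ["Hello.", "How are you?", "Goodbye."]
remerged = ["你好。", "再见。"]
report = describe_length_mismatch(src, remerged)
print(report)
```

Note that this only shows the unmatched tail. The line that was actually dropped may sit earlier in the list, so the report is a starting point for inspecting the two columns, not a pinpoint.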
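Which LLM are you using? With the recommended models I almost never see the translation come back with fewer lines. This check is there to keep the final subtitles stable.
Qwen2-72B-Instruct does run into this problem.
😂 For now the default recommendation is still claude. Qwen still has a long way to go to catch up.
I get this error with openai's gpt-4o as well. The odd part is that for the same yt video, 360p works fine, while 720 (which I added myself) and 1080 both fail.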
2024-11-25 10:47:17.855 Uncaught app exception
Traceback (most recent call last):
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/exec_code.py", line 88, in exec_func_with_error_handling
result = func()
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 590, in code_to_exec
exec(code, module.__dict__)
File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 116, in <module>
main()
File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 112, in main
text_processing_section()
File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 30, in text_processing_section
process_text()
File "/Users/dinochan/bs/ai/VideoLingo/st.py", line 54, in process_text
step5_splitforsub.split_for_sub_main()
File "/Users/dinochan/bs/ai/VideoLingo/core/step5_splitforsub.py", line 104, in split_for_sub_main
pd.DataFrame({'Source': src_lines, 'Translation': tr_lines}).to_excel("output/log/translation_results_for_subtitles.xlsx", index=False)
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/frame.py", line 778, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 503, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 114, in arrays_to_mgr
index = _extract_index(arrays)
File "/opt/anaconda3/envs/videolingo/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 677, in _extract_index
raise ValueError("All arrays must be of the same length")
Haha, this has nothing to do with resolution. It is more likely a probabilistic failure: gpt4o may not have returned a complete response, or may have dropped a sentence.
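One generic way to cope with a model that occasionally drops a line is to compare line counts and ask again. The sketch below is not how VideoLingo does it: translate_with_line_check and fake_translate are hypothetical names, and fake_translate merely stands in for a real LLM call.

```python
def translate_with_line_check(source_lines, translate_fn, max_retries=3):
    """Call translate_fn until it returns as many lines as it was given."""
    last_count = None
    for _ in range(max_retries):
        result = translate_fn(source_lines)
        if len(result) == len(source_lines):
            return result
        last_count = len(result)
    raise ValueError(
        f"expected {len(source_lines)} translated lines, got {last_count} "
        f"after {max_retries} attempts"
    )


attempts = []


def fake_translate(lines):
    # Stand-in for an LLM call: drops the last line on the first attempt only.
    attempts.append(len(lines))
    translated = [line.upper() for line in lines]
    return translated[:-1] if len(attempts) == 1 else translated


result = translate_with_line_check(["a", "b", "c"], fake_translate)
print(result, "after", len(attempts), "attempts")
```

A retry only helps if each attempt really reaches the model. If responses are cached, the retry has to bypass the cache.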
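That's what I thought too. It just reproduced consistently, run after run, while I was testing, which is why I found it odd. 😂
gpt_log records every response, and a re-run reads the history back from it, so if you re-run without deleting the log you will still get the same error.
All arrays must be of the same length has been largely eliminated as of f86bc14, provided you don't use a small model.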
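Why a logged response makes the failure look deterministic can be shown with a toy cache. This is an assumption-laden sketch, not VideoLingo's actual gpt_log code: cached_call, flaky_model and the gpt_log.json file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def cached_call(prompt, call_model, log_path):
    """Return the logged response for prompt if present, else call the model and log it."""
    log = json.loads(log_path.read_text(encoding="utf-8")) if log_path.exists() else {}
    if prompt in log:
        return log[prompt]  # replayed from the log, the model is not called
    response = call_model(prompt)
    log[prompt] = response
    log_path.write_text(json.dumps(log, ensure_ascii=False), encoding="utf-8")
    return response


calls = []


def flaky_model(prompt):
    # Stand-in for an LLM: the first answer is missing a line, later ones are complete.
    calls.append(prompt)
    return "line 1" if len(calls) == 1 else "line 1\nline 2"


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "gpt_log.json"  # hypothetical file name
    first = cached_call("translate", flaky_model, log_path)
    second = cached_call("translate", flaky_model, log_path)  # same bad answer, from the log
    log_path.unlink()  # deleting the log forces a fresh model call
    third = cached_call("translate", flaky_model, log_path)
```

Here the second call returns the same incomplete answer without touching the model, and only the call made after the log is deleted gets a fresh response.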