[Question]: How to adjust the output description of picture parser?
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
Here is a log of my picture parsing task:
2025-05-22 17:13:21,367 INFO 2570836 handle_task begin for task {"id": "0039916a36ed11f0933b8dfa7b97ec01", "doc_id": "b442508c36e511f0933b8dfa7b97ec01", "from_page": 0, "to_page": 100000000, "retry_count": 0, "kb_id": "026bed2c2b1011f0b66a0242ac150006", "parser_id": "picture", "parser_config": {"pages": [[1, 1000000]], "field_map": {"2025_nian_du_sheng_ji_ji_guan_kao_lu_zhi_wei_jian_jie_biao_tks": "2025\u5e74\u5ea6\u7701\u7ea7\u673a\u5173\u8003\u5f55\u804c\u4f4d\u7b80\u4ecb\u8868", "unnamed: 1_long": "Unnamed: 1", "unnamed: 2_tks": "Unnamed: 2", "unnamed: 3_long": "Unnamed: 3", "unnamed: 4_tks": "Unnamed: 4", "unnamed: 5_long": "Unnamed: 5", "unnamed: 6_tks": "Unnamed: 6", "unnamed: 7_tks": "Unnamed: 7", "unnamed: 8_tks": "Unnamed: 8", "unnamed: 9_long": "Unnamed: 9", "unnamed: 10_long": "Unnamed: 10", "unnamed: 11_tks": "Unnamed: 11", "unnamed: 12_tks": "Unnamed: 12", "unnamed: 13_tks": "Unnamed: 13", "li_shu_ _guan_xi_tks": "\u96b6\u5c5e \u5173\u7cfb", "di_qu_ _dai_ma_long": "\u5730\u533a \u4ee3\u7801", "di_qu_ _ming_cheng_tks": "\u5730\u533a \u540d\u79f0", "dan_wei_dai_ma_long": "\u5355\u4f4d\u4ee3\u7801", "dan_wei_ming_cheng_tks": "\u5355\u4f4d\u540d\u79f0", "zhi_wei_dai_ma_long": "\u804c\u4f4d\u4ee3\u7801", "zhi_wei_ming_cheng_tks": "\u804c\u4f4d\u540d\u79f0", "zhi_wei_jian_jie_tks": "\u804c\u4f4d\u7b80\u4ecb", "kao_shi_lei_bie_tks": "\u8003\u8bd5\u7c7b\u522b", "kai_kao_bi_li_long": "\u5f00\u8003\u6bd4\u4f8b", "zhao_kao_ren_shu_long": "\u62db\u8003\u4eba\u6570", "xue_\u3000_li_tks": "\u5b66\u3000\u5386", "zhuan_\u3000_ye_tks": "\u4e13\u3000\u4e1a", "qi_\u3000_ta_tks": "\u5176\u3000\u5b83"}}, "name": "image.png", "type": "visual", "location": "image.png", "size": 10596985, "tenant_id": "b92086da2b0c11f0bfe90242ac150006", "language": "English", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "kb_parser_config": {"pages": [[1, 1000000]], "field_map": {"2025_nian_du_sheng_ji_ji_guan_kao_lu_zhi_wei_jian_jie_biao_tks": 
"2025\u5e74\u5ea6\u7701\u7ea7\u673a\u5173\u8003\u5f55\u804c\u4f4d\u7b80\u4ecb\u8868", "unnamed: 1_long": "Unnamed: 1", "unnamed: 2_tks": "Unnamed: 2", "unnamed: 3_long": "Unnamed: 3", "unnamed: 4_tks": "Unnamed: 4", "unnamed: 5_long": "Unnamed: 5", "unnamed: 6_tks": "Unnamed: 6", "unnamed: 7_tks": "Unnamed: 7", "unnamed: 8_tks": "Unnamed: 8", "unnamed: 9_long": "Unnamed: 9", "unnamed: 10_long": "Unnamed: 10", "unnamed: 11_tks": "Unnamed: 11", "unnamed: 12_tks": "Unnamed: 12", "unnamed: 13_tks": "Unnamed: 13", "li_shu_ _guan_xi_tks": "\u96b6\u5c5e \u5173\u7cfb", "di_qu_ _dai_ma_long": "\u5730\u533a \u4ee3\u7801", "di_qu_ _ming_cheng_tks": "\u5730\u533a \u540d\u79f0", "dan_wei_dai_ma_long": "\u5355\u4f4d\u4ee3\u7801", "dan_wei_ming_cheng_tks": "\u5355\u4f4d\u540d\u79f0", "zhi_wei_dai_ma_long": "\u804c\u4f4d\u4ee3\u7801", "zhi_wei_ming_cheng_tks": "\u804c\u4f4d\u540d\u79f0", "zhi_wei_jian_jie_tks": "\u804c\u4f4d\u7b80\u4ecb", "kao_shi_lei_bie_tks": "\u8003\u8bd5\u7c7b\u522b", "kai_kao_bi_li_long": "\u5f00\u8003\u6bd4\u4f8b", "zhao_kao_ren_shu_long": "\u62db\u8003\u4eba\u6570", "xue_\u3000_li_tks": "\u5b66\u3000\u5386", "zhuan_\u3000_ye_tks": "\u4e13\u3000\u4e1a", "qi_\u3000_ta_tks": "\u5176\u3000\u5b83"}}, "img2txt_id": "qwen-vl-plus@Tongyi-Qianwen", "asr_id": "", "llm_id": "qwen-max@Tongyi-Qianwen", "update_time": 1747905199217, "task_type": ""}
I notice that the log contains this field:
"language": "English"
I wonder whether this affects the language that vision LLMs use when I ask them to describe the picture (the descriptions are currently written in English). How can I set this option? I also want to modify the prompt of the picture parser. Where can I find it? I see some prompts in rag/prompts.py, but I'm not sure which one is actually used.
To adjust the output description of the picture parser in RAGFlow, here’s what you need to know:
1. Language Setting and Vision LLM Output
The "language" field in your task log (e.g., "language": "English") does affect how the picture parser chooses and interacts with vision LLMs. If the language is set to English, the parser logic will use this to decide whether to tokenize the document or use a vision LLM to generate the description. This means the output description language will match the language you set in the configuration—so if you want descriptions in another language, you should set this field accordingly in your parser configuration or task setup.
2. How to Set the Language Option
- The available language options are defined in files like web/src/pages/flow/constant.tsx and web/src/pages/agent/constant.tsx. You can select from these when configuring your task or through the UI if available.
- To ensure your model is recognized as a vision-language model (for image-to-text tasks), check the conf/llm_factories.json file. The model you want to use should have "model_type": "image2text" in its configuration.
3. Modifying the Output Prompt
- The prompt used for generating image descriptions is defined in the code. For figure/image parsing, the VisionFigureParser class uses the vision_llm_figure_describe_prompt from rag/prompts.py. This is the prompt you should modify if you want to change how the vision LLM describes images.
- You can find and edit this prompt in rag/prompts.py. The parser code imports and uses it directly:
    from rag.prompts import vision_llm_figure_describe_prompt
  and passes it to the vision LLM when generating descriptions.
Summary
- The "language" field controls the output language of the vision LLM.
- Set the language in your parser config or task setup, and ensure your model is configured as "image2text" in conf/llm_factories.json.
- To change the description style/content, edit the vision_llm_figure_describe_prompt in rag/prompts.py.
If you need more specific guidance on editing the prompt or setting the language in your workflow, let me know your setup details!
Please refer to this
Thank you for your response. According to my tests, these prompts only apply to pictures embedded in a PDF. For a standalone picture file, rag/app/picture.py appears to use the prompts here
For my Qwen model:
class QWenCV(Base):
    def __init__(self, key, model_name="qwen-vl-chat-v1", lang="Chinese", **kwargs):
        import dashscope
        dashscope.api_key = key
        self.model_name = model_name
        self.lang = lang

    def prompt(self, binary):
        # stupid as hell
        tmp_dir = get_project_base_directory("tmp")
        if not os.path.exists(tmp_dir):
            os.mkdir(tmp_dir)
        path = os.path.join(tmp_dir, "%s.jpg" % get_uuid())
        Image.open(io.BytesIO(binary)).save(path)
        return [
            {
                "role": "user",
                "content": [
                    {
                        "image": f"file://{path}"
                    },
                    {
                        # Chinese prompt when lang is "chinese", English prompt otherwise
                        "text": "请用中文详细描述一下图中的内容,比如时间,地点,人物,事情,人物心情等,如果有数据请提取出数据。" if self.lang.lower() == "chinese" else
                        "Please describe the content of this picture, like where, when, who, what happen. If it has number data, please extract them out.",
                    },
                ],
            }
        ]
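To make the language branch easier to see (and to test a custom prompt without calling the model), here is a minimal stand-alone sketch of the same selection logic. The names ZH_PROMPT, EN_PROMPT, describe_prompt_text and build_messages are hypothetical and are not part of RAGFlow; only the two prompt strings and the message layout are copied from the snippet above.

```python
# Hypothetical stand-alone version of the language branch in QWenCV.prompt();
# none of these names exist in RAGFlow.
ZH_PROMPT = "请用中文详细描述一下图中的内容,比如时间,地点,人物,事情,人物心情等,如果有数据请提取出数据。"
EN_PROMPT = (
    "Please describe the content of this picture, like where, when, who, what happen. "
    "If it has number data, please extract them out."
)


def describe_prompt_text(lang: str) -> str:
    """Return the Chinese prompt only when lang is 'chinese' (case-insensitive)."""
    return ZH_PROMPT if lang.lower() == "chinese" else EN_PROMPT


def build_messages(image_path: str, lang: str) -> list:
    """Build a message list with the same layout as the snippet above."""
    return [
        {
            "role": "user",
            "content": [
                {"image": f"file://{image_path}"},
                {"text": describe_prompt_text(lang)},
            ],
        }
    ]
```

Note that any value other than "chinese" (in any letter case), for example "zh" or "zh_CN", falls through to the English prompt.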
As I haven't determined where to set the language parameter, I have hardcoded it to 'Chinese' for now:
cv_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, lang="chinese")
If there is a more effective way, kindly let me know. Thank you.
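One possible alternative to hardcoding, sketched under assumptions: the task payload in the log above already carries a "language" key, so the value could be read from there with a fallback. The helper language_for_task is hypothetical, and whether the task dict is in scope at the point where LLMBundle is constructed is an assumption I have not verified.

```python
def language_for_task(task: dict, default: str = "English") -> str:
    """Read the "language" field of a task payload, falling back to a default.

    Hypothetical helper: `task` is assumed to be a dict shaped like the
    payload printed in the handle_task log line.
    """
    lang = task.get("language") or default
    return lang.strip() or default


# Assumed usage at the call site (not verified against the RAGFlow source):
# cv_mdl = LLMBundle(tenant_id, LLMType.IMAGE2TEXT, lang=language_for_task(task))
```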
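In api/db/services/task_service.py#L73, the get_task method reads the language of the Knowledgebase, and according to api/db/db_models.py#L590, the default language value depends on whether the LANG environment variable contains zh_CN.
So you only need to set that environment variable in the Dockerfile or in entrypoint.sh when building the image, and newly created knowledge bases will be Chinese. For existing knowledge bases, you can change the language field of the corresponding knowledge base yourself in the ragflow.knowledge table in MySQL.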
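The default described above can be sketched roughly as follows. This is a stand-alone approximation for checking what a given LANG value would produce, not the actual model definition; the function name default_kb_language and its environ parameter are hypothetical.

```python
import os


def default_kb_language(environ=None) -> str:
    """Approximate the default: "Chinese" if LANG contains zh_CN, else "English".

    `environ` is a hypothetical parameter so the check can be run against
    a plain dict instead of the real process environment.
    """
    env = os.environ if environ is None else environ
    return "Chinese" if "zh_CN" in env.get("LANG", "") else "English"


if __name__ == "__main__":
    # Shows what the current process environment would yield.
    print(default_kb_language())
```

With a setting such as ENV LANG=zh_CN.UTF-8 in the Dockerfile, this check would yield "Chinese". If the default is evaluated as described, it only applies to knowledge bases created afterwards, which matches the note about editing existing rows by hand.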