ragflow
[Bug]:
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
....
RAGFlow image version
v0.17.0
Other environment information
I am using a Docker container environment deployed on an AWS EC2 virtual machine.
Actual behavior
The 'Resume' document parsing method is not available when creating a KB via the API.
The documented options for the parser when creating a new KB via the API:
parser_config
The parser configuration of the dataset. A ParserConfig object's attributes vary based on the selected chunk_method:
chunk_method="naive":
{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}.
chunk_method="qa":
{"raptor": {"user_raptor": False}}
chunk_method="manual":
{"raptor": {"user_raptor": False}}
chunk_method="table":
None
chunk_method="paper":
{"raptor": {"user_raptor": False}}
chunk_method="book":
{"raptor": {"user_raptor": False}}
chunk_method="laws":
{"raptor": {"user_raptor": False}}
chunk_method="picture":
None
chunk_method="presentation":
{"raptor": {"user_raptor": False}}
chunk_method="one":
None
chunk_method="knowledge-graph":
{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}
chunk_method="email":
None
Returns
Success: A dataset object.
Failure: Exception
Examples
Possibilities for the parser configuration when creating a KB via the user interface:
Chunk method: Resume (and all the ones above)
So this exception is raised when trying to create a KB via the API with 'resume' as the chunk method:
Exception: 'resume' is not in ['naive', 'manual', 'qa', 'table', 'paper', 'book', 'laws', 'presentation', 'picture', 'one', 'knowledge_graph', 'email', 'tag']
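To make the failure easier to see without a running server, here is a minimal sketch of the kind of membership check the message implies. The allowed list is copied from the exception above; the helper name `check_chunk_method` and the use of `ValueError` are my own assumptions and not RAGFlow's actual code.

```python
# Allowed values copied verbatim from the exception message in this report.
API_CHUNK_METHODS = [
    'naive', 'manual', 'qa', 'table', 'paper', 'book', 'laws',
    'presentation', 'picture', 'one', 'knowledge_graph', 'email', 'tag',
]


def check_chunk_method(chunk_method, allowed=API_CHUNK_METHODS):
    """Hypothetical helper: reject any chunk method not in the allowed list."""
    if chunk_method not in allowed:
        # Message shaped like the one in this report (assumed format).
        raise ValueError(f"'{chunk_method}' is not in {allowed}")
    return chunk_method


if __name__ == "__main__":
    for method in ("naive", "resume"):
        try:
            check_chunk_method(method)
            print(f"{method}: accepted")
        except ValueError as exc:
            print(f"{method}: rejected ({exc})")
```

With this list, 'naive' passes and 'resume' is rejected, which matches what I see through the API even though the UI offers Resume.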
Expected behavior
The 'resume' chunk method should be supported by the API, so that a KB can be created with the resume parser configuration.
Steps to reproduce
dataset_config = {
    "avatar": "",
    "description": "",
    "embedding_model": "",
    "language": "English",
    "permission": "me",
    "chunk_method": "resume",
    "parser_config": DataSet.ParserConfig(
        rag=True,
        res_dict={
            "chunk_token_num": 128,
            "delimiter": "\\n!?;。;!?",
            "html4excel": False,
            "layout_recognize": True,
            "raptor": {"user_raptor": False},
        },
    ),
}
dataset = rag_object.create_dataset(dataset_config)
Additional information
No response