[Question]: Is there a way to configure how a particular extension should be parsed?
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
Describe your problem
Currently, the file extensions supported out of the box are listed in file_utils.py, and each extension must be handled by a parser. Is there a better way to extend the list of parsable file extensions than manually writing them into the Python code and rebuilding the Docker image? I think this should be configurable rather than an exhaustive list of every possible file extension. Examples of missing extensions: .bat (Windows cmd), .ps1 (PowerShell), .pas (Pascal/Delphi), .hpp (C++ header), .el (Emacs Lisp).
Hi @MakaAlb! I'm Dosu and I’m helping the ragflow team.
Currently, RAGFlow does not provide a way to configure how particular file extensions are parsed without modifying the Python code and rebuilding the Docker image. The supported file extensions are hardcoded in the filename_type function in file_utils.py, which uses regular expressions to map extensions to file types. There is no mechanism to add or change supported extensions at runtime through configuration files or environment variables; any change requires editing the code directly and redeploying the application (source).
Other users have noted this limitation as well and have suggested that file parsing methods should be selectable at upload time rather than fixed in the knowledge base or code, but this is not currently implemented (source).
At this time, the only way to extend the list of parsable file extensions is to manually update the code and rebuild the Docker image. There is no built-in support for making this configurable.
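For illustration only, here is a minimal sketch of what a configurable extension mapping could look like. It is not RAGFlow's code: the `EXTRA_FILE_EXTENSIONS` environment variable, the `DEFAULT_EXTENSION_TYPES` table, the type labels, and both function names are hypothetical and chosen just to show the idea of merging user-supplied extensions into built-in defaults.

```python
import json
import os
import re

# Hypothetical built-in defaults; NOT RAGFlow's actual extension table.
DEFAULT_EXTENSION_TYPES = {
    "pdf": "pdf",
    "docx": "doc",
    "txt": "doc",
    "py": "doc",
}


def load_extension_types(env_var="EXTRA_FILE_EXTENSIONS", environ=None):
    """Merge defaults with a JSON object read from an environment variable.

    The variable name and JSON format are assumptions for this sketch,
    e.g. EXTRA_FILE_EXTENSIONS='{".ps1": "doc", ".pas": "doc"}'.
    """
    environ = os.environ if environ is None else environ
    mapping = dict(DEFAULT_EXTENSION_TYPES)
    raw = environ.get(env_var, "")
    if raw:
        for ext, file_type in json.loads(raw).items():
            # Normalize ".PS1" and "ps1" to the same key.
            mapping[ext.lower().lstrip(".")] = file_type
    return mapping


def classify_filename(filename, mapping):
    """Return the configured type for a filename, or "other" if unknown."""
    match = re.search(r"\.([^./\\]+)$", filename.lower())
    if not match:
        return "other"
    return mapping.get(match.group(1), "other")
```

With a scheme like this, an organization could register `.ps1` or `.pas` at deploy time without rebuilding the image. Which parser each type label maps to would still have to be decided by the application.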
@MakaAlb Any use cases for these file extensions?
Our organization relies heavily on Delphi (based on Pascal) and depends on PowerShell and .bat files for automation. Since it is not feasible to cover every file extension that different organizations might use, I think it should be configurable.
@MakaAlb Got it. Will do some investigations to see if we can provide any solution to it.
@zephyershen also encountered this challenge with Vue files, for example. I'm curious: are you building a coding copilot on top of RAGFlow for your org, or just an assistant that answers questions about your codebase? @MakaAlb
We will build the coding AI ourselves; RAGFlow only needs to be able to parse the code and answer questions about the codebase. @ZhenhangTung
We plan on answering questions about code and reviewing PRs in the context of the codebase.
Update: will support this feature in v0.21.x.
This limit is exclusively applicable to documents and office-related tasks.
Then currently it is not R.A.G + FLOW !!
Related: https://github.com/infiniflow/ragflow/issues/718