rag-experiment-accelerator
Providing specific format in config is not accepted
The config.json file has a data_formats field, which can be set to all or to a specific format such as html, docx, or pdf. Setting a specific format currently does not work.
Expected behavior
Providing a specific value such as html or docx for the data_formats field results in only files of those formats being loaded and indexed.
Current behavior
Providing a specific value such as docx for the data_formats field results in no files being loaded and indexed, with the following logs showing up:
Loading documents from <my_folder>/data with allowed formats d, o, c, x
Format d is not supported
Format o is not supported
Format c is not supported
Format x is not supported
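The log output above is what you would see if a plain string were iterated as though it were a list of formats. The following is a minimal sketch of that effect, not the project's actual loader code; the function names and the SUPPORTED set are hypothetical and exist only for illustration.

```python
# Hypothetical stand-ins for the loader's logging and validation steps.
SUPPORTED = {"pdf", "html", "markdown", "json", "text", "docx"}


def describe_formats(data_formats):
    # Joining a str iterates over its characters, so "docx" becomes "d, o, c, x".
    return "allowed formats " + ", ".join(data_formats)


def unsupported_formats(data_formats):
    # Iterating a str yields single characters, none of which is a known format.
    return [fmt for fmt in data_formats if fmt not in SUPPORTED]


print(describe_formats("docx"))       # allowed formats d, o, c, x
print(unsupported_formats("docx"))    # ['d', 'o', 'c', 'x']
print(unsupported_formats(["docx"]))  # []
```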
The data_formats field accepts either "all" or an array of supported values, for example:
"data_formats": ["docx"]
This is not documented, so we will add it to the backlog.
Supported values are:
- "data_formats": ["pdf", "html", "markdown", "json", "text", "docx"]
- "data_formats": "all"
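Until the accepted shapes are documented, one way to avoid the surprise is to normalize the value when the config is read. The sketch below is an assumption about how that could look, not the accelerator's implementation: normalize_data_formats and SUPPORTED_FORMATS are hypothetical names, and wrapping a bare string such as "docx" into a one-element list is a design choice made here for illustration.

```python
# Hypothetical helper; the supported values are taken from the list above.
SUPPORTED_FORMATS = ("pdf", "html", "markdown", "json", "text", "docx")


def normalize_data_formats(value):
    """Return a list of format names from "all", a single name, or a list of names."""
    if isinstance(value, str):
        if value == "all":
            return list(SUPPORTED_FORMATS)
        # Treat a bare string as a single format instead of iterating its characters.
        value = [value]
    formats = list(value)
    unknown = [fmt for fmt in formats if fmt not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"Unsupported data_formats: {unknown}")
    return formats


print(normalize_data_formats("docx"))           # ['docx']
print(normalize_data_formats(["pdf", "html"]))  # ['pdf', 'html']
```

Rejecting unknown names with an error, rather than logging and skipping them, makes a misconfigured value fail early instead of silently indexing nothing.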