onyx
Add support for text files with other extensions: .org (Org mode) or .md (Markdown)
I have quite a few notes created in Emacs Org-mode or Obsidian. They are plain-text Org-mode or Markdown files; the only difference from a regular text file is the .org or .md extension.
I uploaded these files to Danswer and they were 'indexed', but none of my search queries return any information from them.
Can support be added for text files with a non-'txt' extension?
I believe the change could be as simple as adding ".org" to this line in backend/danswer/connectors/file/utils.py:
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx"]
changed to:
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]
https://github.com/danswer-ai/danswer/blob/143b50c519d916c81e072d8ca406bf0d87750761/backend/danswer/connectors/file/utils.py#L11
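To illustrate why a one-line change might be enough, here is a minimal sketch of how an allow-list filter like this typically behaves. Only the `_VALID_FILE_EXTENSIONS` list comes from the linked file; the `has_valid_extension` helper and the case-insensitive comparison are assumptions for illustration, not the connector's actual code.

```python
import os

# Allow list from the proposed change; ".org" is the new entry.
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]


def has_valid_extension(file_name: str) -> bool:
    """Hypothetical helper: True if the file's extension is in the allow list."""
    # splitext returns (root, ext), where ext includes the leading dot.
    _, extension = os.path.splitext(file_name)
    # Lowercasing is an assumption so that "NOTES.ORG" is treated like "notes.org".
    return extension.lower() in _VALID_FILE_EXTENSIONS


if __name__ == "__main__":
    for name in ["notes.org", "journal.md", "report.docx"]:
        print(name, has_valid_extension(name))
```

If the real filter works along these lines, any file whose extension is missing from the list would be skipped, which would match what I am seeing with my .org files.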
Hmm... I have ingested .md files without issue. I think it might not read them as formatted files, mind you, but it does seem to accept them and they are searchable for me as text files, at least.
@zarlor - the issue now seems to be limited to ".org" files. The code has a filter that accepts files with the extensions ".md" and ".mdx".