Add Support for text files with other extensions .org (org mode) or .md (markdown)

Open nausher opened this issue 9 months ago • 3 comments

I have quite a few notes that were created in Emacs Org-mode or Obsidian. They are plain text files, but they have a .org (Org-mode) or .md (Markdown) extension rather than .txt.

I uploaded these files to Danswer and they were 'indexed', but none of my search queries pull up any information from them.

Can support be added for text files with a non-'txt' extension?

nausher avatar May 02 '24 07:05 nausher

I believe the change for this could be as simple as adding ".org" to this line in backend/danswer/connectors/file/utils.py:

_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx"]

changed to:

_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]

https://github.com/danswer-ai/danswer/blob/143b50c519d916c81e072d8ca406bf0d87750761/backend/danswer/connectors/file/utils.py#L11
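For illustration, here is a minimal sketch of how an extension allowlist like this might gate uploads once ".org" is added. The helper name has_valid_extension is hypothetical and is not taken from the Danswer codebase; only the list itself comes from the line quoted above.

```python
from pathlib import Path

# The list quoted above, with ".org" appended (the proposed change).
_VALID_FILE_EXTENSIONS = [".txt", ".zip", ".pdf", ".md", ".mdx", ".org"]


def has_valid_extension(file_name: str) -> bool:
    """Hypothetical helper: True if the file's final suffix is in the allowlist.

    The comparison is case-insensitive, so "NOTES.ORG" is treated like "notes.org".
    """
    return Path(file_name).suffix.lower() in _VALID_FILE_EXTENSIONS


if __name__ == "__main__":
    for name in ["todo.org", "README.md", "photo.png"]:
        print(name, has_valid_extension(name))
```

With a check of this shape, a file such as todo.org would pass the filter only after ".org" is in the list, which would be consistent with .md files working while .org files do not.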

nausher avatar May 02 '24 22:05 nausher

Hmm... I have ingested .md files without issue. I think it might not read them as formatted files, mind you, but it does seem to accept them and they are searchable for me as text files, at least.

zarlor avatar May 07 '24 21:05 zarlor

@zarlor - the issue now seems to be limited to ".org" files. The code has a filter that accepts files with the extensions ".md" and ".mdx".

nausher avatar May 08 '24 00:05 nausher