
CHM file support

Open NaitorStudios opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.
CHM is commonly used for software documentation, so I think the ability to work with CHM files would be a great addition.

Describe the solution you'd like
I expect it to be able to read CHM files, answer questions about them, and point out which parts of the file the content was found in.

Describe alternatives you've considered
Converting to other formats, although that might degrade the content and would be tedious, considering I have thousands of CHMs...

Additional context
I guess I've explained it enough, thank you!

NaitorStudios, May 21 '23 23:05

This is not possible right now, since langchain does not support it; see https://python.langchain.com/en/latest/reference/modules/document_loaders.html.

trivalik, May 22 '23 19:05

That's unfortunate :/ CHM files are internally very similar to HTML though; they actually contain HTML files, so perhaps it might be possible later on... I'd still strongly suggest that you consider it, even as a custom implementation, since this would add a lot more power to PrivateGPT.

NaitorStudios, May 22 '23 21:05
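
To illustrate the point above that a CHM archive contains ordinary HTML pages, here is a minimal sketch of a custom approach. It assumes the CHM has already been unpacked into a directory by some external tool (that step is not shown), and that the pages decode as UTF-8, which may not hold for older help files. The names `html_to_text` and `load_extracted_chm` are hypothetical and are not part of PrivateGPT or langchain. Each page's text is returned together with its path inside the archive, so an answer could point back to where the content was found.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def load_extracted_chm(root):
    """Yield (relative_path, text) for each HTML page under `root`.

    `root` is assumed to be a directory holding an already unpacked CHM.
    """
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if path.suffix.lower() in (".htm", ".html"):
            # Assumption: UTF-8; undecodable bytes are replaced, not fatal.
            html = path.read_text(encoding="utf-8", errors="replace")
            text = html_to_text(html)
            if text:
                yield path.relative_to(root).as_posix(), text
```

The `(relative_path, text)` pairs could then be fed to whatever ingestion step is already in use, with the path kept as source metadata.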

After spending a day wrestling with a pipeline for converting CHM files to something useful (without resorting to online converters, due to confidentiality), I'm very interested in this feature.

I've created a feature request over on langchain to add support for CHM files: https://github.com/langchain-ai/langchain/issues/15469

I'll have to see whether I'll be allowed to dedicate some work hours to contributing to the project.

DarkFox, Jan 03 '24 10:01

CHM support has been added to langchain with https://github.com/langchain-ai/langchain/pull/15519

DarkFox, Jan 07 '24 18:01