
Support uploading more file formats

Open c00lcoder opened this issue 1 year ago • 15 comments

Hi, is there a way to customize the UI and inputs? For example, currently, the UI allows the uploading of images, but I'd want to update it to accept CSV and PDF formats as well. If shown where to make the update, I'd happily take a look and offer a pull request with the changes.

c00lcoder avatar Dec 29 '23 15:12 c00lcoder

That'd be cool to have. It's a tricky thing to contribute though, since we'd need to come to an agreement about which dependencies to add in order to support file formats like PDF. It's difficult to find libraries that are up for the challenge. The upstream llama.cpp project typically does it with single-header ones like STB, since they're ultra portable and easily vendorable. I'd say a good place for us to start here would be to research and suggest potential libraries we could use.
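To give a concrete picture of the pattern (a rough sketch only; stb_image here is just the canonical example of the genre, not a decision, and decode.c is a hypothetical test driver):

# vendor the single-header library straight into the tree
curl -LO https://raw.githubusercontent.com/nothings/stb/master/stb_image.h

# a hypothetical decode.c would contain:
#   #define STB_IMAGE_IMPLEMENTATION
#   #include "stb_image.h"
# so the entire decoder builds with our own toolchain, no system deps
cosmocc -o decode decode.c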

jart avatar Dec 29 '23 15:12 jart

Okay, that sounds good. There are Python libraries (my expertise) that would be my go-to, especially being able to run it locally. I'll do some more research.

c00lcoder avatar Dec 29 '23 22:12 c00lcoder

Is pandoc a feasible tool, or am I just adding noise to this conversation? https://pandoc.org/

Best,

francisco-lafe avatar Jan 02 '24 12:01 francisco-lafe

my two cents is that while i also want this functionality, it seems out of scope.

i think it would be more powerful (and useful for llamafile users) to instead provide example integrations to tools like langchain and llamaindex, which are specializing in the ingestion/RAG/etc of many different file formats.

then llamafile stays focused on what it does best. if people want to couple that to other kinds of input, they can integrate with tools that are designed for that. to me, the magic of llamafile is the instant portability of any open model to any hardware. while the GUI is nice from an onboarding perspective, i don't think that's the place to focus attention.

for a specific example, let's say i want a coding llamafile to work w/ csv. if i read https://huggingface.co/jartine/WizardCoder-Python-34B-V1.0-llamafile, it suggests two possibilities with langchain support:

- ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
- llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.

but given how new llamafile is, i can't find any examples of actually integrating it with langchain. to me, this kind of explicit example of how to integrate llamafile into other popular and modular frameworks would be more useful than having a super long debate about what is the best way to support RAG/multimodal/file types/etc within llamafile.
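for instance, something like this is what i'd want documented (an untested sketch, assuming the llamafile server really does expose an OpenAI-style chat endpoint on its default localhost:8080 port):

# run a llamafile in server mode, then point any OpenAI-compatible
# client (langchain included) at it:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "hello"}]}'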

rawwerks avatar Jan 03 '24 00:01 rawwerks

Getting software to run painlessly actually is one of the things that llamafile does best. It was only as recently as 2022 that we all believed using machine learning software meant you had to tame and potentially get eaten by a big scary Python or Anaconda. That all changed when llama.cpp hit the scene, which showed us it could be simpler, and all we did was turn it into a single file.

If you look at langchain's instructions for ingesting PDF files, it appears that not only do you need to figure out how to install langchain, you also need to install node.js, npm, and a third-party pdf-parse package. https://js.langchain.com/docs/integrations/document_loaders/file_loaders/pdf I think the work they're doing is great, since I'd imagine those are the highest quality tools for the job.
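To spell out what that route entails (commands paraphrased from their docs as I understand them):

# the langchain.js route: install node.js and npm first, then
npm install langchain     # the framework itself
npm install pdf-parse     # the third-party pdf extraction package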

However, I don't always want the nicest thing. Sometimes I want PDF support that's 85% as good with zero effort spent installing things. This is a pain point you especially feel on platforms like the BSDs and Windows, where fewer resources have been devoted to making modern packaging tools available and polished.

If libraries exist that let us support other file formats, and they're able to be compiled by cosmocc, then I'm perfectly happy to consider including them. We can also afford to be less conservative than the llama.cpp project upstream. For example, I don't care how many source files a library implementation has, so long as it's able to be built locally using our toolchain.

jart avatar Jan 03 '24 02:01 jart

@jart - awesome! go get 'em.

my two cents on the best library for ingesting docs into LLMs: https://github.com/run-llama/llama_index (no idea if it can be compiled by cosmocc; that's above my paygrade.)

rawwerks avatar Jan 03 '24 21:01 rawwerks

@rawwerks

It's Python, so no, it won't work with cosmocc. Unless compiling it with Cython is a possibility. But that's above my paygrade, hah.

francisco-lafe avatar Jan 04 '24 13:01 francisco-lafe

I don't think there is any C++ alternative right now. However, given the current libraries, I believe we could use the JS version of langchain instead. https://js.langchain.com/docs/get_started/introduction

The problem here is that it's gonna introduce bloat in the "server" app through node_modules, especially if we were to add packages to read various document types.

But with Justine's "We can also afford to be less conservative than the llama.cpp" statement then I believe this would be a viable alternative? 🤷‍♂️

Muzika avatar Jan 06 '24 07:01 Muzika

I'm sorry but what does langchain have to do with this issue?

As far as I can tell, permissively licensed C/C++ software for turning PDF into text hasn't been written yet. The closest thing I've been able to find is https://www.xpdfreader.com/about.html, which is licensed GPL. In order to include it, we'd have to build it as a separate executable, bundle it inside the zip, and extract it to ~/.llamafile/ when it's needed.
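Mechanically, that would look something like this (a hypothetical sketch; the exact zip layout is an open question):

# build pdftotext as its own executable, stuff it in the zip,
# then extract it on demand at runtime:
zip model.llamafile pdftotext
unzip -o model.llamafile pdftotext -d ~/.llamafile/
~/.llamafile/pdftotext -f 1 -l 3 foo.pdf /dev/stdout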

jart avatar Jan 06 '24 18:01 jart

Taking a closer look, xpdf integration is blocked on jart/cosmopolitan#1065

jart avatar Jan 06 '24 18:01 jart

Needless to say, if you install xpdf on your own, then one way you can use it with llamafile to summarize the first three pages of a PDF file using Mistral would be the following:

llamafile -m mistral.gguf -e -p "[INST]Summarize the following text:\n$(pdftotext -f 1 -l 3 foo.pdf /dev/stdout)\n[/INST]" --silent-prompt --log-disable

jart avatar Jan 06 '24 19:01 jart

I'm sorry but what does langchain have to do with this issue?

As far as I can tell, permissively licensed C/C++ software for turning PDF into text hasn't been written yet. The closest thing I've been able to find is https://www.xpdfreader.com/about.html, which is licensed GPL. In order to include it, we'd have to build it as a separate executable, bundle it inside the zip, and extract it to ~/.llamafile/ when it's needed.

Sorry, I thought you guys were talking about RAG. If the plan is just to insert the contents of the doc into the prompt, then I guess we have some options we can use.

But won't that be limited by the context size? The model would just forget about the document once it exceeds the max context.

Muzika avatar Jan 06 '24 22:01 Muzika

Retrieval Augmented Generation is a problem that's more suited to a startup wanting to build the next Google. It was proposed earlier that we do this and I marked it "won't fix" in #110. Here we're simply talking about changing the upload button in the web interface to support more file formats. For example, I'd really like to see it support webp, which could be trivial for us to accomplish if the ImageMagick convert command is installed, but I'd ideally prefer to vendor a C/C++ webp library that lets us do it ourselves. The same applies to PDF and everything else. That's modest enough in scope for a scrappy project like this one.
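For webp, the convert fallback would be about this simple (a sketch; the model name is made up and the --image usage assumes a llava-style llamafile):

# convert webp to png with ImageMagick, then hand it to a llava model
convert foo.webp foo.png
llamafile -m llava.gguf --image foo.png -p 'describe this image'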

But won't that be limited by the context size? The model would just forget about the document once it exceeds the max context.

Isn't that a problem everyone else in the industry is facing, even OpenAI? Context windows just keep getting bigger. For example, Mistral 0.2 was recently released with a context size of 32k, which is large enough to be genuinely useful for summarizing blog posts and essays without running into errors. A lot of people are actively researching and developing improvements like that, and all I'm doing here is making those solutions locally accessible and useful when they arrive.

jart avatar Jan 06 '24 22:01 jart

Could this be extended to source code files? Like C, C++, C#, Java, Dart, Python, etc. Or should this be a completely new issue?

I'd very much like to use llamafile as an alternative to Copilot Chat when I'm offline. I don't expect it to be as good, but I'd like it to give me a general idea of concepts.

FMorschel avatar Jan 16 '24 11:01 FMorschel

Could this be extended to source code files? Like C, C++, C#, Java, Dart, Python, etc. Or should this be a completely new issue?

I'd very much like to use llamafile as an alternative to Copilot Chat when I'm offline. I don't expect it to be as good, but I'd like it to give me a general idea of concepts.

Source code is merely plain text saved in files with different extensions.

So if it can read .txt files, then by the same logic it should be able to read source code files too.
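For example, reusing the pdftotext trick from earlier in this thread, just with cat instead (model name and prompt format assumed; adjust for your model):

llamafile -m mistral.gguf -e -p "[INST]Explain what this code does:\n$(cat main.c)\n[/INST]" --silent-prompt --log-disable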

Muzika avatar Jan 16 '24 11:01 Muzika