WIP: local LLM prototype (would like feedback)
Added a docker-compose file that launches NATS + falcon7b.
It's currently pretty hacky, since NATS requires async:
- Tried to use the async LLM `_call` functions and `agenerate`, but they give a weird error.
- Used `asyncclick` instead to get async CLI commands (rough sketch after this list).
- ignored "INP001", # TODO dont know how to deal with services/falcon7b/nats_falcon7b.py
is part of an implicit namespace package. Add aninit.py`. ]
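As a sketch, the asyncclick pattern ends up looking something like this (the subject name `llm.generate`, the NATS URL, and the timeout are placeholders, not necessarily what this branch uses):

```python
import asyncclick as click
import nats


@click.command()
@click.argument("prompt")
async def generate(prompt: str):
    """Send a prompt to the local model over NATS and print the reply."""
    nc = await nats.connect("nats://localhost:4222")  # placeholder URL
    # Request/reply over a subject; "llm.generate" is an illustrative name.
    msg = await nc.request("llm.generate", prompt.encode(), timeout=120)
    click.echo(msg.data.decode())
    await nc.close()


if __name__ == "__main__":
    generate()
```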
Also, falcon7b doesn't work with the current prompts; it returns pretty much the full prompt for now.
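For context, a minimal sketch of the shape of `services/falcon7b/nats_falcon7b.py` (not the actual file; the subject name, host, and generation parameters are assumptions):

```python
import asyncio

import nats
import torch
from transformers import pipeline

# falcon-7b-instruct needs trust_remote_code; bfloat16 keeps memory reasonable.
generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)


async def main():
    # "nats" is assumed to be the docker-compose service name.
    nc = await nats.connect("nats://nats:4222")

    async def handle(msg):
        prompt = msg.data.decode()
        # Run the blocking transformers pipeline off the event loop.
        result = await asyncio.to_thread(generator, prompt, max_new_tokens=256)
        await msg.respond(result[0]["generated_text"].encode())

    await nc.subscribe("llm.generate", cb=handle)
    await asyncio.Event().wait()  # serve forever


if __name__ == "__main__":
    asyncio.run(main())
```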
TODOs pending approach review:
- Clean up the NATS server routing so it can be run outside the Docker network.
- Universal-ish transformers pipeline for different models? Probably a few categories like bfloat16 plus various quantization options (sketch below). Maybe find a way to rip off https://github.com/go-skynet/LocalAI.
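Something like this is what I mean by categories (a sketch; the `variant` names are made up, and the 4-bit path assumes bitsandbytes is installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def load_model(model_id: str, variant: str = "bfloat16"):
    """Load a causal LM under one of a few memory/precision categories."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if variant == "bfloat16":
        model = AutoModelForCausalLM.from_pretrained(
            model_id, torch_dtype=torch.bfloat16, device_map="auto"
        )
    elif variant == "4bit":
        # Quantized path; requires the bitsandbytes package.
        model = AutoModelForCausalLM.from_pretrained(
            model_id, load_in_4bit=True, device_map="auto"
        )
    else:
        raise ValueError(f"unknown variant: {variant}")
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```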
Bottlenecks:
- Need a mechanism for different prompts for different LLMs (see the sketch after this list).
- Need to learn how to use pytest properly, and ideally add a code-generation test so we can actually compare the different LLMs. We need to close the loop by running generated code against unit tests.
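For the prompt mechanism, I'm picturing something like a per-model-family template registry (template strings below are illustrative only):

```python
# Illustrative templates; real instruction formats vary per model family.
PROMPT_TEMPLATES = {
    "falcon-instruct": "{instruction}\nAssistant:",
    "llama-chat": "[INST] {instruction} [/INST]",
    "default": "{instruction}",
}


def build_prompt(model_family: str, instruction: str) -> str:
    template = PROMPT_TEMPLATES.get(model_family, PROMPT_TEMPLATES["default"])
    return template.format(instruction=instruction)
```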
Working on this now
I got basic Hugging Face Hub support "working" (with bad results):
https://github.com/gorillamania/AICodeBot/commit/cb604ca970146d72d4dc836ba8a6888528a61c6c
But I think what we actually need is local LLMs, and the direction I'm going is this Docker image from Hugging Face that runs highly optimized local models:
https://github.com/huggingface/text-generation-inference#using-a-private-or-gated-model
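Once the container is running, querying it is just an HTTP call. A sketch assuming the `-p 8080:80` port mapping from their README:

```python
import requests

# TGI exposes a /generate endpoint; prompt and parameters here are examples.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Write a Python function that reverses a string.",
        "parameters": {"max_new_tokens": 200},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```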
Cool, I didn't know you could self-host models through the Hugging Face Hub.
From my research, it seems like we're going to need >20B params for it to be of any use.
I think the best way really is to close the loop by running the output code against unit tests. Then we'll be able to just run it through all the models out there. It probably won't be straightforward due to prompt differences, but I feel dumb testing them manually one by one, knowing that there are going to be more of them released over time.
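Roughly what I mean by closing the loop, as a pytest sketch; `generate_code` is a stand-in for whatever backend call we end up with, and the model IDs are illustrative:

```python
import pytest

MODELS = ["falcon-7b-instruct", "gpt-3.5-turbo"]  # illustrative IDs


def generate_code(model: str, task: str) -> str:
    """Stand-in for the real call into the LLM backend."""
    raise NotImplementedError


@pytest.mark.parametrize("model", MODELS)
def test_fizzbuzz(model):
    source = generate_code(model, "Write a function fizzbuzz(n) -> str.")
    namespace: dict = {}
    # Only safe for local experimentation: executes model output directly.
    exec(source, namespace)
    fizzbuzz = namespace["fizzbuzz"]
    assert fizzbuzz(3) == "Fizz"
    assert fizzbuzz(5) == "Buzz"
    assert fizzbuzz(15) == "FizzBuzz"
```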
Thank you for your contribution. The code has long since diverged from this approach.