Cleanup repo for inference
It is not clear to me whether we still need the "LLaMA worker" Dockerfile.
I am also not sure whether we are still using, or plan to use, the text-generation-inference worker variant; my understanding is that we now use the "basic HF server" variant of the worker in production.
Maybe we could clarify these things and do some cleanup of the inference section of the repo to reflect the current setup.
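If the unused variants are confirmed, the cleanup itself could be fairly mechanical, along these lines (the file paths below are hypothetical and would need to be checked against the actual `inference/` directory):

```bash
# Hypothetical cleanup, assuming the LLaMA and text-generation-inference
# worker Dockerfiles live at these paths; verify before removing.
git rm inference/worker/Dockerfile.llama
git rm inference/worker/Dockerfile.text-generation-inference
git commit -m "Remove unused inference worker Dockerfile variants"
```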
This goes hand in hand with #1473, which I think needs to be prioritised now that the inference pipeline is fairly stable and no longer changing regularly. We have many users asking for details on how to run their own instances.
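As a starting point for that documentation, a minimal sketch of standing up a local instance might look like the following (the `inference` compose profile name and flags here are assumptions and should be confirmed against the repo's docker-compose.yaml):

```bash
# Sketch of running a local inference instance; the "inference" profile
# name is an assumption, check docker-compose.yaml for the actual setup.
git clone https://github.com/LAION-AI/Open-Assistant.git
cd Open-Assistant
docker compose --profile inference up --build --attach-dependencies
```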