Alexander Borzunov
Hi @raihan0824, Your GPU is not shared when you use a Petals client to run inference or fine-tuning. The GPU is only shared when you run a Petals server.
Hi @LuciferianInk, The format is not obligatory, but it does improve the quality of the model. We'll try moving to the official format to take advantage of that.
@apcameron @Eclipse-Station I agree that this feature would be useful. We'll try to find time to implement it - and pull requests are always welcome!
Hi @krrishdholakia, This repo doesn't use the OpenAI API in any sense, but using a similar interface would help with interoperability with existing software. E.g., one could take an existing chatbot/text...
@krrishdholakia @ishaan-jaff Thanks for making the integration! I think @apcameron and @Eclipse-Station want an HTTP API in the OpenAI-compatible format (= one URL) that internally translates API calls to the...
Hi @apcameron @Eclipse-Station @jontstaz, Can you share a few examples of apps where OpenAI-compatible API for Petals will be helpful? We hired a part-time dev who may work on this...
For the record, there is an existing integration by Langchain devs that runs the native Petals client: https://python.langchain.com/docs/integrations/llms/petals This connects to the swarm directly (without using this API endpoint), but...
Hi @Vincent-Stragier, Sure, I'd be happy to see this happen!
Basic support is available in this PR: https://github.com/oobabooga/text-generation-webui/pull/3784
Hi @Webifi, Thanks for reporting! Please note that we can't always just truncate the new delta, since everything before the last token has already gone through the transformer and remembered...
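The caching constraint above can be sketched with a toy session. All names here are hypothetical, and the tuples stand in for what Petals actually caches (per-layer attention key/value tensors on the servers); the point is only that each new state depends on the whole cached prefix, so a client cannot rewrite earlier tokens by sending a truncated delta.

```python
# Toy model of a streaming inference session with a server-side cache.
# Hypothetical sketch, not the Petals implementation.

class ToySession:
    def __init__(self):
        self.cache = []  # states already computed for the prefix

    def step(self, token):
        # Each new state depends on the entire cached prefix, so changing
        # an earlier token invalidates every state that follows it.
        state = (token, len(self.cache))
        self.cache.append(state)
        return state

session = ToySession()
for tok in ["Hello", ",", "world"]:
    session.step(tok)

# To replace "world", the client must roll the cache back to the point
# of divergence and recompute the suffix, not just truncate the new delta.
divergence = 2
session.cache = session.cache[:divergence]
session.step("there")
print([t for t, _ in session.cache])  # -> ['Hello', ',', 'there']
```

In the real system the rollback is costlier, since the servers must recompute attention states for everything after the divergence point.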