Anslin
Thanks for the response @Karry11 @danielhanchen. I tried merged_16bit, but it requires more VRAM than the 16 GB I have. Is there any other way to run the model...
Thanks for the consideration @danielhanchen
@danielhanchen no issues, thanks for the update... ✨
@tom-doerr I'm trying to run inference. Do you have any parallel inference code?
Thank you @tom-doerr. I have tried `TypedPredictor` and `TypedChainOfThought` as well, but I'm facing an error; I have attached the code snippet and the error message. I'm using AsyncIO for...
@tom-doerr, sorry for the inconvenience. Here is the updated code with parallel processing; as of now I have not developed the FastAPI part. Code: ``` async def process_request(es: Elasticsearch, prompt: str):...
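For reference, a minimal sketch of what concurrent dispatch with `asyncio.gather` could look like. The `process_request` body here is a hypothetical stand-in (the real one takes an Elasticsearch client and calls the model); only the fan-out pattern is the point.

```python
import asyncio

# Hypothetical stand-in for the process_request coroutine in the snippet
# above; the real version would query Elasticsearch and call the model.
async def process_request(prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate an I/O-bound model/ES call
    return f"result for: {prompt}"

async def main(prompts: list[str]) -> list[str]:
    # gather schedules all coroutines concurrently on one event loop
    # and returns their results in the original order
    return await asyncio.gather(*(process_request(p) for p in prompts))

results = asyncio.run(main(["q1", "q2", "q3"]))
print(results)
```

Because the work is I/O-bound, a single event loop like this is usually enough; processes only help once CPU-bound work dominates.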
@tom-doerr I'm not familiar with the process-based worker model. Do you have any sample code for it? Could you please share it?
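A minimal sketch of a process-based worker model using the standard library's `ProcessPoolExecutor`. The `handle_request` function is a hypothetical placeholder; in a real setup each worker process would load its own client/model and take a share of the requests.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical placeholder for per-request work; real workers would
# hold their own Elasticsearch client and model handle.
def handle_request(prompt: str) -> str:
    return prompt.upper()

def run_workers(prompts: list[str]) -> list[str]:
    # Each task runs in a separate OS process, sidestepping the GIL,
    # which helps when the per-request work is CPU-bound
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(handle_request, prompts))

if __name__ == "__main__":
    print(run_workers(["a", "b"]))
```

The `if __name__ == "__main__"` guard is required on platforms that spawn (rather than fork) worker processes.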
Thank you @tom-doerr, I understand, but I planned to manage the OpenAI calls in one centralized place; with more requests and more data, we may face the...
No @tom-doerr, both multiple instances and a single instance trigger the exception, depending on the count and size of the requests. If we handle all requests in a single instance, it could...