Anslin

9 comments by Anslin

Thanks for the response @Karry11 @danielhanchen. I tried merged_16bit, but it required more VRAM than I have; I only have 16 GB of VRAM. Is there any other way to run the model...

Thanks for the consideration @danielhanchen

@danielhanchen no issues, thanks for the update... ✨

@tom-doerr I'm trying to run inference. Do you have any parallel inference code?

Thank you @tom-doerr. I have tried `TypedPredictor` and `TypedChainOfThought` as well, but I'm facing an error; I have attached the code snippet and the error message. I'm using AsyncIO for...

@tom-doerr, sorry for the inconvenience. Here is the updated code with parallel processing; I have not developed the FastAPI part yet. Code: ``` async def process_request(es: Elasticsearch, prompt: str):...
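A minimal sketch of the kind of parallel processing described above, assuming `asyncio.gather` over a placeholder `call_model` coroutine (the real code's Elasticsearch client and DSPy/OpenAI calls are omitted; `call_model` is hypothetical):

```python
import asyncio

async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for one model request; the real code would
    # await a DSPy / OpenAI call here instead of sleeping.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer:{prompt}"

async def run_parallel(prompts: list[str]) -> list[str]:
    # gather() schedules all calls concurrently and returns results
    # in the same order as the input prompts.
    return await asyncio.gather(*(call_model(p) for p in prompts))

results = asyncio.run(run_parallel(["a", "b", "c"]))
```

Because the calls are I/O-bound, concurrency comes from the event loop overlapping the awaits, not from extra threads or processes.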

@tom-doerr I'm not aware of the process-based worker model. Do you have any sample code for it? If so, could you please share it?
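A process-based worker model can be sketched with the standard library's `ProcessPoolExecutor`; this is a generic illustration, not the code @tom-doerr had in mind, and `handle_request` is a placeholder for the real per-request work:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def handle_request(prompt: str) -> str:
    # Placeholder for per-request work; in a real service this would run
    # the model pipeline for one request in its own worker process.
    return prompt.upper()

# The "fork" start method keeps this snippet import-safe on POSIX; on
# Windows/macOS you would instead guard pool creation under
# `if __name__ == "__main__":`.
ctx = mp.get_context("fork")
with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
    # map() distributes requests across worker processes and preserves order.
    results = list(pool.map(handle_request, ["one", "two"]))
```

Unlike the asyncio approach, this gives true parallelism across CPU cores, so one slow or CPU-bound request cannot stall the others.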

Thank you @tom-doerr, I understand, but I planned to manage the OpenAI calls in a centralized place; when we try more requests with more data, we may face the...

No @tom-doerr, both multiple instances and a single instance trigger the exception depending on the count and size of the requests; if we handle all requests in a single instance, it could...