Jay S
You were right. I just found out that it was a PyTorch issue. `import torch` has been causing the segfault this whole time. The base Docker image I was using...
I think if you use Llama it might work. I was able to make it work on V100 GPUs.
I see! Thanks for the responses, it totally makes sense. Is there a way to set up the timeout as well?
OK, when I tried this with a custom kernel, generation seems stable (even with 128 async requests; load-test sketch below), and I couldn't reproduce the error. However, I tried this...
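For reference, the 128-request load test looks roughly like this. It's a minimal sketch; the URL, the `/generate` endpoint, and the payload are placeholders for whatever your server actually exposes:

```python
import asyncio
import aiohttp

# Placeholders -- point these at the real server and request schema.
URL = "http://localhost:8000/generate"
PAYLOAD = {"prompt": "Hello", "max_tokens": 64}
NUM_REQUESTS = 128

async def one_request(session: aiohttp.ClientSession) -> int:
    # Send one generation request and drain the body so the
    # connection can be reused by the pool.
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()
        return resp.status

async def main() -> None:
    # Fire all requests concurrently and report non-200 statuses.
    async with aiohttp.ClientSession() as session:
        statuses = await asyncio.gather(
            *(one_request(session) for _ in range(NUM_REQUESTS))
        )
    failures = [s for s in statuses if s != 200]
    print(f"{len(statuses)} requests sent, {len(failures)} failures")

if __name__ == "__main__":
    asyncio.run(main())
```

A single shared session is deliberate: it keeps connection reuse realistic compared to spinning up a separate client per request.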