Mayank Mishra

Results 187 comments of Mayank Mishra

Hmm, @sgugger can you tell me which library you guys are using for the inference API, if not Flask?

Thanks, I can confirm that this issue is not occurring with Starlette or FastAPI (which is built on top of Starlette). Not sure why it happens with Flask. Closing this ❤️

@muellerzr @sgugger Never mind, this is still happening even with this minimal working example: As you can see, I am not even storing any variable, only the model and tokenizer. This...

If I call torch.cuda.empty_cache() after this, then this happens:

Could it be that this is expected behaviour? @ydshieh I am seeing a memory blowup with gpt2 as well, after replacing bigscience/bloom with gpt2. I am not sure if this is...

With pdb, I am seeing a blowup too. But my guess is that this is not the right way to measure memory, since I see something similar with GPT2 as well....

> You may also be able to get a bit more by doing garbage collection as well, after deleting the model in python > > E.g.: > > ```python >...
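The quoted suggestion (delete the model, then garbage-collect) can be sketched with a plain Python object standing in for the model, since the truncated example above isn't visible. `FakeModel` is a hypothetical stand-in; with a real transformers model on GPU you would also call `torch.cuda.empty_cache()` afterwards to return cached CUDA blocks to the driver.

```python
import gc


class FakeModel:
    """Hypothetical stand-in for a large model object."""

    def __init__(self):
        # Simulate a large in-memory allocation.
        self.weights = [0.0] * 1_000_000


model = FakeModel()

# Drop the last reference, then force a collection pass so any
# reference cycles still holding the object are broken.
del model
unreachable = gc.collect()  # number of unreachable objects collected

# With a CUDA model you would additionally run (not executable here):
#   torch.cuda.empty_cache()
```

Note that `del` alone frees an acyclic object immediately; `gc.collect()` mainly helps when the object is kept alive by reference cycles.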

Yes, thanks, I think I'll try to watch the memory usage over time by running it in a for loop or something. To see how this changes memory (both in the server...
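That loop idea can be sketched with stdlib tooling; this uses `tracemalloc` as a stand-in for the GPU counters (a real check on the server would read `torch.cuda.memory_allocated()` per iteration instead), and `allocate` is a hypothetical placeholder for one inference request.

```python
import tracemalloc


def allocate(n):
    # Hypothetical stand-in for the work done by one inference request.
    return [0] * n


tracemalloc.start()
samples = []
for step in range(5):
    buf = allocate(100_000)
    current, peak = tracemalloc.get_traced_memory()
    samples.append(current)  # memory currently traced at this step
    del buf  # release the per-iteration allocation
tracemalloc.stop()

# If nothing is leaking, the per-iteration readings stay roughly flat
# instead of growing with each pass through the loop.
```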

This is not an issue anymore. Thanks for helping, guys. Closing this :)