Sylvain Gugger
The same comment as above is still true.
It is on the roadmap :-)
Hi there! 1. You can do whatever you want since Accelerate will adapt to your training loop :-) 2. This is completely untested, so I can't guarantee it will work....
@jianguoz It's not a priority for now, as we have no means of testing the solution (our request to get access to a free small TPU pod to maintain Accelerate...
No one is working on it for now, so if you want to tackle this, feel free to give it a go!
That's because PyTorch pickles the whole state dict, so it does not let you load an individual weight from it.
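To illustrate the pickling point, here is a minimal sketch using only the standard-library `pickle` module. The dict of plain lists is a hypothetical stand-in for a state dict (no tensors, no PyTorch calls), so it only shows the general behaviour of a pickled object, not what `torch.save` does internally.

```python
import io
import pickle

# Hypothetical stand-in for a state dict: plain lists instead of tensors.
state_dict = {
    "encoder.weight": [0.1, 0.2, 0.3],
    "encoder.bias": [0.0],
    "decoder.weight": [1.0, 2.0],
}

# The whole dict is serialized as a single pickle stream.
buffer = io.BytesIO()
pickle.dump(state_dict, buffer)

# The stream has no per-key index, so getting one entry means
# unpickling the entire object first and only then picking the key.
buffer.seek(0)
loaded = pickle.load(buffer)
bias = loaded["encoder.bias"]
```

Even though only `encoder.bias` is wanted, every entry is read back into memory before it can be accessed.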
Wdyt @muellerzr?
There is no example yet. If you want to contribute one, by all means :-)
The model is used on our side on the inference API with the same options and without any memory leak, so I suspect the memory leak comes from somewhere else...
We use `starlette` on our side.