amazon-dsstne
Allocate failed: out of memory when predicting on user click data
We are training a recommender system on data consisting of user IDs and the articles those users clicked.
- The config we used is ./samples/movielens/config.json
- First, one day of data trained and tested successfully; the resulting model is 4.1 GB.
- We then tried training on three days of data. Training succeeded, and the model is actually smaller than before, only 3.5 GB, but prediction fails with this error: GpuBuffer::Allocate failed out of memory, predict: ../engine/GpuTypes.h:463: void GpuBuffer<T>::Allocate() [with T = float]: Assertion '0' failed. How should we troubleshoot and fix this?
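For context, the assertion fires when the buffer allocation request cannot be satisfied. A minimal sketch of that failure pattern (hypothetical code, not DSSTNE's actual implementation; std::malloc stands in for cudaMalloc so the sketch runs without a GPU):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Simplified stand-in for a GpuBuffer<T>-style class: it computes the byte
// size of the request, and if the allocation fails it prints a message and
// asserts, mirroring the "Allocate failed out of memory" report above.
template <typename T>
struct Buffer {
    T*     _pData  = nullptr;
    size_t _length = 0;

    void Allocate(size_t length) {
        _length = length;
        _pData  = static_cast<T*>(std::malloc(length * sizeof(T)));
        if (_pData == nullptr) {
            std::printf("Buffer::Allocate failed out of memory (%zu bytes)\n",
                        length * sizeof(T));
            assert(0);  // same failure mode as in the error message above
        }
    }

    ~Buffer() { std::free(_pData); }
};
```

In the real engine the request goes to device memory, so the limit is the GPU's free memory at predict time, not host RAM.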
What GPU are you using?
Tesla M40 24GB
Cut your batch size in half and see if that fixes this. I am somewhat shooting in the dark here.
We cut the batch size from 256 down to 1, and the same error occurred.
Weird. Could you rebuild after uncommenting //#define MEMTRACKING in GpuTypes.h and send the output? The other option is to run across multiple GPUs with MPI, if you have them.
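To illustrate what that build flag is for: MEMTRACKING enables logging of how much memory the engine believes it has allocated, which helps locate the allocation that blows past the 24 GB. A minimal sketch of that kind of bookkeeping (hypothetical identifiers, host allocations standing in for device allocations):

```cpp
#include <cstdio>
#include <cstdlib>

// Running total of tracked allocations, in the spirit of a MEMTRACKING-style
// build option. Each allocation bumps the counter and logs the new total, so
// the log shows which request pushed usage past the device's capacity.
static size_t g_totalBytes = 0;

void* trackedAlloc(size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) {
        g_totalBytes += bytes;
        std::printf("alloc %zu bytes (running total %zu)\n",
                    bytes, g_totalBytes);
    }
    return p;
}
```

Reading the resulting log from the top, the last line before the failed Allocate identifies the buffer that no longer fits.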
Ping?