amazon-dsstne

Allocate failed out of memory when predicting user's click data

Open • qubingxin opened this issue 8 years ago • 6 comments

We are training a recommender system on data consisting of user IDs and the articles those users clicked.

  1. The config we used is ./samples/movielens/config.json.
  2. First, we trained and tested the model successfully on one day's data; the resulting model is 4.1 GB.
  3. We then tried training on three days of data. Training succeeded, and the model is actually smaller than before (3.5 GB), but prediction fails with: GpuBuffer::Allocate failed out of memory, predict: ../engine/GpuTypes.h:463: void GpuBuffer<T>::Allocate() [with T = float]: Assertion '0' failed. How should we troubleshoot and fix this? (A quick device-memory check we can run before predicting is sketched below.)
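
For reference, a standalone way to see how much device memory is actually free right before running predict is a cudaMemGetInfo check like the sketch below. This is illustrative only and not part of DSSTNE; build it with nvcc and run it just before the predict step.

```cpp
// Standalone sketch (not part of DSSTNE): print free vs. total device memory
// so it can be compared against the buffer GpuBuffer<T>::Allocate is requesting.
// Build with: nvcc -o gpumem gpumem.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t status = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(status));
        return 1;
    }
    std::printf("GPU memory: %.2f GiB free of %.2f GiB total\n",
                freeBytes / (1024.0 * 1024.0 * 1024.0),
                totalBytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```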

qubingxin avatar Jun 14 '17 01:06 qubingxin

What GPU are you using?

scottlegrand avatar Jun 14 '17 02:06 scottlegrand

Tesla M40 24GB

qubingxin avatar Jun 14 '17 03:06 qubingxin

Cut your batch size in half and see if that fixes this. I am somewhat shooting in the dark here.
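
The rough reasoning, as a sketch rather than DSSTNE internals: buffers sized per batch scale as batch_size × layer_width × sizeof(float), so halving the batch roughly halves those allocations, while fixed-size allocations such as the weights do not shrink at all. The layer width below is a made-up placeholder, not your actual network.

```cpp
// Back-of-the-envelope sketch (assumptions, not DSSTNE internals): how a
// batch-sized float buffer scales with batch size for a hypothetical layer width.
#include <cstdio>

int main() {
    const double layerWidth = 200000.0;  // hypothetical number of output units (e.g. item count)
    const unsigned batches[] = {256, 128, 1};
    for (unsigned batch : batches) {
        double gib = batch * layerWidth * sizeof(float) / (1024.0 * 1024.0 * 1024.0);
        std::printf("batch %3u -> ~%.3f GiB per batch-sized float buffer\n", batch, gib);
    }
    return 0;
}
```

If the error persists even at very small batch sizes, the allocation that fails is likely one of the fixed-size buffers rather than a per-batch one.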

scottlegrand avatar Jun 14 '17 04:06 scottlegrand

We cut the batch size from 256 down to 1, and the same error still occurred.

qubingxin avatar Jun 14 '17 05:06 qubingxin

Weird. Could you rebuild after uncommenting //#define MEMTRACKING in GpuTypes.h and send the output? The other option is to run across multiple GPUs with MPI, if you have them.
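
To be concrete, the change is just activating the memory-tracking define in GpuTypes.h (the exact location may differ between versions) and rebuilding before rerunning predict:

```cpp
// GpuTypes.h -- enable DSSTNE's built-in memory tracking, then rebuild.
// Before:
//     //#define MEMTRACKING
// After:
#define MEMTRACKING
```

For the multi-GPU route, DSSTNE's binaries can be launched under mpirun to spread the model across several GPUs; see the repo's multi-GPU documentation for the exact invocation.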

scottlegrand avatar Aug 23 '17 16:08 scottlegrand

Ping?

slegrandA9 avatar Sep 21 '17 01:09 slegrandA9