Fatal log at syncmem.h:13
I understand that this has been reported before. However, I believe that this is caused by something else this time.
I am using the scikit interface of thundersvm. I train 10 different models and select the one with the highest score on validation data to predict the test data. This works fine for the first fold. However, when the script loops to the second fold, I get the following message:
2018-12-27 11:28:00,191 INFO [default] #instances = 54123, #features = 156
2018-12-27 11:28:00,212 INFO [default] #classes = 10
2018-12-27 11:28:00,212 FATAL [default] Check failed: [error == cudaSuccess] initialization error
2018-12-27 11:28:00,212 WARNING [default] Aborting application. Reason: Fatal log at [/home/juliano/thundersvm/include/thundersvm/syncmem.h:13]
Line 13 is the CUDA_CHECK line below:
#ifdef USE_CUDA
CUDA_CHECK(cudaMallocHost(ptr, size));
#else
*ptr = malloc(size);
#endif
I believe that something in the predict method causes the CUDA device memory allocation to fail the next time thundersvm tries to allocate device memory. Maybe it does not clean up properly when using the scikit interface?
I don't believe I have an OOM error, since this is a snapshot of nvidia-smi taken right after the predict (nvidia-smi was run with os.system('nvidia-smi') in Python, so this is precisely after the predict):
Thu Dec 27 11:45:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:01:00.0 Off | N/A |
| 33% 60C P2 75W / 250W | 209MiB / 12189MiB | 65% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1070 Off | 00000000:02:00.0 Off | N/A |
| 0% 47C P2 28W / 151W | 10MiB / 8114MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 28396 C python 199MiB |
+-----------------------------------------------------------------------------+
Since there's no way of telling the scikit interface that I was done predicting with the model, it was still loaded in device memory.
A possible fix would be a method in the scikit interface that allows manually cleaning up the thundersvm resources of a model after predicting. I'm not sure how to do that :(
Thank you very much.
I'll try to come up with a minimal working example...
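In the meantime, here is a rough sketch of the structure of my script (the random data, the parameter grid, and the fold splitting are placeholders standing in for my real pipeline, and it assumes thundersvm's scikit-style SVC with fit/predict):

# Illustrative reproduction sketch -- data and parameters are placeholders.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from thundersvm import SVC  # scikit-style interface

X = np.random.rand(54123, 156).astype(np.float32)   # stand-in for the real features
y = np.random.randint(0, 10, size=len(X))            # 10 classes, as in the log above

param_grid = [{"C": c, "gamma": g} for c in (0.1, 1, 10, 100, 1000) for g in (0.01, 0.1)]

for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # hold out part of the training portion for validation
    n_val = len(X_train) // 5
    X_val, y_val = X_train[:n_val], y_train[:n_val]
    X_tr, y_tr = X_train[n_val:], y_train[n_val:]

    # train 10 models and keep the one with the best validation score
    best_score, best_model = -1.0, None
    for params in param_grid:
        model = SVC(**params)
        model.fit(X_tr, y_tr)   # on the second fold, the first fit() aborts with the FATAL log
        score = accuracy_score(y_val, model.predict(X_val))
        if score > best_score:
            best_score, best_model = score, model

    best_model.predict(X_test)  # fold 0 gets through this fine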
cudaMallocHost(ptr, size) allocates CPU memory (pinned host memory), not GPU memory. There may not be enough CPU memory in your case. You can try the max_mem_size parameter to limit the memory you use. You can also use del if you want to delete the model object (ref). It'd be better if you could provide an example.
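For example (just a sketch, assuming max_mem_size is passed to the scikit-style constructor and that dropping the Python object is enough to release its resources; the data and parameter values are illustrative):

# Sketch of the two suggestions above; values are illustrative.
import numpy as np
from thundersvm import SVC

X_train = np.random.rand(1000, 156).astype(np.float32)
y_train = np.random.randint(0, 10, size=len(X_train))
X_test = np.random.rand(200, 156).astype(np.float32)

model = SVC(C=10, gamma=0.1, max_mem_size=4096)  # cap the memory thundersvm may use
model.fit(X_train, y_train)
predictions = model.predict(X_test)

del model  # drop the model object once you are done with it, before training the next one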
We would like to know what applications our user community is working on. Could you tell us what problems you are addressing using thundersvm? This will help us prioritise the tasks in our upgrade backlog. Thanks.
First of all, thanks @GODqinbin for the tips; I'll try them as soon as I can. This machine has 32GB of RAM, and it seems like that's not the issue. However, I'll check to make sure :)
I'm currently working on automatic music genre classification. Usually a music track is divided into time windows, which are used as samples for the SVM classifier. Just to give you an idea: in 30s of audio, with 5s windows and an overlap of 4.75s, there are about 108 windows per track. Thus, with a 1000-track training set, there are around 108k samples (each sample with around 150 features)! With a CPU-based SVM this would take a really long time to train, but with thundersvm I can make it work. I'm actually trying to show that I don't need all windows to train the model. However, I must show how well the model with all windows performs! Of course, this is not published yet, but I'll send you a reference as soon as I finish it.
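Just to make the arithmetic explicit (hop size = window length minus overlap; the numbers below are the same ones quoted above):

# Back-of-the-envelope window count: hop = window length - overlap.
duration, window, overlap = 30.0, 5.0, 4.75               # seconds
hop = window - overlap                                    # 0.25 s between window starts
windows_per_track = int((duration - window) / hop) + 1    # ~101 for exactly 30 s of audio
samples = 1000 * windows_per_track                        # on the order of 100k training samples
print(windows_per_track, samples)                         # 101 101000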
Thank you very much.