Fatal log at syncmem.h:13
I understand that this has been reported before. However, I believe that this is caused by something else this time.
I am using the scikit interface of thundersvm. I train 10 different models and select the one with the highest score on validation data to predict the test data. This works fine for the first fold. However, when the script loops to the second fold, I get the following message:
2018-12-27 11:28:00,191 INFO [default] #instances = 54123, #features = 156
2018-12-27 11:28:00,212 INFO [default] #classes = 10
2018-12-27 11:28:00,212 FATAL [default] Check failed: [error == cudaSuccess] initialization error
2018-12-27 11:28:00,212 WARNING [default] Aborting application. Reason: Fatal log at [/home/juliano/thundersvm/include/thundersvm/syncmem.h:13]
Line 13 is the CUDA_CHECK line below:
#ifdef USE_CUDA
CUDA_CHECK(cudaMallocHost(ptr, size));
#else
*ptr = malloc(size);
#endif
I believe that something in the predict method causes the CUDA device memory allocation to fail the next time thundersvm tries to allocate device memory. Maybe it does not clean up properly when using the scikit interface?
I don't believe I have an OOM error, since this is a snapshot of nvidia-smi taken right after the predict (nvidia-smi was run with os.system('nvidia-smi') in Python, so this is precisely after the predict):
Thu Dec 27 11:45:34 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81 Driver Version: 384.81 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 00000000:01:00.0 Off | N/A |
| 33% 60C P2 75W / 250W | 209MiB / 12189MiB | 65% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1070 Off | 00000000:02:00.0 Off | N/A |
| 0% 47C P2 28W / 151W | 10MiB / 8114MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 28396 C python 199MiB |
+-----------------------------------------------------------------------------+
Since there's no way of telling the scikit interface that I was done predicting with the model, it was still loaded in device memory.
A possible fix would be a method in the scikit interface that allows manually cleaning up the thundersvm resources of a model after predicting. I'm not sure how to do that :(
Thank you very much.
I'll try to come up with a minimal working example...
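In the meantime, here is a rough sketch of the structure of my script (the random data, the parameter grid, and the fold splitting are placeholders standing in for my real pipeline, and it assumes thundersvm's scikit-style SVC with fit/predict):

# Illustrative reproduction sketch -- data and parameters are placeholders.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from thundersvm import SVC  # scikit-style interface

X = np.random.rand(54123, 156).astype(np.float32)   # stand-in for the real features
y = np.random.randint(0, 10, size=len(X))            # 10 classes, as in the log above

param_grid = [{"C": c, "gamma": g} for c in (0.1, 1, 10, 100, 1000) for g in (0.01, 0.1)]

for fold, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # hold out part of the training portion for validation
    n_val = len(X_train) // 5
    X_val, y_val = X_train[:n_val], y_train[:n_val]
    X_tr, y_tr = X_train[n_val:], y_train[n_val:]

    # train 10 models and keep the one with the best validation score
    best_score, best_model = -1.0, None
    for params in param_grid:
        model = SVC(**params)
        model.fit(X_tr, y_tr)   # on the second fold, the first fit() aborts with the FATAL log
        score = accuracy_score(y_val, model.predict(X_val))
        if score > best_score:
            best_score, best_model = score, model

    best_model.predict(X_test)  # fold 0 gets through this fine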
cudaMallocHost(ptr, size) allocates CPU memory (pinned host memory), not GPU memory. There may not be enough CPU memory in your case. You can try the max_mem_size parameter to limit the memory you use. You can also use del if you want to delete the model object (ref). It'd be better if you could provide an example.
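For example (just a sketch, assuming max_mem_size is passed to the scikit-style constructor and that dropping the Python object is enough to release its resources; the data and parameter values are illustrative):

# Sketch of the two suggestions above; values are illustrative.
import numpy as np
from thundersvm import SVC

X_train = np.random.rand(1000, 156).astype(np.float32)
y_train = np.random.randint(0, 10, size=len(X_train))
X_test = np.random.rand(200, 156).astype(np.float32)

model = SVC(C=10, gamma=0.1, max_mem_size=4096)  # cap the memory thundersvm may use
model.fit(X_train, y_train)
predictions = model.predict(X_test)

del model  # drop the model object once you are done with it, before training the next one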
We would like to know what applications our user community is working on. Could you tell us what problems you are addressing using thundersvm? This will help us prioritise the tasks in our upgrade backlog. Thanks.
First of all, thanks @GODqinbin for the tips; I'll try them as soon as I can. This machine has 32GB of RAM, and it seems like that's not the issue. However, I'll check to make sure :)
I'm currently working on automatic music genre classification. Usually a music track is divided into time windows, which are used as samples for the SVM classifier. Just to give you an idea: in 30s of audio, with 5s windows and an overlap of 4.75s, there are about 108 windows per track. Thus, with a 1000-track training set, there are around 108k samples (each sample with around 150 features)! With a CPU-based SVM this would take a really long time to train, but with thundersvm I can make it work. I'm actually trying to show that I don't need all windows to train the model. However, I must show how well the model with all windows performs! Of course, this is not published yet, but I'll send you a reference as soon as I finish it.
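Just to make the arithmetic explicit (hop size = window length minus overlap; the numbers below are the same ones quoted above):

# Back-of-the-envelope window count: hop = window length - overlap.
duration, window, overlap = 30.0, 5.0, 4.75               # seconds
hop = window - overlap                                    # 0.25 s between window starts
windows_per_track = int((duration - window) / hop) + 1    # ~101 for exactly 30 s of audio
samples = 1000 * windows_per_track                        # on the order of 100k training samples
print(windows_per_track, samples)                         # 101 101000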
Thank you very much.