                        SVM connector training fails
Configuration
- Version of DeepDetect:
- [X] Locally compiled on:
- [X] Ubuntu 14.04 LTS
- [ ] Mac OSX
- [ ] Other:
 
- [ ] Docker
- [ ] Amazon AMI
 
- Commit (shown by the server when starting): 073e9a1f5cf5a565ee91c5a9b46a6b8b3afc19f3
Your question / the problem you're facing:
This issue is related to #761 and more precisely to the fix #762. That fix resolved the inference issue; however, it is now impossible for me to train a model using an SVM connector. I have attached some random data if you want to reproduce the problem: just download and extract it. In the next sections, PATH_MODEL refers to the path where the model is stored.
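For reference, train.svm and test.svm follow the usual libsvm/svmlight format read by the SVM connector (a label followed by sparse index:value pairs). The lines below only illustrate the format; they are not the actual contents of the attached files:

    1 3:0.5 12:1.0 57:0.25
    0 7:0.8 12:0.1 101:1.0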
Error message (if any) / steps to reproduce the problem:
Let us first create the service we will use to train:
- [X] list of API calls:
    curl -X PUT "http://localhost:8082/services/svm_test" -d '{
                "sname": "svm_test",
                "description": "classification model",
                "mllib": "caffe",
                "type": "supervised",
                "parameters": {
                        "input": {
                                "connector": "svm"
                        },
                        "mllib": {
                                "gpu": true,
                                "gpuid": 1,
                                "template": "mlp",
                                "nclasses": 2,
                                "ntargets": null,
                                "layers": [128,64,32],
                                "activation": "relu",
                                "dropout": 0.5,
                                "regression": false,
                                "finetuning": false,
                                "db": true
                        },
                        "output":{}
                },
                "model": {
                        "repository": "PATH_MODEL/bug_svm_prediction",
                        "templates": "../templates/caffe",
                        "weights": null
                }
        }'
- [X] Server log output:
DeepDetect [ commit 073e9a1f5cf5a565ee91c5a9b46a6b8b3afc19f3 ]
[2020-07-29 00:15:30.030] [api] [info] Running DeepDetect HTTP server on localhost:8082
[2020-07-29 00:16:02.615] [svm_test] [info] Using GPU 1
ETC.
ETC.
[2020-07-29 00:16:03.168] [svm_test] [info] instantiating model template mlp
[2020-07-29 00:16:03.168] [svm_test] [info] source=../templates/caffe/mlp/
[2020-07-29 00:16:03.168] [svm_test] [info] dest=PATH_MODEL/mlp.prototxt
[2020-07-29 00:16:03.170] [api] [info] 127.0.0.1 "PUT /services/svm_test" 201 556
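For completeness, the service can be checked before training with a simple GET call (shown here only as a sanity check; the exact fields returned vary across versions):

    curl -X GET "http://localhost:8082/services/svm_test"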
Now we can try launching training with an older version of DD (caaeb7866a80a581bef3353218c5394db1af2363): the training is launched successfully.
- [X] list of API calls:
curl -X POST "http://127.0.0.1:8082/train" -d '{
                "service": "svm_test",
                "async": false,
                "data": [
                        "PATH_MODEL/data/train.svm",
                        "PATH_MODEL/data/test.svm"
                ],
                "parameters":{
                        "input": {
                                "db": true
                        },
                        "mllib": {
                                "gpu": true,
                                "resume": false,
                                "ignore_label": null,
                                "solver": {
                                        "iterations": 1000,
                                        "snapshot": 500,
                                        "snapshot_prefix": null,
                                        "solver_type": "ADAM",
                                        "test_interval": 100,
                                        "test_initialization": false,
                                        "lr_policy": "step",
                                        "base_lr": 0.001,
                                        "gamma": 0.1,
                                        "stepsize": 100,
                                        "momentum": 0.9,
                                        "weight_decay": 0.00001,
                                        "power": null,
                                        "iter_size": 1
                                },
                                "net": {
                                        "batch_size": 1,
                                        "test_batch_size": 1
                                }
                        },
                        "output": {
                                "best": 2,
                                "measure": ["accp", "mcll", "f1", "mcc"]
                        }
                }
        }'
- [X] Server log output:
[2020-07-29 00:25:17.238] [svm_test] [info] Net total flops=10560 / total params=10560
[2020-07-29 00:25:17.238] [svm_test] [info] detected network type is classification
[2020-07-29 00:25:17.238] [caffe] [info] Opened lmdb PATH_MODEL/bug_svm_prediction/test.lmdb
[2020-07-29 00:25:17.244] [api] [info] 127.0.0.1 "POST /train" 201 1297
However, if we now use the new version corresponding to commit 073e9a1f5cf5a565ee91c5a9b46a6b8b3afc19f3, the same training call fails almost immediately.
- [X] list of API calls:
curl -X POST "http://127.0.0.1:8082/train" -d '{
                "service": "svm_test",
                "async": false,
                "data": [
                        "PATH_MODEL/data/train.svm",
                        "PATH_MODEL/data/test.svm"
                ],
                "parameters":{
                        "input": {
                                "db": true
                        },
                        "mllib": {
                                "gpu": true,
                                "resume": false,
                                "ignore_label": null,
                                "solver": {
                                        "iterations": 1000,
                                        "snapshot": 500,
                                        "snapshot_prefix": null,
                                        "solver_type": "ADAM",
                                        "test_interval": 100,
                                        "test_initialization": false,
                                        "lr_policy": "step",
                                        "base_lr": 0.001,
                                        "gamma": 0.1,
                                        "stepsize": 100,
                                        "momentum": 0.9,
                                        "weight_decay": 0.00001,
                                        "power": null,
                                        "iter_size": 1
                                },
                                "net": {
                                        "batch_size": 1,
                                        "test_batch_size": 1
                                }
                        },
                        "output": {
                                "best": 2,
                                "measure": ["accp", "mcll", "f1", "mcc"]
                        }
                }
        }'
- [X] Server log output (the call itself returns the error response below, followed by the server-side log):
{"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"./include/caffe/util/db_lmdb.hpp:15 / Check failed (custom): (mdb_status) == (0)"}}
[2020-07-29 00:20:37.494] [svm_test] [info] detected network type is classification
[2020-07-29 00:20:37.505] [svm_test] [info] Iteration 0, lr = 0.001, smoothed_loss=0.523027
[2020-07-29 00:20:37.562] [caffe] [info] Ignoring source layer prob
[2020-07-29 00:20:37.562] [svm_test] [error] Error while filling up network for testing
[2020-07-29 00:20:37.565] [svm_test] [error] training call failed
[2020-07-29 00:20:37.565] [api] [error] 127.0.0.1 "POST /train" 500 814
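If you want to re-run the failing call from a clean state, the commands below are only a suggested reset between attempts (they assume the repository layout used above), not a fix for the underlying error:

    curl -X DELETE "http://localhost:8082/services/svm_test"
    rm -rf PATH_MODEL/bug_svm_prediction/train.lmdb PATH_MODEL/bug_svm_prediction/test.lmdb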
Ah, see #765. I had only tested in-memory training. To add SVM + db training to the unit tests I first had to add test_split support to SVM + db training, so that is what #765 now does. I've also left a test for in-memory SVM training.
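For illustration, with test_split support a single SVM file can be split server-side instead of passing separate train and test files. The call below is only a sketch of that usage; the shuffle flag and split ratio are illustrative values, not taken from #765:

    curl -X POST "http://127.0.0.1:8082/train" -d '{
            "service": "svm_test",
            "async": false,
            "data": ["PATH_MODEL/data/train.svm"],
            "parameters": {
                    "input": {"db": true, "shuffle": true, "test_split": 0.1},
                    "mllib": {
                            "solver": {"iterations": 1000, "test_interval": 100},
                            "net": {"batch_size": 1, "test_batch_size": 1}
                    },
                    "output": {"measure": ["mcll", "f1"]}
            }
    }'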
It seems to work on my side. Thanks!