Training on FUNSD: CUDA out of memory on GPU with 12 GB memory.
First of all, congratulations to the entire team on the amazing work.
I was trying to train SPADE on the FUNSD dataset on a GPU with 12 GB of memory (GeForce RTX 2080 Ti), but I am getting:

RuntimeError: CUDA out of memory. Tried to allocate 384.00 MiB (GPU 0; 10.75 GiB total capacity; 9.14 GiB already allocated; 24.25 MiB free; 9.42 GiB reserved in total by PyTorch)
Is it at all possible to train SPADE on a GPU with 12 GB of memory? A comment in another issue says that it needs a GPU with at least 24 GB of memory: https://github.com/clovaai/spade/issues/2#issuecomment-915036284
Help will be appreciated. Thanks
Hi @asidharth019
You may turn off relative_attention (see this comment).
Or, you may use a smaller encoder, for example bert-base-multilingual-cased-(3 or 4)layers (please refer to this comment).
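For illustration only, the change might look roughly like the following in the training YAML. These key names are hypothetical placeholders; please check the actual fields in configs/funsd.1.5layers.train.yaml:

```yaml
# Hypothetical sketch -- the real key names in the repo's YAML may differ.
relative_attention: false                               # turn off relative attention
encoder_backbone: bert-base-multilingual-cased-3layers  # or the 4-layer variant
```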
Good luck!
Thanks for the above solution. I was able to run the training. 🙂
Facing a few issues with the output for FUNSD:

- I have trained using the funsd_config, but I am not getting output in the expected format shown in the CORD example. Below is the output for the FUNSD sample with "data_id": "83594639". Why are we getting a dictionary within the list? Also, there are not many linked entities. (A small sketch of one way to read this structure follows after this list.)

  [{"{'qa.question': 'Date:'}": [[{'qa.answer': 'September 15: 1997'}]]}, [{'qa.question': 'Company:'}], [{'qa.question': 'From:'}], [{'qa.question': 'DATA'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Fax'}], [{'qa.question': 'Advertising'}], [{'qa.question': 'Media'}], [{'other.other': '7707'}]]

- It is mentioned that we should not use the validation score for model selection. Please guide on what to use for model selection instead.
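As a minimal sketch (not SPADE code), assuming the structure shown above, where a linked entity appears as a dict keyed by the stringified question entity and an unlinked entity appears as a one-element list, one way to read such a parse:

```python
import ast

# Minimal sketch (not from the repo) for reading the parse shown above.
# Assumption: a linked entity is a dict whose key is the stringified question
# entity and whose value is a list of answer groups; an entity with no
# predicted link is a one-element list containing just the entity dict.
parse = [
    {"{'qa.question': 'Date:'}": [[{'qa.answer': 'September 15: 1997'}]]},
    [{'qa.question': 'Company:'}],
    [{'other.other': '7707'}],
]

for item in parse:
    if isinstance(item, dict):  # linked: question -> answer group(s)
        for key, answer_groups in item.items():
            question = ast.literal_eval(key)  # the key is a stringified dict
            answers = [a for group in answer_groups for a in group]
            print("linked:", question, "->", answers)
    else:  # a lone entity; no link was predicted for it
        print("unlinked:", item[0])
```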
The reason the FUNSD output format differs from CORD's is the difference in the depth of information; please refer to Table 2. Also, the examples from FUNSD often consist of documents that are not fully filled in. Check the original document image.

Also, if the parse above represents the "prediction", check the ground-truth output first.

By the way, set toy_data: false in the config. See https://github.com/clovaai/spade/blob/a85574ceaa00f1878a23754f283aa66bc2daf082/configs/funsd.1.5layers.train.yaml#L79

For model selection, use "early stopping".
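As a minimal, framework-agnostic sketch of the idea (illustrative only, not code from this repo; adapt it to whatever training loop you are using):

```python
# Framework-agnostic early-stopping sketch (illustrative; not SPADE code).
# Stop once the monitored metric has not improved for `patience` epochs.
class EarlyStopper:
    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, monitored_loss: float) -> bool:
        """Return True when training should stop."""
        if monitored_loss < self.best - self.min_delta:
            self.best = monitored_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Example: the loss plateaus after epoch 2, so stopping triggers at epoch 5.
stopper = EarlyStopper(patience=3)
for epoch, loss in enumerate([1.2, 1.0, 0.9, 0.91, 0.92, 0.93, 0.94]):
    if stopper.step(loss):
        print(f"early stop at epoch {epoch}")
        break
```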
Best,
I have already set toy_data: false. What will be the best way to apply "early stopping" in this training process?
Sample counts:

- Train: 149
- Val: 8
- Held-out test: 50

The above split follows the given train YAML, and the FUNSD data was generated using the provided preprocessing file.

On train and val I am getting decent performance on ELK, but performance on the held-out test set is very bad. Score dict for the held-out test set:

{"test__avg_loss": 1.0109899044036865,
 "test__f1": -1,
 "test__precision_edge_avg": 0.26012873043052837,
 "test__recall_edge_avg": 0.09196836541370143,
 "test__f1_edge_avg": 0.1347639744054087,
 "test__precision_edge_of_type_0": 0.37181996086105673,
 "test__recall_edge_of_type_0": 0.14822244511311713,
 "test__f1_edge_of_type_0": 0.2119521912350598,
 "test__precision_edge_of_type_1": 0.1484375,
 "test__recall_edge_of_type_1": 0.03571428571428571,
 "test__f1_edge_of_type_1": 0.05757575757575757,
 "p_r_f1_entity": [[0.3888888888888889, 0.1721311475409836, 0.23863636363636365],
                   [0.8102941176470588, 0.5116063138347261, 0.6272054638588503],
                   [0.7383367139959433, 0.44336175395858707, 0.5540334855403348],
                   [0.5359477124183006, 0.26282051282051283, 0.35268817204301073]],
 "p_r_f1_all_entity_ELB": [0.7376811594202899, 0.4365351629502573, 0.5484913793103449],
 "p_r_f1_link_ELK": [0.3392857142857143, 0.03571428571428571, 0.06462585034013606]}
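For reference, each F1 in this dict is just the harmonic mean of the corresponding precision and recall, F1 = 2PR / (P + R). A quick check:

```python
# Sanity check: the reported ELK link F1 follows from F1 = 2*P*R / (P + R).
p, r = 0.3392857142857143, 0.03571428571428571
print(2 * p * r / (p + r))  # ~0.0646, matching p_r_f1_link_ELK[2] above
```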
Please advise.
Hi @asidharth019
Sorry for the late reply. You may try increasing the number of training epochs. As far as I remember, you should get near 100% accuracy on the training set.
Also, please be aware that in the case of FUNSD, the validation set is a subset of the training set. See the README, Model/Training section.
Wonseok
If relative_attention is turned off, do the highlights mentioned in the paper become meaningless?