RL4LMs
Memory issue in metric evals?
Hi all,
I am running into a GPU memory issue during metric evaluation.
I am using the following metrics:
```yaml
metrics:
  - id: meteor
    args: {}
  - id: rouge
  - id: bleu
    args: {}
  - id: bert_score # TODO: running into insufficient CUDA memory here
    args:
      language: en
  - id: cider
  - id: diversity
    args: {}
```
While monitoring the GPU that hosts the metric models, I see the memory in use increase steadily:
Initially:
```
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   51C    P0    71W / 300W |   3514MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
```
After 200 epochs:
```
+-------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2...  On   | 00000000:00:1D.0 Off |                    0 |
| N/A   53C    P0    73W / 300W |  22171MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
```
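To put the two nvidia-smi readings in perspective, here is a small back-of-the-envelope calculation. It averages the growth over the 200 epochs and projects how many more epochs fit on the 32768 MiB card. The linear-growth projection is an assumption, not something the readings prove, and the per-epoch figure says nothing about how often evaluation actually runs.

```python
# Readings copied from the two nvidia-smi snapshots (MiB).
TOTAL_MIB = 32768
INITIAL_MIB = 3514
AFTER_200_MIB = 22171
EPOCHS = 200


def growth_per_epoch(start_mib: int, end_mib: int, epochs: int) -> float:
    """Average memory growth per epoch between two snapshots."""
    return (end_mib - start_mib) / epochs


def epochs_until_full(current_mib: int, total_mib: int, rate_mib: float) -> float:
    """Epochs left before the card is full, ASSUMING growth stays linear."""
    return (total_mib - current_mib) / rate_mib


rate = growth_per_epoch(INITIAL_MIB, AFTER_200_MIB, EPOCHS)
remaining = epochs_until_full(AFTER_200_MIB, TOTAL_MIB, rate)
print(f"growth: {rate:.1f} MiB/epoch")          # growth: 93.3 MiB/epoch
print(f"epochs until full: {remaining:.0f}")    # epochs until full: 114
```

Roughly 18.6 GiB accumulated over 200 epochs, so a steady per-evaluation leak (rather than a one-off spike) seems the more plausible reading.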
Any idea what might be causing this? Thanks!
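One way to narrow down which metric is holding on to memory is to run each one repeatedly in isolation and record allocated memory after every call; a metric whose growth never levels off is the likely culprit. This is a minimal sketch under stated assumptions: the `metrics` mapping of zero-argument callables is hypothetical (you would wrap each metric's compute call on a fixed batch yourself; it is not an RL4LMs API), and on a real GPU `probe` could be `torch.cuda.memory_allocated`. The demo substitutes a simulated counter so it runs without a GPU.

```python
import gc
from typing import Callable, Dict, List


def measure_growth(
    metrics: Dict[str, Callable[[], None]],
    probe: Callable[[], int],
    rounds: int = 5,
) -> Dict[str, List[int]]:
    """Run each metric `rounds` times and record memory growth after each call.

    `probe` returns the currently allocated amount (e.g. bytes). Growth is
    measured relative to the value observed just before that metric's first run.
    """
    growth: Dict[str, List[int]] = {}
    for name, compute in metrics.items():
        gc.collect()  # drop unreferenced Python objects before the baseline
        baseline = probe()
        deltas: List[int] = []
        for _ in range(rounds):
            compute()
            gc.collect()  # only memory still referenced should remain
            deltas.append(probe() - baseline)
        growth[name] = deltas
    return growth


if __name__ == "__main__":
    allocated = [0]  # simulated allocator counter (bytes), stands in for a GPU probe

    def leaky_metric() -> None:
        allocated[0] += 1024  # pretend each call keeps 1 KiB alive

    def clean_metric() -> None:
        pass  # pretend this metric frees everything it allocates

    report = measure_growth(
        {"leaky": leaky_metric, "clean": clean_metric},
        probe=lambda: allocated[0],
        rounds=3,
    )
    print(report)  # {'leaky': [1024, 2048, 3072], 'clean': [0, 0, 0]}
```

Note that nvidia-smi reports memory reserved by PyTorch's caching allocator plus the CUDA context, so `torch.cuda.memory_allocated` may grow more slowly than the nvidia-smi figure.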