Training metrics

Open louisfd opened this issue 1 year ago • 12 comments

In burn-train, several metrics can be used during training. It would be great to have more!

  • [x] Accuracy
  • [x] Loss (the one in use)
  • [x] CUDA utilization (memory & compute)
  • [x] Top-k accuracy
  • [x] CPU utilization
  • [x] CPU memory usage
  • [ ] General GPU utilization
  • [ ] General GPU memory usage
  • [ ] Precision / Recall
  • [ ] AUC - ROC
  • [ ] BLEU score
  • [ ] ROUGE score
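
To illustrate what one of the listed metrics computes, here is a minimal sketch of top-k accuracy in plain Rust. It does not use burn's tensor types or the burn-train metric API; the function name `top_k_accuracy` and the `Vec<f32>`-per-sample score layout are assumptions made for illustration only.

```rust
/// Fraction of samples whose target class is among the `k` highest-scoring classes.
///
/// `scores[i]` holds one score per class for sample `i`; `targets[i]` is the
/// index of the correct class. Hypothetical helper, not part of burn.
fn top_k_accuracy(scores: &[Vec<f32>], targets: &[usize], k: usize) -> f32 {
    assert_eq!(scores.len(), targets.len(), "one target per sample");
    if scores.is_empty() {
        return 0.0;
    }

    let mut hits = 0usize;
    for (row, &target) in scores.iter().zip(targets) {
        let target_score = row[target];
        // Count the classes that strictly outrank the target.
        // Ties are resolved in favor of the target.
        let higher = row.iter().filter(|&&s| s > target_score).count();
        if higher < k {
            hits += 1;
        }
    }
    hits as f32 / scores.len() as f32
}
```

With `k = 1` this reduces to plain accuracy. A real metric would operate on batched tensors and accumulate the result across batches.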

louisfd avatar Jul 25 '23 18:07 louisfd

Hello, can I be assigned? I'll take care of it.

Elazrod56 avatar Jul 26 '23 10:07 Elazrod56

Of course! 😄

louisfd avatar Jul 26 '23 10:07 louisfd

If I understood correctly, all Rust files that get metrics are located in burn-train/src/metric/

Just to check, is my top_k_acc.rs file in the right place? (attached image: issue_544_1)

Elazrod56 avatar Jul 26 '23 12:07 Elazrod56

Yes, right there :)

louisfd avatar Jul 26 '23 13:07 louisfd

Just a question: by "CPU memory usage", do you mean the RAM or something else?

Elazrod56 avatar Aug 01 '23 11:08 Elazrod56

Just the RAM, nothing fancy :)

nathanielsimard avatar Aug 01 '23 12:08 nathanielsimard

I can't figure out how the non-hardware metrics work, what they mean, or how to implement them. Is it okay if I only implement the hardware metrics (CPU use, memory use, GPU use, GPU memory)? I'll add temperature metrics for the CPU and the GPU to make up for my inability to understand deep-learning-specific metrics.

Elazrod56 avatar Aug 10 '23 09:08 Elazrod56

Yes, no problem. You can submit a PR with only one metric if you want!

nathanielsimard avatar Aug 10 '23 14:08 nathanielsimard

Is this issue still relevant? I am interested in continuing the unfinished metrics.

I have the following questions:

  1. I have an AMD Radeon RX 5500 GPU on my machine. Is it good enough?
  2. I am not really familiar with deep learning metrics, but I am willing to learn them. Would an experienced dev be able to guide me in case I have questions?

oleksii-shyman avatar Nov 01 '23 00:11 oleksii-shyman

You can add metrics with any machine. To understand them, you can start by reading an article such as https://neptune.ai/blog/performance-metrics-in-machine-learning-complete-guide and then use Wikipedia for each individual metric you want to add.
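
As a concrete example of the kind of metric such guides cover, precision and recall for a binary classifier can be computed from three counts. The sketch below is plain Rust and does not use burn's metric API; the `BinaryCounts` type and its methods are hypothetical names chosen for illustration.

```rust
/// Confusion counts for a binary classifier (hypothetical helper type, not part of burn).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct BinaryCounts {
    true_pos: usize,
    false_pos: usize,
    false_neg: usize,
}

impl BinaryCounts {
    /// Tally counts from paired predictions and ground-truth labels.
    fn from_labels(predictions: &[bool], targets: &[bool]) -> Self {
        assert_eq!(predictions.len(), targets.len(), "one target per prediction");
        let mut counts = Self::default();
        for (&pred, &target) in predictions.iter().zip(targets) {
            match (pred, target) {
                (true, true) => counts.true_pos += 1,
                (true, false) => counts.false_pos += 1,
                (false, true) => counts.false_neg += 1,
                (false, false) => {} // true negatives are not needed here
            }
        }
        counts
    }

    /// Precision = TP / (TP + FP); `None` when nothing was predicted positive.
    fn precision(&self) -> Option<f64> {
        let denom = self.true_pos + self.false_pos;
        (denom > 0).then(|| self.true_pos as f64 / denom as f64)
    }

    /// Recall = TP / (TP + FN); `None` when there are no positive targets.
    fn recall(&self) -> Option<f64> {
        let denom = self.true_pos + self.false_neg;
        (denom > 0).then(|| self.true_pos as f64 / denom as f64)
    }
}
```

Returning `Option` makes the undefined cases (no positive predictions, or no positive targets) explicit instead of silently reporting zero; other conventions are possible.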

nathanielsimard avatar Nov 01 '23 13:11 nathanielsimard