aim
Filtering runs by the number of steps/epochs
🚀 Feature
As a user, I would like to be able to filter the runs in the metrics explorer by the number of steps/epochs.
Motivation
I have a lot of runs and some of them have only a few steps because I was only debugging the scripts. So ignoring the runs with a number of steps fewer than X would be helpful.
Pitch
Enter something like (run.steps > 100) in the filter of the metrics explorer and see only the runs that lasted longer than that.
Alternatives
In my particular case, the alternative would be to select all runs that performed fewer than 100 steps and delete (archive) them all at once.
Thanks for considering!
@avkudr thanks for opening this issue, it seems super useful 🙌
I think that if the step count is stored in the run's metadata during training (along with the run's other properties), it will enable searching runs by step count both from the explorers and programmatically via the SDK.
But the thing is that usually metrics have different lengths, e.g. train loss has more steps than validation loss.
Hence, I think the proposed query syntax could be modified a bit. For example:
run["loss", {"subset": "train"}].steps > 100
Searching by a metric's value at its last step could be enabled as well, e.g.:
run["loss", {"subset": "train"}].value.last <= 0.001
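To make the point about differing metric lengths concrete, here is a minimal sketch in plain Python. It does not use the Aim SDK or its query language; MetricSeq, Run, and filter_runs are hypothetical names invented for this illustration, and "steps" is assumed to mean the number of tracked values.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSeq:
    """Hypothetical stand-in for one metric sequence (name + context)."""
    name: str
    context: dict
    steps: list  # step indices at which a value was tracked


@dataclass
class Run:
    """Hypothetical stand-in for a run holding several metric sequences."""
    name: str
    metrics: list = field(default_factory=list)

    def metric(self, name, context):
        # Return the sequence matching both name and context, or None.
        for m in self.metrics:
            if m.name == name and m.context == context:
                return m
        return None


def filter_runs(runs, name, context, min_steps):
    """Keep runs whose selected metric has more than min_steps tracked values."""
    kept = []
    for run in runs:
        m = run.metric(name, context)
        if m is not None and len(m.steps) > min_steps:
            kept.append(run)
    return kept


# A short debugging run and a full run whose train and val losses differ in length.
debug_run = Run("debug", [MetricSeq("loss", {"subset": "train"}, list(range(5)))])
full_run = Run("full", [
    MetricSeq("loss", {"subset": "train"}, list(range(200))),
    MetricSeq("loss", {"subset": "val"}, list(range(20))),
])
runs = [debug_run, full_run]

train_kept = [r.name for r in filter_runs(runs, "loss", {"subset": "train"}, 100)]
val_kept = [r.name for r in filter_runs(runs, "loss", {"subset": "val"}, 100)]
print(train_kept)  # ['full']
print(val_kept)    # []
```

The same run passes or fails the "more than 100 steps" check depending on which metric is selected, which is why a single run-level step count may be ambiguous.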
However, I believe the above-mentioned syntax is too complicated (and ugly :D). @avkudr @roubkar @alberttorosyan @mahnerak Do any better alternatives come to mind?
@gorarakelyan I have the following suggestion.
Since metric in the context of the query refers to a unique metric sequence, the step count can be added as a property of the metric. So the syntax would be like:
metric.steps > 100
There's still one thing to clarify. Since metrics are sparse, should we take the last step or the number of tracked values?
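A small plain-Python sketch of why the two readings differ for a sparse metric. The helper names last_step and tracked_count are hypothetical and not part of Aim; the example assumes a metric tracked once every 50 steps.

```python
def last_step(steps):
    """Largest step index at which a value was tracked (None if empty)."""
    return max(steps) if steps else None


def tracked_count(steps):
    """Number of values actually tracked."""
    return len(steps)


# A sparse metric: one value every 50 steps, so steps 0, 50, 100, 150.
sparse_steps = list(range(0, 151, 50))

by_last_step = last_step(sparse_steps) > 100   # reading 1: last step
by_count = tracked_count(sparse_steps) > 100   # reading 2: tracked values

print(last_step(sparse_steps), by_last_step)   # 150 True
print(tracked_count(sparse_steps), by_count)   # 4 False
```

Under the last-step reading this metric satisfies metric.steps > 100, while under the tracked-values reading it does not, so the choice changes which runs a filter returns.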