Support strings in metrics
`dvc metrics` currently represents only scalar numbers.
This is nice for finding the difference in a metric between two models; however, a couple of the metrics I'm interested in would benefit from being made more human-readable by adding units. Specifically:
- model file size (kilobytes, megabytes, and gigabytes)
- inference latency (milliseconds, seconds)
There are other reasons beyond units to support strings. For example, we use Vertex AI's training service, which can and does change without warning, so storing the date of training would be useful. I would also be interested in storing the model's sha, so that given a model binary I can quickly verify which row corresponds to the binary I have.
I have a couple more thoughts on this.
If I commit a file to the git repo with git, then in a PR/MR I can see the text differences between branches under GitHub/GitLab's changes view. Maybe text data should just be left up to git.
It would be helpful to be able to diff files from different branches stored in DVC - is there a way to do that? If I had the two files locally I could run `diff file1 file2`; if they were stored in git I could run `git diff branch1:file1 branch2:file2`. Is there an equivalent way to run a diff in DVC?
I thought that `dvc diff branch1:file1 branch2:file2` might do what I want; instead it just reports that the files are different.
You can already use git to version text-based metrics files: set `cache: false` to tell DVC not to handle the file as a DVC-tracked output, and then track the file in git yourself. (But you would still be affected by the existing DVC limitation that metrics files must only contain numeric values.)
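For reference, a minimal `dvc.yaml` sketch of that setup (the stage name and command here are placeholders, not from your repo):

```yaml
stages:
  evaluate:
    cmd: python evaluate.py   # placeholder stage that writes metrics.json
    metrics:
      - metrics.json:
          cache: false        # git versions the file; DVC only parses it for metrics commands
```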
Regarding diffing: DVC does not know anything about the type of the files it tracks - everything is treated as arbitrary binary data - so we don't provide any kind of contextual diffing (which depends on handling specific file types).
There are some existing feature requests regarding diff behavior (like https://github.com/iterative/dvc/issues/7657), but essentially you would need to implement something that wraps the `dvc diff [--json]` output yourself and then passes the relevant cache paths into a separate diff tool (as described in #7657).
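For text files specifically, one way to approximate this without resolving cache paths by hand is to wrap `dvc.api.read` (a documented Python API call) with `difflib` - a rough sketch, using the file and branch names from this thread:

```python
# Sketch: textual diff of a DVC-tracked text file across two git revisions.
import difflib
import dvc.api

def dvc_text_diff(path, rev_a, rev_b, repo=None):
    """Return a unified diff of a text file between two revisions of a DVC repo."""
    old = dvc.api.read(path, repo=repo, rev=rev_a).splitlines(keepends=True)
    new = dvc.api.read(path, repo=repo, rev=rev_b).splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        old, new,
        fromfile=f"{rev_a}:{path}",
        tofile=f"{rev_b}:{path}",
    ))

if __name__ == "__main__":
    # Compare metrics.json between the two branches from this thread
    print(dvc_text_diff("metrics.json", "main", "tiny"))
```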
Hi @pmrowla @shortcipher3 @daavoo @codito @tizoc, I would like to join the Iterative team and contribute to the development of the project. Please let me know where to start with open-source contributions until then. I am a Python developer with 4 years of experience.
Hi @shortcipher3, could you provide an example of what the current version does and what the requirements are, so that I can make the changes accordingly?
I created an example repo here.
Essentially I have two branches with a metrics file for a tiny model:
```json
{
  "size": "1 MB",
  "latency": "20 ms",
  "mAP": 0.7,
  "precision": 0.6,
  "recall": 0.8,
  "model": "tiny"
}
```
and a large model:
```json
{
  "size": "1 GB",
  "latency": "2 s",
  "mAP": 0.8,
  "precision": 0.7,
  "recall": 0.9,
  "model": "Gigantamax"
}
```
When I run a diff I get the following:
```
# dvc metrics diff --target metrics.json -- main tiny
Path          Metric     main  tiny  Change
metrics.json  mAP        0.8   0.7   -0.1
metrics.json  precision  0.7   0.6   -0.1
metrics.json  recall     0.9   0.8   -0.1
```
I would love to get something more like:
```
# dvc metrics diff --target metrics.json -- main tiny
Path          Metric     main        tiny   Change
metrics.json  mAP        0.8         0.7    -0.1
metrics.json  precision  0.7         0.6    -0.1
metrics.json  recall     0.9         0.8    -0.1
metrics.json  size       1 GB        1 MB   ---
metrics.json  latency    2 s         20 ms  ---
metrics.json  model      Gigantamax  tiny   ---
```
That way I get a nice table of results and can easily compare metrics that are on completely different scales (GB/MB and seconds/milliseconds). It would be hard to read if I converted the GB and MB to bytes; I would be slowed down counting digits.
I can also add meaningful data to help the reader understand the differences.
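To make the request concrete, here is a rough sketch of the comparison behavior I'm imagining (illustrative Python only, not a proposal for DVC internals): numeric metrics get a delta, and string metrics are displayed with a placeholder change.

```python
# Rough sketch of the requested diff behavior: numeric metrics get a delta,
# string metrics (units, shas, dates) are shown with a "---" change column.
import json

def metrics_diff(old, new):
    rows = []
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            change = round(b - a, 6)   # numeric: report the difference
        else:
            change = "---"             # strings: display both values, no subtraction
        rows.append((key, a, b, change))
    return rows

if __name__ == "__main__":
    # Abridged versions of the two metrics files from this thread
    main = json.loads('{"size": "1 GB", "latency": "2 s", "mAP": 0.8, "model": "Gigantamax"}')
    tiny = json.loads('{"size": "1 MB", "latency": "20 ms", "mAP": 0.7, "model": "tiny"}')
    for row in metrics_diff(main, tiny):
        print("{:<10} {:>12} {:>8} {:>8}".format(*(str(c) for c in row)))
```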
As for being able to do a local diff: a lot of state-of-the-art research produces a family of models rather than a single model. I would love to have a metrics file for each model and be able to diff them. An example is DINOv2.
They actually have a table comparing the models on a few metrics, one of which uses a string for units.
Some other models with multiple sizes are:
- EfficientDet, which shows units on latency, throughput, #params, and FLOPs
- EfficientNet
- ChatGPT
I would think we could automatically generate some useful tables for understanding these parameters, making it easier for data scientists to make decisions.
Hello @shortcipher3 and @daavoo,
I had a look into this issue and I might have a suitable solution.
I am new to the community, so I am not sure of the best way to proceed. Should the issue be assigned to me before I open a pull request?
Thanks.
Hi @paulourbano! Feel free to open the PR.