
Display size of the generated dataset, downloaded dataset files, total amount of disk used in GB when MB >= 1000 in dataset cards

Open samjgorman opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Currently, the following data fields on the hub are only displayed in MB.

- Size of the generated dataset:
- Size of downloaded dataset files:
- Total amount of disk used:

Figures like 1895.01 MB and 1611.50 MB become unwieldy as they grow, making it harder to reason about the space they require than equivalents such as 1.89501 GB or 1.6115 GB.

An example page for reference

Describe the solution you'd like

Convert numbers to GB in dataset cards when MB >= 1000.
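A minimal sketch of the conversion I have in mind, assuming decimal units (1 GB = 1000 MB) as in the figures above. `format_size` and its parameters are hypothetical names for illustration, not something that exists in the codebase today. The default of five decimal places is also an assumption: it keeps every digit of a two-decimal MB figure after dividing by 1000.

```python
def format_size(size_mb: float, threshold_mb: float = 1000.0, gb_decimals: int = 5) -> str:
    """Render a size given in MB, switching to GB at or above the threshold.

    Hypothetical helper; assumes decimal units (1 GB = 1000 MB).
    """
    if size_mb >= threshold_mb:
        # Divide by 1000 and keep enough decimals to preserve the MB precision.
        return f"{size_mb / 1000:.{gb_decimals}f} GB"
    # Below the threshold, keep the existing two-decimal MB display.
    return f"{size_mb:.2f} MB"


if __name__ == "__main__":
    for value in (1895.01, 1611.50, 999.99):
        print(format_size(value))
```

Passing a smaller `gb_decimals` (for example 3) would give the shorter display if less precision turns out to be preferable.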

Describe alternatives you've considered

I considered advocating for truncating sizes to 3 decimal places, since precision to 5 or 6 decimal places in GB (such as 1.543210 GB) may be more than users need.

Ultimately though, I reasoned that more precision is often better.

Additional context

I'm happy to contribute to this. I didn't see exactly where this is handled in the current codebase, so any pointers would be appreciated. Also, let me know if I should be opening this in the datasets repo instead...

samjgorman, Jan 12 '22 05:01

@osanseviero do you think this is one for the https://github.com/huggingface/hub-docs or the forum?

adrinjalali, Mar 16 '22 13:03

For now, I would just push to have all issues related to the Hub in hub-docs instead of closing existing ones.

osanseviero, Mar 16 '22 13:03