
Add some stats on the size of the datasets provided in skrub.datasets

Open rcap107 opened this issue 9 months ago • 4 comments

Describe the issue linked to the documentation

I am working on an example with some of the datasets provided in skrub.datasets, and for each of them I need to download and open the data just to see how big it is and what its general shape is.

Suggest a potential alternative/fix

Adding at least the number of columns and rows would be useful to know whether a table is too large for my use case (e.g., I don't want to work with 1M rows), and it should not be too complicated.

rcap107 avatar Mar 13 '25 14:03 rcap107

@rcap107 Is this up for grabs?

Neilblaze avatar Mar 31 '25 09:03 Neilblaze

Yes, it is, thanks @Neilblaze! I think the simplest thing would be to manually check the number of rows, columns, and size on disk and write that in the docstring. It doesn't have to be done all in one PR; if that's too much work, it can be done a few datasets at a time.
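
The manual check could look roughly like the sketch below. It assumes the dataset is already available locally as a CSV file with a header row; the helper name `describe_csv` and the demo file are hypothetical and not part of skrub. It uses only the standard library, so it may miscount rows for files with blank lines or unusual dialects.

```python
import csv
import os
import tempfile


def describe_csv(path):
    """Return (n_rows, n_columns, size_in_bytes) for a CSV file with a header row.

    Hypothetical helper for illustration; not a skrub function.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # first line holds the column names
        n_rows = sum(1 for _ in reader)    # remaining lines are data rows
    return n_rows, len(header), os.path.getsize(path)


# Demo on a tiny temporary file standing in for a downloaded dataset.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf-8"
) as tmp:
    csv.writer(tmp).writerows([["name", "salary"], ["a", 1], ["b", 2], ["c", 3]])
    demo_path = tmp.name

n_rows, n_columns, size_bytes = describe_csv(demo_path)
print(f"{n_rows} rows x {n_columns} columns, {size_bytes} bytes on disk")

os.remove(demo_path)  # clean up the demo file
```

The three numbers it prints are the ones that would go into each fetcher's docstring.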

jeromedockes avatar Mar 31 '25 10:03 jeromedockes

@jeromedockes sure thing, I'm on it! Thanks!

Neilblaze avatar Mar 31 '25 15:03 Neilblaze

Hey @Neilblaze, are you still working on this?

rcap107 avatar Jun 19 '25 08:06 rcap107

Closed by #1503

rcap107 avatar Oct 10 '25 08:10 rcap107