goeffthomas

Results 7 comments of goeffthomas

Nice work @pdurbin! BTW, as prework for the Kaggle Dataverse integration we've been looking at, I wrote a small notebook to figure out what version every installation is...

> Name of the sheet. Croissant: this could be `RecordSet.extract.sheet_name` (new). I wonder if we could use less Excel-specific language? Like, would `RecordSet.extract.table` be more general and achieve the same...

> Could we look for both formats? I'm a bit biased here because I'll be going from file extension -> MIME type and won't be using the deprecated x-sqlite3. Context:...
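A minimal sketch of what "looking for both formats" could mean, assuming the two spellings in question are `application/vnd.sqlite3` and the older `application/x-sqlite3`. The constant and the function name `is_sqlite` are hypothetical and are not part of mlcroissant:

```python
# Assumed MIME types: the first is the newer spelling, the second the older "x-" form.
SQLITE_MIME_TYPES = frozenset({"application/vnd.sqlite3", "application/x-sqlite3"})


def is_sqlite(encoding_format: str) -> bool:
    """Return True if the declared MIME type matches either SQLite spelling."""
    # Normalize case and surrounding whitespace before comparing.
    return encoding_format.strip().lower() in SQLITE_MIME_TYPES
```

Accepting a set of types rather than a single string means a producer that maps file extension to MIME type and a consumer that still expects the older value can both work.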

> My naive assumption was that all tabular data files are loaded and converted into a homogeneous representation on the Kaggle side, and so you can provide access to all...

@benjelloun Could we try to get this into the next Croissant version?

We don't do this today, but it's related to a feature I'd love to see on Kaggle where datasets link to each other (or to models) by showing which...

We should probably add that same user agent when downloading the files over HTTP: https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/mlcroissant/_src/operation_graph/operations/download.py#L188-L190
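As a rough illustration of the idea, here is a sketch using only the standard library. The `USER_AGENT` value and the helper name `build_download_request` are placeholders; the actual string and the HTTP client used in mlcroissant's `download.py` may differ:

```python
import urllib.request

# Hypothetical value; substitute whatever user agent the project already sends elsewhere.
USER_AGENT = "example-client/0.0"


def build_download_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries the shared User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

The resulting request can be passed to `urllib.request.urlopen`, so file downloads identify themselves the same way as other calls.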