goeffthomas
There's plenty of documentation about how to build `RecordSet`s and `Field`s from a CSV via `source.extract.column`, but there isn't any for `.xls*` or `.sqlite` files. These also house tabular info...
As part of #245, basic auth support was added for the fetching of raw data files. During a review of using `mlcroissant` for data loading functionality, we realized that there's...
If the Croissant metadata is fetched from a URL (done [here](https://github.com/mlcommons/croissant/blob/main/python/mlcroissant/mlcroissant/_src/structure_graph/nodes/metadata.py#L427-L430)), we should set a `User-Agent` header that allows the package to be identified. For reference, `kagglehub` does something similar [here](https://github.com/Kaggle/kagglehub/blob/main/src/kagglehub/clients.py#L61-L83).
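To make the request concrete, here is a rough sketch of what attaching an identifying `User-Agent` could look like. It uses only the standard library's `urllib.request` and is not taken from the `mlcroissant` or `kagglehub` code. The helper names (`build_user_agent`, `build_metadata_request`), the header format, and the placeholder version string are all assumptions made for illustration.

```python
import platform
import urllib.request

# Hypothetical values for illustration; the real package would likely read
# its version from its own metadata rather than hard-code it.
PACKAGE_NAME = "mlcroissant"
PACKAGE_VERSION = "0.0.0"  # placeholder, not an actual release number


def build_user_agent(name: str = PACKAGE_NAME, version: str = PACKAGE_VERSION) -> str:
    """Return a User-Agent string naming the package and the Python runtime."""
    return f"{name}/{version} python/{platform.python_version()}"


def build_metadata_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request for a metadata URL with the header set."""
    return urllib.request.Request(url, headers={"User-Agent": build_user_agent()})


if __name__ == "__main__":
    # example.com is a stand-in URL; no network call is made here.
    request = build_metadata_request("https://example.com/croissant.json")
    print(request.get_header("User-agent"))
```

Passing the resulting `Request` to `urllib.request.urlopen` would send the header with the fetch. If the project uses a different HTTP client, the same string could be supplied through that client's headers argument instead.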