Sylvain Lesage
When the number of columns is above 1,000, we don't process the split. See https://github.com/huggingface/datasets-server/issues/1143. Should we instead "truncate", i.e. process only the first 1,000 columns, and give a hint...
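A minimal sketch of the truncation idea, assuming a hypothetical `MAX_COLUMNS` constant and helper name (neither exists in the codebase as described here):

```python
# Hypothetical sketch: instead of refusing to process a split with too
# many columns, keep only the first MAX_COLUMNS and flag the truncation.
MAX_COLUMNS = 1000


def truncate_columns(column_names: list[str]) -> tuple[list[str], bool]:
    """Return the columns to process and whether truncation happened."""
    if len(column_names) <= MAX_COLUMNS:
        return column_names, False
    return column_names[:MAX_COLUMNS], True
```

The boolean flag is what would let the API "give a hint" to the caller that the column list is incomplete.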
We currently return: ``` "sha256": "https://github.com/mlcommons/croissant/issues/80" ``` See https://github.com/mlcommons/croissant/issues/80. cc @marcenacp
e.g. https://datasets-server.huggingface.co/croissant?dataset=mnist&full=true. See https://github.com/mlcommons/croissant/blob/main/docs/howto/specify-splits.md: the splits could be specified at the RecordSet level.
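For reference, a small sketch of how the endpoint URL above is built; the helper name is illustrative, only the base URL and the `dataset`/`full` query parameters come from the example:

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co/croissant"


def croissant_url(dataset: str, full: bool = False) -> str:
    # Hypothetical helper building the Croissant endpoint URL.
    params = {"dataset": dataset}
    if full:
        params["full"] = "true"
    return BASE_URL + "?" + urlencode(params)
```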
In [moon-landing](https://github.com/huggingface/moon-landing/pull/8565#discussion_r1440224297) (internal) we call several endpoints in parallel. We could group them into one call to datasets-server, with all the available information in one response.
And regularly monitor this error (as well as any unexpected error; related: https://github.com/huggingface/datasets-server/issues/1443)
See https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera/discussions/1#6523d448b623a04e6c2f118a > From the logs I see this error: `TooBigRows: Rows from parquet row groups are too big to be read: 313.33 MiB (max=286.10 MiB)` ...
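The kind of check behind this error can be sketched as a size comparison against a configured maximum; the threshold value is taken from the error message above, but the constant and function names here are illustrative, not the actual implementation:

```python
# Hypothetical sketch: a row group whose (estimated) byte size exceeds the
# configured maximum cannot be read, triggering a TooBigRows-style error.
MAX_ROW_GROUP_BYTES = int(286.10 * 1024 * 1024)  # 286.10 MiB, from the error message


def row_group_too_big(total_byte_size: int) -> bool:
    """Return True if a row group exceeds the configured read limit."""
    return total_byte_size > MAX_ROW_GROUP_BYTES
```

In the reported case, the row group weighs 313.33 MiB, which is above the 286.10 MiB limit, hence the failure.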
Do we want to replace the TypedDict objects with dataclasses? If so: note that the objects we serialize should still be serializable by orjson without any change, at the price...
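To illustrate the difference with stdlib `json` (orjson can serialize dataclass instances natively, which is what the constraint above refers to); the `RowItem` names and fields here are illustrative, not the project's actual types:

```python
import json
from dataclasses import asdict, dataclass
from typing import TypedDict


class RowItemDict(TypedDict):
    # Current style: a TypedDict instance is a plain dict at runtime,
    # so it serializes directly.
    row_idx: int
    truncated: bool


@dataclass
class RowItem:
    # Candidate replacement: a dataclass needs asdict() with stdlib json
    # (orjson handles dataclasses without conversion).
    row_idx: int
    truncated: bool


as_typed: RowItemDict = {"row_idx": 0, "truncated": False}
as_dc = RowItem(row_idx=0, truncated=False)
assert json.dumps(as_typed) == json.dumps(asdict(as_dc))
```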
See https://huggingface.co/datasets/HuggingFaceM4/WebSight/discussions/2
Some users are annoyed by the discussions opened by "parquet-convert" bot.
See https://huggingface.co/datasets/1rsh/speech-rj-hi/discussions/2 First rows gives: ``` soundfile.LibsndfileError: Error opening : Format not recognised. ``` In the trace we also see: ``` Decoding failed. ffmpeg return error code: 1. ... moov...