datasets-viewer

Viewer for the 🤗 datasets library.

Results 10 datasets-viewer issues

As reported by @marshmellow77 in https://github.com/huggingface/datasets/issues/2997#issuecomment-932281234, the data viewer still shows incorrect data for the `turkish_product_reviews` dataset, which was fixed on Jun 22, 2021: https://github.com/huggingface/datasets/commit/16bc665f2753677c765011ef79c84e55486d4347
See:
- Screenshot: https://user-images.githubusercontent.com/63367770/135637150-93d9b09b-f1dd-4701-97a5-5cb2672ec0c7.PNG
- Link: https://huggingface.co/datasets/viewer/?dataset=turkish_product_reviews

Currently, due to the default behavior of `json.dumps`, unicode (i.e. all non-ASCII) symbols in `Sequence` dataset features are escaped, and thus made unreadable. See the image below (example from the...
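The escaping behavior described in this report can be reproduced with the standard library alone. The snippet below is only a minimal sketch, not the viewer's actual code: the `row` dictionary and its `tokens` field are made-up stand-ins for a `Sequence` feature. It compares the default `json.dumps` output with the output when `ensure_ascii=False` is passed.

```python
import json

# Hypothetical row with a list-valued (Sequence-like) feature holding
# non-ASCII text. The field name and values are illustrative only.
row = {"tokens": ["çok", "güzel", "ürün"]}

# Default behavior: ensure_ascii=True, so every non-ASCII character
# is written as a \uXXXX escape sequence.
escaped = json.dumps(row)

# With ensure_ascii=False the characters are emitted unchanged.
readable = json.dumps(row, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same data with `json.loads`, so the difference only affects how the text is displayed.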

I just noticed that changes were made in production on the viewer for some reason (maybe by @srush ?), to add filtering by tags. Therefore before updating the production with...

I ran `datasets-viewer` locally and accessed the `mrpc` subset of the `glue` dataset. Then I followed https://huggingface.co/docs/datasets/quicktour.html, in particular I loaded the same subset + dataset with: ```python >>> from...

bug

I realise this is an optional dependency for `datasets` end-users, but it would be nice to include in the viewer ![Screen Shot 2021-06-21 at 6 30 19 pm](https://user-images.githubusercontent.com/26859204/122796490-bf4c4d00-d2be-11eb-894d-214544b94424.png)

Link to reproduce: https://huggingface.co/datasets/viewer/?dataset=common_voice
Changing the subset to something other than `ab` seems to resolve the problem.
![Screen Shot 2021-06-21 at 6 27 56 pm](https://user-images.githubusercontent.com/26859204/122796218-6aa8d200-d2be-11eb-94fb-28a1ad07277c.png)

For example: https://huggingface.co/datasets/viewer/?dataset=arabic_pos_dialect
This seems to happen only when the text is in a list, so it is probably an issue with `json.dumps()`.

Reported here: https://github.com/huggingface/datasets/issues/1996

I'm getting the following error when trying to load the definite_pronoun_resolution dataset: ArrowInvalid: Column 2: In chunk 0: Invalid: Values Length (2644) is not equal to the length (1) multiplied...