datasets-viewer
Viewer for the 🤗 datasets library.
As reported by @marshmellow77 in https://github.com/huggingface/datasets/issues/2997#issuecomment-932281234, the data viewer still shows incorrect data for the `turkish_product_reviews` dataset, even though the dataset was fixed on Jun 22, 2021: https://github.com/huggingface/datasets/commit/16bc665f2753677c765011ef79c84e55486d4347
See:
- Screenshot: https://user-images.githubusercontent.com/63367770/135637150-93d9b09b-f1dd-4701-97a5-5cb2672ec0c7.PNG
- Link: https://huggingface.co/datasets/viewer/?dataset=turkish_product_reviews
Currently, due to the default behavior of `json.dumps`, Unicode (i.e. all non-ASCII) characters in `Sequence` dataset features are escaped and thus rendered unreadable. See the image below (example from the...
I just noticed that changes were made in production on the viewer for some reason (maybe by @srush?) to add filtering by tags. Therefore, before updating the production with...
I ran `datasets-viewer` locally and accessed the `mrpc` subset of the `glue` dataset. Then I followed https://huggingface.co/docs/datasets/quicktour.html, in particular I loaded the same subset + dataset with: ```python >>> from...
I realise this is an optional dependency for `datasets` end-users, but it would be nice to include in the viewer 
Link to reproduce: https://huggingface.co/datasets/viewer/?dataset=common_voice Changing the subset to something different from `ab` seems to resolve the problem. 
For example, https://huggingface.co/datasets/viewer/?dataset=arabic_pos_dialect This seems to only happen when the text is in a list, so probably an issue with `json.dumps()`
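The escaping attributed to `json.dumps` in the reports above can be reproduced with the standard library alone. This is a minimal sketch, not the viewer's actual code: the `row` dictionary and its `tokens` field are hypothetical, and whether the viewer serializes list features exactly this way is an assumption.

```python
import json

# Hypothetical example row: a list-valued feature holding non-ASCII text.
row = {"tokens": ["çok", "güzel", "ürün"]}

# Default behavior (ensure_ascii=True): every non-ASCII character is
# written as a \uXXXX escape sequence, which is hard to read on screen.
escaped = json.dumps(row)

# With ensure_ascii=False the characters are written out unchanged.
readable = json.dumps(row, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same data with `json.loads`, so the difference only affects how the text is displayed, not what it contains.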
Reported here: https://github.com/huggingface/datasets/issues/1996
I'm getting the following error when trying to load the definite_pronoun_resolution dataset: ArrowInvalid: Column 2: In chunk 0: Invalid: Values Length (2644) is not equal to the length (1) multiplied...