genienlp
build(deps): bump datasets from 2.11.0 to 2.13.1
Bumps datasets from 2.11.0 to 2.13.1.
Release notes
Sourced from datasets's releases.
2.13.1
General improvements and bug fixes
- Fix JSON generation in benchmarks CI by @mariosasko in huggingface/datasets#5966
- Always return list in `list_datasets` by @mariosasko in huggingface/datasets#5964
- Add `encoding` and `errors` params to JSON loader by @mariosasko in huggingface/datasets#5969
- Filter unsupported extensions by @lhoestq in huggingface/datasets#5972

Full Changelog: https://github.com/huggingface/datasets/compare/2.13.0...2.13.1
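The release notes do not describe how the new `encoding` and `errors` params behave. Assuming they follow Python's standard codec and error-handler semantics, the sketch below shows the difference they would make when a JSON file is not UTF-8. It uses only the standard library, not the `datasets` JSON loader; `decode_json_line` is a hypothetical helper written for this illustration.

```python
# Illustration only: Python's standard codec semantics, not the datasets JSON loader.
import json

# A JSON line containing "café", encoded as Latin-1 rather than UTF-8.
raw = '{"text": "café"}'.encode("latin-1")


def decode_json_line(data: bytes, encoding: str = "utf-8", errors: str = "strict") -> dict:
    """Hypothetical helper: decode bytes with a codec and error handler, then parse JSON."""
    return json.loads(data.decode(encoding, errors))


# Naming the correct encoding recovers the original text.
record = decode_json_line(raw, encoding="latin-1")

# The default UTF-8 codec with errors="strict" rejects the Latin-1 byte.
try:
    decode_json_line(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the undecodable byte instead of raising.
lossy = decode_json_line(raw, errors="replace")
```

The exact keyword names accepted by the loader should be checked against huggingface/datasets#5969 before relying on them.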
2.13.0
Dataset Features
Add IterableDataset.from_spark by @maddiedawson in huggingface/datasets#5770
- Stream the data from your Spark DataFrame directly to your training pipeline

    from datasets import IterableDataset
    from torch.utils.data import DataLoader

    ids = IterableDataset.from_spark(df)
    ids = ids.map(...).filter(...).with_format("torch")
    for batch in DataLoader(ids, batch_size=16, num_workers=4):
        ...

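To make the "stream directly" idea concrete, here is a minimal sketch of a lazy map/filter/batch pipeline built from plain Python generators. It is not how `IterableDataset` is implemented and uses neither Spark nor PyTorch; `source`, `lazy_map`, `lazy_filter` and `batched` are hypothetical names chosen for this illustration. The point is that rows are pulled from the source only when a batch is requested.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]
pulled = {"count": 0}  # tracks how many rows were actually read from the source


def source(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for a streamed data source."""
    for i in range(n):
        pulled["count"] += 1
        yield {"x": i}


def lazy_map(fn: Callable[[Row], Row], rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield fn(row)


def lazy_filter(pred: Callable[[Row], bool], rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        if pred(row):
            yield row


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch, if any
        yield batch


# Building the pipeline reads nothing yet; work happens only on iteration.
pipeline = batched(
    lazy_filter(lambda r: r["x"] % 4 == 0,
                lazy_map(lambda r: {"x": r["x"] * 2}, source(10))),
    batch_size=2,
)
first_batch = next(pipeline)
```

After `next(pipeline)`, only the rows needed to fill the first batch have been read from the source, which is the property that lets a streamed dataset feed a training loop without materializing everything first.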
IterableDataset formatting for PyTorch, TensorFlow, Jax, NumPy and Arrow:
- IterableDataset Arrow formatting by @lhoestq in huggingface/datasets#5821
- Iterable torch formatting by @lhoestq in huggingface/datasets#5852

    from datasets import load_dataset

    ids = load_dataset("c4", "en", split="train", streaming=True)
    ids = ids.map(...).with_format("torch")  # to get PyTorch tensors - also works with tf, np, jax etc.

Add IterableDataset.from_file to load local dataset as iterable by @mariusz-jachimowicz-83 in huggingface/datasets#5893

    from datasets import IterableDataset

    ids = IterableDataset.from_file("path/to/data.arrow")

Arrow dataset builder to be able to load and stream Arrow datasets by @mariusz-jachimowicz-83 in huggingface/datasets#5944

    from datasets import load_dataset

    ds = load_dataset("arrow", data_files={"train": "train.arrow", "test": "test.arrow"})

... (truncated)
Commits
- 682d21e Release: 2.13.1 (#5978)
- 76f75a9 Filter unsupported extensions (#5972)
- 79c340f Add `encoding` and `errors` params to JSON loader (#5969)
- 53ac2d9 Always return list in `list_datasets` (#5964)
- 02e1e9a Fix JSON generation in benchmarks CI (#5966)
- b96da7f set dev version (#5958)
- 9aaee6f Release: 2.13.0 (#5957)
- 650a86e Fix sequence of array support for most dtype (#5948)
- f1911ff Add Arrow builder docs (#5952)
- 9e40d28 Fix ArrowExamplesIterable.shard_data_sources (#5956)
- Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)