llm-foundry
Update datasets requirement from <2.20,>=2.19 to >=2.20.0,<2.21
Updates the requirements on datasets to permit the latest version.
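As a rough illustration of what the new specifier admits, the sketch below compares the old range (>=2.19,<2.20) with the new one (>=2.20.0,<2.21). It is a simplified stand-in that only handles plain dotted numeric versions; real installers follow PEP 440, and the helper names parse_version and in_range are hypothetical, not part of pip or datasets.

```python
def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2.20' into (2, 20, 0). Assumes purely numeric dotted versions."""
    parts = [int(part) for part in version.split(".")]
    # Pad with zeros so that '2.20' and '2.20.0' compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# (lower bound, upper bound) pairs taken from the PR title.
OLD_RANGE = ("2.19", "2.20")
NEW_RANGE = ("2.20.0", "2.21")

for candidate in ("2.19.2", "2.20.0", "2.21.0"):
    print(candidate, in_range(candidate, *OLD_RANGE), in_range(candidate, *NEW_RANGE))
```

Under these assumptions, 2.20.0 falls outside the old range and inside the new one, and no version satisfies both ranges at once.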
Release notes
Sourced from datasets's releases.
2.20.0
Important
- Remove default trust_remote_code=True by @lhoestq in huggingface/datasets#6954
  - datasets with a python loading script now require passing trust_remote_code=True to be used
Datasets features
- [Resumable IterableDataset] Add IterableDataset state_dict by @lhoestq in huggingface/datasets#6658
  - checkpoint and resume an iterable dataset (e.g. when streaming):
    >>> iterable_dataset = Dataset.from_dict({"a": range(6)}).to_iterable_dataset(num_shards=3)
    >>> for idx, example in enumerate(iterable_dataset):
    ...     print(example)
    ...     if idx == 2:
    ...         state_dict = iterable_dataset.state_dict()
    ...         print("checkpoint")
    ...         break
    >>> iterable_dataset.load_state_dict(state_dict)
    >>> print(f"restart from checkpoint")
    >>> for example in iterable_dataset:
    ...     print(example)
    Returns:
    {'a': 0}
    {'a': 1}
    {'a': 2}
    checkpoint
    restart from checkpoint
    {'a': 3}
    {'a': 4}
    {'a': 5}
General improvements and bug fixes
- Add docs about the CLI by @albertvillanova in huggingface/datasets#6831
- Remove token arg from CLI examples by @albertvillanova in huggingface/datasets#6839
- Allow deleting a subset/config from a no-script dataset by @albertvillanova in huggingface/datasets#6820
- Fix line-endings in tests on Windows by @albertvillanova in huggingface/datasets#6857
- Fix CI by temporarily pinning huggingface-hub < 0.23.0 by @albertvillanova in huggingface/datasets#6861
- Fix dataset name for community Hub script-datasets by @albertvillanova in huggingface/datasets#6855
- Update tqdm >= 4.66.3 to fix vulnerability by @albertvillanova in huggingface/datasets#6870
- Fix download for dict of dicts of URLs by @albertvillanova in huggingface/datasets#6871
- Set dev version by @albertvillanova in huggingface/datasets#6873
- Shorten long logs by @lhoestq in huggingface/datasets#6875
- Support jax 0.4.27 in CI tests by @albertvillanova in huggingface/datasets#6885
- Close gzipped files properly by @lhoestq in huggingface/datasets#6893
- Make CLI convert_to_parquet not raise error if no rights to create script branch by @albertvillanova in huggingface/datasets#6902
... (truncated)
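For callers affected by the trust_remote_code change listed under Important, a minimal sketch of the explicit opt-in might look like the following. The wrapper name load_script_dataset is hypothetical, and the dataset path is whatever script-based dataset the caller already uses; only load_dataset and its trust_remote_code keyword come from the datasets library.

```python
def load_script_dataset(path: str, **kwargs):
    """Load a dataset that ships a Python loading script.

    With datasets 2.20.0 the default is no longer trust_remote_code=True,
    so the opt-in has to be passed explicitly. Only do this for scripts
    whose code has been reviewed and is trusted.
    """
    # Imported lazily so this module can be imported without datasets installed.
    from datasets import load_dataset

    return load_dataset(path, trust_remote_code=True, **kwargs)
```

Datasets without a loading script should not need this flag, so a wrapper like this is best limited to the call sites that actually depend on a script.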
Commits
- 98fdc9e Release: 2.20.0 (#6969)
- af3acfd fix(ci): remove unnecessary permissions (#6962)
- 37a6036 Move info_utils errors to exceptions module (#6952)
- 9510252 Better error handling in dataset_module_factory (#6959)
- 97513be feat(ci): add trufflehog secrets detection (#6960)
- 686f5df Add support for categorical/dictionary types (#6892)
- a2dc287 Remove default trust_remote_code=True (#6954)
- 09ebf51 Validate config name and data_files in packaged modules (#6915)
- 5bbbf1b Validate config name and data_files in packaged modules (#6915)
- 6548e0e Fix typos in docs (#6957)
- Additional commits viewable in compare view
You can trigger a rebase of this PR by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.