Bump datasets from 3.0.0 to 3.2.0

Open dependabot[bot] opened this issue 2 months ago • 0 comments

Bumps datasets from 3.0.0 to 3.2.0.

Release notes

Sourced from datasets's releases.

3.2.0

Dataset Features

  • Faster parquet streaming + filters with predicate pushdown by @lhoestq in huggingface/datasets#7309
    • Up to +100% streaming speed
    • Fast filtering via predicate pushdown (skip files/row groups based on predicate instead of downloading the full data), e.g.
      from datasets import load_dataset
      filters = [('date', '>=', '2023')]
      ds = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", streaming=True, filters=filters)
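The row-group skipping described above can be illustrated with a small pure-Python sketch. It is not the `datasets` or Parquet implementation: `RowGroup`, `may_match`, and `scan` are hypothetical names, and the min/max statistics are written by hand rather than read from file metadata. It only shows the idea of deciding from statistics whether a chunk can be skipped before reading any of its rows.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this sketch.
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group with min/max statistics on 'date'."""
    min_date: str
    max_date: str
    rows: list


def may_match(group, op, value):
    """Return False only when the statistics prove that no row can match."""
    if op in (">=", ">"):
        return OPS[op](group.max_date, value)
    return OPS[op](group.min_date, value)


def scan(groups, op, value):
    """Filter rows on 'date', skipping groups ruled out by their statistics."""
    matched, groups_read = [], 0
    for group in groups:
        if not may_match(group, op, value):
            continue  # skipped: its rows are never touched
        groups_read += 1
        matched.extend(r for r in group.rows if OPS[op](r["date"], value))
    return matched, groups_read


groups = [
    RowGroup("2021-01-01", "2021-12-31", [{"date": "2021-05-01"}, {"date": "2021-12-31"}]),
    RowGroup("2022-06-01", "2023-03-01", [{"date": "2022-06-01"}, {"date": "2023-03-01"}]),
    RowGroup("2023-04-01", "2024-01-01", [{"date": "2023-04-01"}, {"date": "2024-01-01"}]),
]

# Same predicate shape as the filter tuple in the example above.
rows, groups_read = scan(groups, ">=", "2023")
print(len(rows), "rows matched;", groups_read, "of", len(groups), "groups read")
```

In this toy data the first group is ruled out by its maximum date alone, so only two of the three groups are read.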
      

Other improvements and bug fixes

New Contributors

Full Changelog: https://github.com/huggingface/datasets/compare/3.1.0...3.2.0

3.1.0

Dataset Features

  • Video support by @lhoestq in huggingface/datasets#7230
    >>> from datasets import Dataset, Video, load_dataset
    >>> ds = Dataset.from_dict({"video":["path/to/Screen Recording.mov"]}).cast_column("video", Video())
    >>> # or from the hub
    >>> ds = load_dataset("username/dataset_name", split="train")
    >>> ds[0]["video"]
    <decord.video_reader.VideoReader at 0x105525c70>
    
  • Add IterableDataset.shard() by @lhoestq in huggingface/datasets#7252
    >>> from datasets import load_dataset
    >>> full_ds = load_dataset("amphion/Emilia-Dataset", split="train", streaming=True)
    >>> full_ds.num_shards
    2360
    >>> ds = full_ds.shard(num_shards=full_ds.num_shards, index=0)
    >>> ds.num_shards
    1
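To make the `num_shards` / `index` arithmetic concrete, here is a minimal sketch of how a set of underlying shards could be divided among slices. The helper `shard_indices` and its `contiguous` flag are assumptions made for illustration, not the library's code; the sketch only shows why asking for as many slices as there are shards leaves each slice with exactly one shard.

```python
def shard_indices(total_shards, num_shards, index, contiguous=True):
    """Return the underlying shard indices held by slice `index` of `num_shards`.

    Hypothetical helper: illustrates the splitting arithmetic only.
    """
    if not 0 <= index < num_shards:
        raise ValueError("index must satisfy 0 <= index < num_shards")
    if contiguous:
        # Spread the remainder over the first `mod` slices, one extra shard each.
        div, mod = divmod(total_shards, num_shards)
        start = div * index + min(index, mod)
        end = start + div + (1 if index < mod else 0)
        return list(range(start, end))
    # Strided assignment: slice i takes shards i, i + num_shards, ...
    return list(range(index, total_shards, num_shards))


# 2360 underlying shards split into 2360 slices: slice 0 holds one shard.
print(shard_indices(2360, 2360, 0))

# 10 shards over 3 slices, contiguous versus strided.
print([shard_indices(10, 3, i) for i in range(3)])
print([shard_indices(10, 3, i, contiguous=False) for i in range(3)])
```

Either assignment covers every shard exactly once; the contiguous form keeps neighbouring shards together, while the strided form interleaves them.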
    

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot], Dec 10 '24 22:12