
[dataset] support sharding by multiple jsonl files

Open • Mddct opened this issue 1 year ago • 0 comments

  • [ ] Needs verification
# usage 1: pass a list of jsonl files and shard by file
json_files = ["1.jsonl", "2.jsonl", "3.jsonl"]
dataset = WenetRawDatasetSource(json_files, partition=True, shard_by_files=True)

# usage 2: pass a single jsonl file
json_files = "all.jsonl"
dataset = WenetRawDatasetSource(json_files, partition=True, shard_by_files=True)
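One possible way file-level sharding could work is sketched below, assuming each (rank, DataLoader worker) pair reads a disjoint subset of the jsonl files. `shard_files` and `iter_jsonl` are hypothetical helpers written for illustration. They are not WeNet's API and not how `WenetRawDatasetSource` is implemented.

```python
import json
from typing import Iterator, List, Sequence, Union


def shard_files(
    json_files: Union[str, Sequence[str]],
    rank: int,
    world_size: int,
    worker_id: int = 0,
    num_workers: int = 1,
) -> List[str]:
    """Hypothetical helper: pick the jsonl files one (rank, worker) reads.

    Files, not lines, are the unit of sharding, so each file is
    opened by exactly one reader.
    """
    # Accept a single path (usage 2) as well as a list of paths (usage 1).
    files = [json_files] if isinstance(json_files, str) else list(json_files)
    # Flatten (rank, worker) into one global reader index.
    num_readers = world_size * num_workers
    reader_id = rank * num_workers + worker_id
    if not 0 <= reader_id < num_readers:
        raise ValueError("rank/worker_id out of range")
    # Round-robin: reader k takes files k, k + num_readers, k + 2 * num_readers, ...
    return files[reader_id::num_readers]


def iter_jsonl(paths: Sequence[str]) -> Iterator[dict]:
    """Hypothetical helper: yield one parsed JSON object per non-empty line."""
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)
```

For example, with `["1.jsonl", "2.jsonl", "3.jsonl"]` and two ranks, rank 0 gets `["1.jsonl", "3.jsonl"]` and rank 1 gets `["2.jsonl"]`. In this sketch a single file (usage 2) gives work to only one reader, and an uneven file count gives ranks different amounts of data. Both cases are likely worth checking when verifying the feature.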

Mddct, Sep 27 '24 07:09