Add S3 support to the data loader and exporter

Open · cyruszhang opened this issue 1 month ago · 0 comments

Requested in https://github.com/modelscope/data-juicer/issues/799

Supports:

  • S3 URIs as dataset_path, e.g. s3://mnt/dst/the-pile-philpaper-refine-result.jsonl
  • AWS credentials from a .env file or from environment variables
  • both Ray and default execution modes
  • a new sample config
  • exporting results to S3
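
The description does not show how an S3 dataset_path or the credentials are resolved, so here is a hypothetical, stdlib-only sketch of the first two bullets. It is not data-juicer's implementation: parse_s3_path, read_dotenv, load_aws_credentials and to_s3fs_options are names invented for illustration. It assumes the standard AWS variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, AWS_REGION), that real environment variables take precedence over .env (python-dotenv's default behaviour), and that the credentials would be handed to an fsspec/s3fs filesystem, whose S3FileSystem takes key, secret, token and client_kwargs.

```python
import os
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not part of data-juicer.

AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
            "AWS_SESSION_TOKEN", "AWS_REGION")


def parse_s3_path(path):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {path!r}")
    # urlparse keeps the leading '/' on the path; S3 object keys omit it.
    return parsed.netloc, parsed.path.lstrip("/")


def read_dotenv(env_file):
    """Minimal KEY=VALUE parser for a .env file (no expansion, no 'export')."""
    values = {}
    if not os.path.exists(env_file):
        return values
    with open(env_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments and anything without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_aws_credentials(env_file=".env", environ=None):
    """Collect AWS settings; assumed rule: process env overrides .env."""
    environ = os.environ if environ is None else environ
    merged = read_dotenv(env_file)
    for name in AWS_KEYS:
        if name in environ:
            merged[name] = environ[name]
    # Keep only the AWS settings this sketch knows about.
    return {name: merged[name] for name in AWS_KEYS if name in merged}


def to_s3fs_options(creds):
    """Map AWS_* settings to s3fs.S3FileSystem keyword arguments."""
    options = {}
    if "AWS_ACCESS_KEY_ID" in creds:
        options["key"] = creds["AWS_ACCESS_KEY_ID"]
    if "AWS_SECRET_ACCESS_KEY" in creds:
        options["secret"] = creds["AWS_SECRET_ACCESS_KEY"]
    if "AWS_SESSION_TOKEN" in creds:
        options["token"] = creds["AWS_SESSION_TOKEN"]
    if "AWS_REGION" in creds:
        options["client_kwargs"] = {"region_name": creds["AWS_REGION"]}
    return options
```

With the sample path from the list, parse_s3_path returns ("mnt", "dst/the-pile-philpaper-refine-result.jsonl"). In this sketch the process environment wins over .env, so a cluster or CI job can override a local .env without editing the file.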

cyruszhang · Nov 05 '25 20:11