data-juicer
Add S3 support for data loader and exporter
Per request: https://github.com/modelscope/data-juicer/issues/799
Supports:
- S3 dataset paths, e.g. dataset_path: s3://mnt/dst/the-pile-philpaper-refine-result.jsonl
- AWS credentials from a .env file or environment variables
- Ray mode and default mode
- A sample config
- Exporting to S3
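
For reviewers, here is a minimal sketch of the two pieces the list above relies on: splitting an `s3://` dataset path into bucket and key, and picking up AWS credentials from the environment. It is not the code in this PR. The helper names `split_s3_uri` and `aws_credentials_from_env` are hypothetical, and the variable names are the standard AWS ones, assumed here because the description does not say which ones are read.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the URI authority; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def aws_credentials_from_env(env=None):
    """Return whichever standard AWS variables are set. Hypothetical helper."""
    env = os.environ if env is None else env
    # Assumption: these are the conventional AWS names, not confirmed by the PR.
    names = (
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
        "AWS_DEFAULT_REGION",
    )
    return {name: env[name] for name in names if name in env}


bucket, key = split_s3_uri("s3://mnt/dst/the-pile-philpaper-refine-result.jsonl")
print(bucket, key)
```

With the path from the list, the bucket is `mnt` and the key is `dst/the-pile-philpaper-refine-result.jsonl`. Values from a .env file would need to be loaded into the environment before the second helper sees them.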