(WIP) Add loading script for arxiv dataset
The dataset can be downloaded here:
wget https://huggingface.co/datasets/ttj/metadata_arxiv/resolve/main/v0.jsonl
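For a quick look at the downloaded file before it is wired into the pipeline, something like the following might help. It is only a sketch: it assumes `v0.jsonl` is standard JSON Lines (one JSON object per line) and makes no assumption about which fields the records contain. The helper names `iter_jsonl` and `peek` are made up for illustration and are not part of this PR.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek(path: str, n: int = 3) -> list:
    """Return the first n records, e.g. to see which keys are present."""
    records = []
    for record in iter_jsonl(path):
        if len(records) >= n:
            break
        records.append(record)
    return records
```

After running the `wget` command above, `peek("v0.jsonl")` would return the first few records without loading the whole file into memory.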
I haven't hooked it up to input_pipeline yet, so it isn't runnable for now:
I can't decide whether to refactor with_metadata.py
or just copy and modify it, as this PR does.
What do you think? @timoschick @SaulLu