treccastweb
fix "-2" in provenance
I spotted one case where an MS MARCO id is split into two entries within an array. I did not perform further checks, but it seems likely that a line break was added by accident at some point in the pipeline.
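
A broader check could look something like the sketch below. It is only a rough illustration and rests on assumptions that are not verified here: that each provenance array is a list of string ids, and that a broken-off fragment looks like `-2` (a hyphen followed by digits). The function names and the sample ids are hypothetical.

```python
import re

# Assumption: a stray fragment is a hyphen followed only by digits, e.g. "-2".
FRAGMENT = re.compile(r"^-\d+$")


def find_split_ids(provenance):
    """Return the indices of entries that look like broken-off fragments."""
    return [i for i, entry in enumerate(provenance) if FRAGMENT.match(entry.strip())]


def merge_split_ids(provenance):
    """Rejoin each fragment with the entry immediately before it."""
    merged = []
    for entry in provenance:
        entry = entry.strip()
        if FRAGMENT.match(entry) and merged:
            # Glue the fragment back onto the preceding id.
            merged[-1] += entry
        else:
            merged.append(entry)
    return merged


# Made-up ids, for illustration only.
sample = ["MARCO_example_1", "-2", "MARCO_example_2-1"]
print(find_split_ids(sample))
print(merge_split_ids(sample))
```

Running `find_split_ids` over every provenance array would show whether the case fixed here is the only one; `merge_split_ids` is one possible repair, to be applied only if the assumption about the fragment shape holds.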