Shang Wang
Hi, in the current version of the master branch, `pdb.set_trace` is always called: https://github.com/attardi/wikiextractor/blob/master/wikiextractor/extract.py#L85. Should this line and `import pdb` be removed? Thanks!
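If the breakpoint is still useful for local debugging, one alternative to deleting it outright is to gate it behind an opt-in switch. The sketch below is only an illustration: the environment variable name `WIKIEXTRACTOR_DEBUG` and both helper functions are hypothetical and do not exist in wikiextractor.

```python
import os
import pdb


def debug_enabled(env_var="WIKIEXTRACTOR_DEBUG"):
    """Return True only when the (hypothetical) opt-in variable is set to "1"."""
    return os.environ.get(env_var) == "1"


def maybe_breakpoint(env_var="WIKIEXTRACTOR_DEBUG"):
    """Drop into pdb only when debugging was explicitly requested.

    Returns True if the debugger was entered, False otherwise, so a normal
    run is never interrupted.
    """
    if debug_enabled(env_var):
        pdb.set_trace()
        return True
    return False
```

With this in place, a normal run passes straight through `maybe_breakpoint()`, and the debugger only starts when the variable is exported before launching the script.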
The compliance checker should verify the vocab size for the bert and rnnt benchmarks.
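To make the request concrete, here is a minimal standalone sketch of such a check. It assumes log lines of the form `:::MLLOG {json}` with `key` and `value` fields; the key name `vocab_size` and the expected value passed in are assumptions for illustration, and this is not how the existing compliance checker is structured.

```python
import json

MLLOG_PREFIX = ":::MLLOG"


def find_logged_values(log_lines, key):
    """Collect the values of every MLLOG record whose "key" matches `key`."""
    values = []
    for line in log_lines:
        idx = line.find(MLLOG_PREFIX)
        if idx == -1:
            continue  # not an MLLOG line
        payload = line[idx + len(MLLOG_PREFIX):].strip()
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing the scan
        if isinstance(record, dict) and record.get("key") == key:
            values.append(record.get("value"))
    return values


def check_vocab_size(log_lines, expected, key="vocab_size"):
    """Return (ok, message) for a vocab-size check.

    `key` is a hypothetical log key; `expected` must come from the rules
    for the benchmark being checked.
    """
    values = find_logged_values(log_lines, key)
    if not values:
        return False, f"no '{key}' entry found in log"
    mismatched = [v for v in values if v != expected]
    if mismatched:
        return False, f"'{key}' logged as {mismatched}, expected {expected}"
    return True, "ok"
```

A caller would read the submission log into a list of lines and call `check_vocab_size(lines, expected)` once per benchmark, with `expected` taken from the training rules.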
Work in progress, for issue #79. Relevant rules: https://github.com/mlcommons/training_policies/blob/master/training_rules.adoc#94-quality-measure

- No examples to test with: rnnt, unet3d
- Not sure whether the current reference implementations match the training rules: bert, dlrm
Hi, I noticed that the download URL for the [`CommonCrawlDataset`](https://github.com/EleutherAI/the-pile/blob/master/the_pile/datasets.py#L756) is `http://eaidata.bmk.sh/data/pile_cc_filtered_deduped.jsonl.zst`. Does this mean that the CC dataset is already deduplicated and filtered? However, it doesn't seem like https://github.com/leogao2/commoncrawl_downloader in...