Use consistent data_select default

Open cpuhrsch opened this issue 5 years ago • 1 comments

Right now the default is sometimes "train", "test", "valid" and sometimes (more commonly) "train", "valid", "test". We should pick a single convention to avoid confusion; this PR opts for the latter. This also affects one of the tests, which incorrectly assumed the latter was always the case.
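To illustrate why the ordering matters, here is a minimal sketch with a single shared default. `load_splits` and `DEFAULT_DATA_SELECT` are hypothetical names used only for this example and are not the torchtext API; the point is that callers who unpack the result positionally depend on the default order.

```python
# Hypothetical stand-in for a dataset constructor; NOT the torchtext API.
# One shared default keeps every dataset on the same split order.
DEFAULT_DATA_SELECT = ("train", "valid", "test")


def load_splits(data_select=DEFAULT_DATA_SELECT):
    """Return one placeholder per requested split, in the requested order."""
    if isinstance(data_select, str):
        data_select = (data_select,)
    unknown = set(data_select) - set(DEFAULT_DATA_SELECT)
    if unknown:
        raise ValueError(f"Unknown splits: {sorted(unknown)}")
    # Placeholder strings stand in for real dataset objects.
    return tuple(f"<{name} data>" for name in data_select)


# Positional unpacking silently depends on the default order.
train, valid, test = load_splits()
```

If one dataset defaulted to "train", "test", "valid" instead, the same unpacking would swap the validation and test sets without raising any error.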

NOTE: This also affects the BERT examples, which were built on the assumption that the order is train, valid, test.

NOTE: This also shows that WikiText103 isn't covered by tests. It's very large, but we should find a way to test it using a subset of the data.
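One possible way to test against a subset, sketched under assumptions: if the raw data can be consumed as an iterator of lines, a test could cap how many lines it reads. `take_subset` and `fake_corpus` are hypothetical helpers for illustration, not existing torchtext functions.

```python
from itertools import islice


def take_subset(lines, max_lines=100):
    """Lazily yield at most max_lines items from an iterable of raw text lines."""
    return islice(lines, max_lines)


def fake_corpus():
    """Hypothetical stand-in for a very large corpus: an endless line generator."""
    n = 0
    while True:
        yield f"line {n}"
        n += 1


# Only the first three lines are ever produced; the rest is never read.
sample = list(take_subset(fake_corpus(), max_lines=3))
```

Because `islice` is lazy, this only limits how much is read; it would not by itself avoid downloading the full archive.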

cpuhrsch, Sep 20 '20 21:09

Codecov Report

Merging #995 into master will decrease coverage by 0.45%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #995      +/-   ##
==========================================
- Coverage   78.27%   77.82%   -0.46%     
==========================================
  Files          44       44              
  Lines        3126     3084      -42     
==========================================
- Hits         2447     2400      -47     
- Misses        679      684       +5     
| Impacted Files | Coverage Δ |
|---|---|
| ...rchtext/experimental/datasets/language_modeling.py | 81.96% <ø> (ø) |
| torchtext/experimental/datasets/translation.py | 76.81% <ø> (ø) |
| ...ext/experimental/datasets/raw/language_modeling.py | 80.00% <100.00%> (ø) |
| torchtext/experimental/transforms.py | 85.52% <0.00%> (-10.17%) :arrow_down: |
| ...htext/experimental/datasets/text_classification.py | 76.47% <0.00%> (+0.75%) :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 7e267d2...2426c5c.

codecov[bot], Sep 20 '20 22:09