Avoid unnecessary memory allocations in language modeling datasets.
The language modeling datasets currently construct all datasets even if only a subset is requested. They also store the fully numericalized version of a dataset if it is stored as "a single line" (word by word), but not otherwise.
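To illustrate the intent, here is a minimal sketch of building only the requested datasets. It is not the code in this PR and not the torchtext API: `build_splits`, `loaders`, `requested`, and `unk_index` are hypothetical names, and the vocabulary is assumed to be a plain dict from token to index.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_splits(
    loaders: Dict[str, Callable[[], Iterable[str]]],
    vocab: Dict[str, int],
    requested: Tuple[str, ...] = ("train",),
    unk_index: int = 0,
) -> Dict[str, List[int]]:
    """Numericalize only the requested splits (hypothetical helper)."""
    built: Dict[str, List[int]] = {}
    for name in requested:
        if name not in loaders:
            raise KeyError(f"unknown split: {name!r}")
        ids: List[int] = []
        # The loader is called only for a requested split, and its lines are
        # consumed one at a time, so the raw text is never held in full.
        for line in loaders[name]():
            ids.extend(vocab.get(tok, unk_index) for tok in line.split())
        built[name] = ids
    return built
```

With loaders for `train`, `valid`, and `test`, calling `build_splits(loaders, vocab, requested=("train",))` would read and numericalize only the training data, so the other two are never loaded.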
Codecov Report
Merging #992 into master will increase coverage by 0.27%. The diff coverage is 86.95%.
@@            Coverage Diff             @@
##           master     #992      +/-   ##
==========================================
+ Coverage   77.70%   77.98%   +0.27%
==========================================
  Files          44       44
  Lines        3100     3102       +2
==========================================
+ Hits         2409     2419      +10
+ Misses        691      683       -8
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...rchtext/experimental/datasets/language_modeling.py | 84.84% <86.95%> (+11.41%) | :arrow_up: |
| ...ext/experimental/datasets/raw/language_modeling.py | 80.00% <0.00%> (+1.53%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 618795d...cba98e9.
closing stale PR