Andrej
It's due to this:

```
x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
```

which happens async.
Does this actually make a difference? Maybe I'd prefer the more consistent naming scheme.
I don't think we can merge this because we need determinism, and this uses atomicAdd. Summoning @ademeure for comment too.
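For context on the determinism concern: floating-point addition is not associative, so when the order of accumulation is not fixed (as with atomic adds from many threads), the result can change from run to run. A minimal Python sketch of the effect, not the kernel under discussion; the values and the `sum_in_order` helper are made up for illustration:

```python
def sum_in_order(values, order):
    """Accumulate values one at a time, in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Hypothetical contributions, chosen so that rounding is easy to see.
values = [1e16, 1.0, -1e16, 1.0]

a = sum_in_order(values, [0, 1, 2, 3])  # the first 1.0 is rounded away next to 1e16
b = sum_in_order(values, [0, 2, 1, 3])  # the large terms cancel before the 1.0s are added

print(a, b, a == b)
```

The same four numbers give two different sums depending only on the order in which they are added, which is the behavior an unordered accumulation can expose.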
FineWeb100B is 1010 files total. These are raw .bin shards of 100M tokens each:

- Each is of size 191MB
- Zipped, each is 134MB

134MB * 1010 files =...
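As a rough sanity check on the 191MB figure, here is a small sketch that assumes each token is stored as a 2-byte unsigned integer and ignores any file header. Both are assumptions, not something stated above:

```python
TOKENS_PER_SHARD = 100_000_000  # "100M tokens each", from the comment above
BYTES_PER_TOKEN = 2             # assumption: token ids stored as uint16

shard_bytes = TOKENS_PER_SHARD * BYTES_PER_TOKEN
shard_mib = shard_bytes / 2**20  # binary megabytes

print(f"{shard_mib:.1f} MiB per raw shard")
```

Under those assumptions the raw shard size comes out close to the 191MB quoted, if that figure is read in binary megabytes.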
(I used streaming originally but then started getting errors in the tokenization workers whenever a request randomly failed, so I took it out.)
Good idea and I was able to reproduce this. I'll think about how I can maybe create an optimized version (that is still in Python land), which maybe prioritizes speed...
Not sure how much we care about this kernel
I took the site down. I just don't have time to maintain this project; it is a side project from way back in my PhD days. Other people have to...
I'll keep the issue open in case people want to share links to alternatives etc., and for awareness.
@YuchenJin yep exactly what I had in mind! I put up the issue because I am sequencing other things before I get around to it, possibly someone can pick it...