Andrej comments

Results: 373 comments of Andrej

Web of trust (wot)

It's due to this line, which happens async:
```python
x, y = x.pin_memory().to(device, non_blocking=True), y.pin_memory().to(device, non_blocking=True)
```

Replaced parameter B with the total parameters calculated outside of the kernel

Does this actually make a difference? I'd maybe prefer the more consistent naming scheme.

Add optimized GPU kernels for encoder_backward using shared memory

I don't think we can merge this: we need determinism, and this uses atomicAdd. Summoning @ademeure for comment too.
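Why atomicAdd breaks determinism: floating-point addition is not associative, and the order in which GPU threads land their atomic updates can change from run to run, so the same inputs can produce bitwise-different sums. A minimal CPU-side illustration, assuming the index orders below stand in for thread scheduling (the function names are hypothetical, not from any codebase):

```python
def sum_in_order(values, order):
    """Accumulate `values` in the given index order, like threads doing atomicAdd."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def deterministic_sum(values):
    """Fixed reduction order: the same inputs always give the same bits."""
    return sum_in_order(values, range(len(values)))


values = [0.1, 0.2, 0.3]
a = sum_in_order(values, [0, 1, 2])  # (0.1 + 0.2) + 0.3
b = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1
print(a, b, a == b)
```

A deterministic kernel has to fix the reduction order (for example, one thread or warp owning each output and summing its contributions in a fixed sequence) instead of letting arrival order decide it.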

Deleting Conda/Python as a dependency entirely to dramatically decrease "latency to step"

FineWeb100B is 1010 files total; these are raw .bin shards of 100M tokens each.
- Each is 191MB.
- Zipped, each is 134MB.
- 134MB * 1010 files =...
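A quick sanity check on the per-shard figure, assuming each token is stored as a 2-byte uint16 (enough for a GPT-2-sized vocabulary of 50,257 tokens) and that "MB" here means mebibytes; both are assumptions, and any small file header is ignored:

```python
TOKENS_PER_SHARD = 100_000_000  # from the comment: 100M tokens per shard
BYTES_PER_TOKEN = 2             # assumption: tokens stored as uint16
MIB = 1024 ** 2                 # assumption: "MB" means MiB


def shard_size_mib(tokens=TOKENS_PER_SHARD, bytes_per_token=BYTES_PER_TOKEN):
    """Raw shard size in MiB, ignoring any small file header."""
    return tokens * bytes_per_token / MIB


print(round(shard_size_mib()))  # 191
```

Under these assumptions 100M tokens come to about 190.7 MiB, which matches the quoted 191MB per raw shard.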

Deleting Conda/Python as a dependency entirely to dramatically decrease "latency to step"

(I used streaming originally, but then started getting errors in the tokenization workers whenever a request randomly failed, so I took it out.)

Deduplication of text chunks with frequency count, training and encoding 5x speedup

Good idea, and I was able to reproduce this. I'll think about how I can maybe create an optimized version (still in Python land) that prioritizes speed...
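The deduplication idea in the PR title can be sketched like this, assuming text has already been split into chunks (e.g. by a pre-tokenization regex) and pair statistics are counted over UTF-8 bytes. Each unique chunk is processed once and its pair counts are weighted by how often the chunk occurs, which gives the same statistics as walking every chunk. The function names are hypothetical, and this is not the PR's actual code:

```python
from collections import Counter


def pair_counts_naive(chunks):
    """Count adjacent byte pairs by walking every chunk, duplicates included."""
    counts = Counter()
    for chunk in chunks:
        ids = list(chunk.encode("utf-8"))
        for pair in zip(ids, ids[1:]):
            counts[pair] += 1
    return counts


def pair_counts_dedup(chunks):
    """Same result: each unique chunk is processed once, weighted by its frequency."""
    counts = Counter()
    for chunk, freq in Counter(chunks).items():
        ids = list(chunk.encode("utf-8"))
        for pair in zip(ids, ids[1:]):
            counts[pair] += freq
    return counts


chunks = ["the", " cat", " the", "the", " cat", " sat"]
print(pair_counts_dedup(chunks).most_common(3))
```

The savings grow with how repetitive the chunks are: natural text repeats common words constantly, so the number of unique chunks is far smaller than the total, and the same trick applies when merges are later applied to each unique chunk.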

Added kernel development file for permute_backwards

Not sure how much we care about this kernel

The website cannot be accessed.

I took the site down. I just don't have time to maintain this project, it is a side project from way back in my PhD days. Other people have to...

The website cannot be accessed.

I'll keep the issue open in case people want to share links to alternatives etc., and for awareness.

Model Export & Inference

@YuchenJin yep, exactly what I had in mind! I put up the issue because I am sequencing other things before I get around to it; possibly someone can pick it...