Rayan Hatout


Alright, so with OpenCL I get the following results: saves roughly `80,000` calls to `isinstance`, from `132,929` down to `55,789` **(-58%)**. Forward pass on average **9%** faster; backward pass on average...
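For context, a call count like that can be pulled straight out of `cProfile`, which attributes calls to the `isinstance` builtin. A minimal sketch, assuming tinygrad's `Tensor` API; the toy matmul below is a stand-in, not the actual benchmark:

```python
import cProfile, pstats
from tinygrad.tensor import Tensor

def workload():
    # toy stand-in for the real forward pass
    x = Tensor.randn(64, 64)
    (x @ x).relu().sum().numpy()  # .numpy() forces the lazy graph to evaluate

prof = cProfile.Profile()
prof.runcall(workload)

# pstats keys are (filename, lineno, funcname); builtins show up with a '~' filename
for (fname, lineno, funcname), (cc, ncalls, tt, ct, callers) in pstats.Stats(prof).stats.items():
    if funcname == "<built-in method builtins.isinstance>":
        print(f"isinstance calls: {ncalls}")
```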

Well, this was fun; figuring out this single test case took me longer than the rest of the PR combined. Anyway, it passes locally now and performance numbers are...

OK, spent some time hunting down exactly where things go wrong in the OpenCL test and came up with a minimal example: `def test_sum_num_hoisted_and_factors_cancel_out`
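The test name suggests a sum where the constant gets hoisted while the variable factors cancel out. A minimal sketch of what that case might look like, assuming tinygrad's `tinygrad.shape.symbolic` API; the bounds and expected string are illustrative, not the actual test body:

```python
from tinygrad.shape.symbolic import Variable

def test_sum_num_hoisted_and_factors_cancel_out():
    a = Variable("a", 0, 8)
    # the constant 1 is hoisted out of the sum, and a*4 + a*-4 cancels,
    # so the whole expression should fold down to the constant 1
    expr = (a * 4 + 1) + (a * -4)
    assert expr.render() == "1"
```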

Not ready for review yet; the changes make `mypy` very angry, plus I'm getting a lot of variance on the performance improvement figures. @geohot do we have a more disciplined...
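Whatever the truncated question was asking for exactly, one disciplined setup is to warm up, repeat the measurement many times, and report the median and spread instead of a single run. A sketch; the `bench` helper and its parameters are made up for illustration:

```python
import statistics, time

def bench(fn, warmup=5, reps=50):
    """Time fn() reps times after warmup runs; return (median_ms, stdev_ms)."""
    for _ in range(warmup):
        fn()
    times_ms = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms), statistics.stdev(times_ms)
```

Running this against both branches (e.g. `bench(lambda: model(x).numpy())`, with a hypothetical `model` and `x`) and comparing medians should cut through the run-to-run variance.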

@YurySolovyov yeah, we should probably add a rule for that if I manage to consistently show a perf improvement; right now it's a bit unclear. @geohot my bad, I hadn't...

Hey @geohot, I got sidetracked and did a bunch of ad-hoc optimizations that shaved 300 ms off the runtime for LLaMA on my M1 Mac (800 ms -> 500 ms). Would you mind running...

Hmmm, experimenting with some stuff: is it worth making `forward` slower if we can make `backward` significantly faster? Before:
```
forward pass: 16.117 ms, 0.84x off...
```
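For reference, numbers like the ones above can be produced with a harness along these lines; a sketch assuming tinygrad's lazy `Tensor` API, with a toy matmul standing in for the real model:

```python
import time
from tinygrad.tensor import Tensor

x = Tensor.randn(256, 256, requires_grad=True)
w = Tensor.randn(256, 256, requires_grad=True)

t0 = time.perf_counter()
loss = (x @ w).relu().sum()
loss.numpy()  # tinygrad is lazy, so this forces the forward pass to run
t1 = time.perf_counter()

loss.backward()
x.grad.numpy()  # likewise forces the backward graph to actually execute
t2 = time.perf_counter()

print(f"forward pass: {(t1 - t0) * 1e3:.3f} ms")
print(f"backward pass: {(t2 - t1) * 1e3:.3f} ms")
```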

In the general case, though, is that tradeoff not worth it? Or is the vision that tinygrad is more about fast inference than fast training?

Huh, actually I completely hallucinated the tradeoff; I hadn't re-measured "before" in a while. Seems like the baseline is now _way_ faster than what it was in the "after" of my...