Roelof van Dijk

34 comments by Roelof van Dijk

Done as far as I am concerned. Should be an easy ~10% speedup. The next step is caching all ShapeTracker methods, which is a work in progress but looks promising. This branch ```...

Reduced number of commits. Ready.

```
LAZYCACHE=0 python3.11 -O test/external/external_test_speed_llama.py
```

Master codegen mean runtime: 119.23ms, runs: 135.19, 111.06, 138.51, 110.68, 108.25, 112.41, 111.19, 143.45, 109.42, 112.13

methodcache mean runtime: 111.93ms, runs: 105.79, 105.06, 105.42, 140.35,...
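As a quick sanity check on these figures, the master mean can be recomputed from the ten listed runs. The methodcache runs are truncated in this excerpt, so the sketch below only uses its reported mean; the variable names are illustrative and not part of the benchmark script.

```python
from statistics import mean

# The ten master runs (ms) as listed in the benchmark output above.
master_runs = [135.19, 111.06, 138.51, 110.68, 108.25,
               112.41, 111.19, 143.45, 109.42, 112.13]

master_mean = mean(master_runs)

# Only the reported mean is available for methodcache; the individual
# runs are cut off in the excerpt, so they are not reconstructed here.
methodcache_mean = 111.93

# Relative runtime reduction of methodcache versus master.
reduction = (master_mean - methodcache_mean) / master_mean

print(f"master mean: {master_mean:.2f} ms")
print(f"runtime reduction: {reduction:.1%}")
```

Note that a few master runs (135 to 143 ms) sit well above the rest, so a median or a larger number of runs may give a steadier comparison.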

There are some minor performance tweaks included - I can remove those if you want to keep this MR cleaner.

The diff was larger because I had removed several methods that were used only once, mainly in the View init:

* `filter_strides`
* `is_contiguous`
* `view_from_shape`

This reduced the function...

```
LAZYCACHE=0 python3.11 -O test/external/external_test_speed_llama.py
Master codegen mean runtime: 119.23ms, runs: 135.19, 111.06, 138.51, 110.68, 108.25, 112.41, 111.19, 143.45, 109.42, 112.13
methodcache mean runtime: 111.93ms, runs: 105.79, 105.06, 105.42, 140.35,...
```

@geohot This should be done. I think I can achieve a 20% speedup on LLaMA (and basically everything else) with this and similar refactors (e.g. remove the ShapeTracker class, better...

**Re: This MR** For readability, class methods are nicest, but they cache poorly (see below). In this case, however, the caches don't add much. I have moved `idxs_to_idx` out...
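One general reason methods cache less cleanly than free functions: `functools.lru_cache` on a method includes `self` in the cache key, so the instance must be hashable and is kept alive by the cache. A module-level function that takes plain tuples is cached by value instead. The sketch below illustrates this with a hypothetical helper, `flat_index`; it is not the `idxs_to_idx` from the MR, whose signature is not shown here.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def flat_index(shape: Tuple[int, ...], idxs: Tuple[int, ...]) -> int:
    """Row-major flat index of `idxs` within `shape` (hypothetical helper)."""
    assert len(shape) == len(idxs)
    acc, stride = 0, 1
    # Walk from the innermost dimension outwards, accumulating strides.
    for dim, i in zip(reversed(shape), reversed(idxs)):
        acc += i * stride
        stride *= dim
    return acc


# Tuples are hashable, so identical arguments hit the cache on repeat calls.
first = flat_index((2, 3), (1, 2))
second = flat_index((2, 3), (1, 2))
print(first, second, flat_index.cache_info())
```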

Comes for free once Views are NamedTuples - they're immutable.
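A minimal sketch of why immutability helps, assuming the point is that views can then serve directly as cache keys. The `View` fields and the `is_row_major` function below are simplified and hypothetical, not the project's actual definitions.

```python
from functools import lru_cache
from typing import NamedTuple, Tuple


class View(NamedTuple):
    # Hypothetical, simplified fields for illustration only.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0


@lru_cache(maxsize=None)
def is_row_major(view: View) -> bool:
    """Check whether the simplified view is laid out contiguously in row-major order."""
    expected = 1
    for dim, stride in zip(reversed(view.shape), reversed(view.strides)):
        # Size-1 dimensions can carry any stride without affecting the layout.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return view.offset == 0


a = View((2, 3), (3, 1))
b = View((2, 3), (3, 1))

# NamedTuples compare and hash by value, so equal views share one cache entry.
print(a == b, hash(a) == hash(b))
print(is_row_major(a), is_row_major(b), is_row_major.cache_info())
```

Because a `NamedTuple` instance cannot be mutated after creation, a cached result can never go stale for a given key.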

What is being saved? Runner time? Bandwidth? Ingress? Based on the tqdm timings, the downloads are near instant because the download speed is so high. Creating the cache, however, adds...