Roelof van Dijk
Done as far as I am concerned. Should be an easy ~10% speedup. The next step is caching all shapetracker methods. Work in progress, but looks promising. This branch ```...
Reduced number of commits. Ready.
LAZYCACHE=0 python3.11 -O test/external/external_test_speed_llama.py ``` Master codegen mean runtime: 119.23ms, runs: 135.19, 111.06, 138.51, 110.68, 108.25, 112.41, 111.19, 143.45, 109.42, 112.13 methodcache mean runtime: 111.93ms, runs: 105.79, 105.06, 105.42, 140.35,...
There are some minor performance tweaks included - I can remove those if you want to keep this MR cleaner.
The diff was larger because I had removed several methods that were used only once, mainly in the View init:

* `filter_strides`
* `is_contiguous`
* `view_from_shape`

This reduced the function...
```
LAZYCACHE=0 python3.11 -O test/external/external_test_speed_llama.py
Master codegen mean runtime: 119.23ms, runs: 135.19, 111.06, 138.51, 110.68, 108.25, 112.41, 111.19, 143.45, 109.42, 112.13
methodcache mean runtime: 111.93ms, runs: 105.79, 105.06, 105.42, 140.35,...
```
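As a sanity check on these numbers, the master mean can be recomputed from the ten listed runs, and the relative change follows from the two reported means. This is only a sketch: the variable names are illustrative, and because the methodcache runs are truncated in the output, only its reported mean is used.

```python
from statistics import mean, median

# Master runs in ms, copied from the benchmark output.
master_runs = [135.19, 111.06, 138.51, 110.68, 108.25,
               112.41, 111.19, 143.45, 109.42, 112.13]

master_mean = mean(master_runs)
master_median = median(master_runs)

# Reported mean for the methodcache branch (individual runs are truncated).
methodcache_mean = 111.93

# Relative reduction in mean runtime versus master.
reduction = 1 - methodcache_mean / master_mean

print(f"master mean:   {master_mean:.2f} ms")
print(f"master median: {master_median:.2f} ms")
print(f"reduction in mean runtime: {reduction:.1%}")
```

Three of the master runs sit well above the rest (135 to 143 ms), so the mean is sensitive to outliers; comparing medians over the full run lists may give a steadier picture.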
@geohot This should be done. I think I can achieve a 20% speedup on LLaMA (and basically everything else) with this and similar refactors (e.g. remove the ShapeTracker class, better...
**Re: This MR** For readability, class methods are the nicest option, but they cache poorly (see below). In this case, however, the caches don't add that much. I have moved `idxs_to_idx` out...
Comes for free once Views are NamedTuples - they're immutable.
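To illustrate the point about immutability: a NamedTuple is hashable as long as its fields are, so it can be passed directly to a module-level function wrapped in `functools.lru_cache`. The sketch below is an assumption-laden toy, not the tinygrad code: `DemoView` is a hypothetical stand-in with only `shape` and `strides`, and `demo_contiguous_strides` / `demo_is_contiguous` are hypothetical helpers.

```python
from functools import lru_cache
from typing import NamedTuple, Tuple

class DemoView(NamedTuple):
    # Hypothetical, simplified stand-in; the real View has more fields.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

@lru_cache(maxsize=None)
def demo_contiguous_strides(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    # Plain row-major strides (no special handling of size-1 dims).
    strides = []
    acc = 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

@lru_cache(maxsize=None)
def demo_is_contiguous(view: DemoView) -> bool:
    # The cache key is the view's value, since NamedTuples hash by content.
    return view.strides == demo_contiguous_strides(view.shape)

v1 = DemoView((2, 3, 4), (12, 4, 1))
v2 = DemoView((2, 3, 4), (12, 4, 1))  # equal value, distinct object

print(demo_is_contiguous(v1), demo_is_contiguous(v2))
print(demo_is_contiguous.cache_info())  # second call is served from the cache
```

Because equal views hash equally, the second call hits the cache even though `v2` is a different object. A mutable class would need custom `__hash__` and `__eq__` methods, plus care around mutation, to get the same behaviour.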
What is being saved? Runner time? Bandwidth? Ingress? Based on the tqdm timings, the downloads are near instant because the download speed is so high. Creating the cache, however, adds...