Nikos Karampatziakis
To facilitate comparison with a method we are developing, would it be possible to release the raw results (e.g. in a form similar to the [dopamine json files](https://github.com/google/dopamine/tree/master/baselines/data))? These data already "exist" as part of your...
I am working on my first Triton kernel and am running into the following error at runtime: ``` File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 161, in return lambda *args, **kwargs: self.run(grid=grid, warmup=False,...
### Feature request I would like to contribute a KV cache implementation that only keeps a couple of layers on the GPU: the current layer in the forward pass as...
# What does this PR do? Fixes #30704 This PR introduces OffloadedCache. This is a KV cache implementation that reduces GPU memory usage in exchange for more CPU memory usage...