Andy Ehrenberg
Some of the extra GPU memory can probably be attributed to how Flax generation implements the KV cache. Check what happens when you set `max_new_tokens` to be...
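Here is a rough illustration of why `max_new_tokens` matters. As far as I understand it, Flax generation in transformers preallocates the KV cache for the full maximum length up front, because XLA needs static shapes. That means cache memory scales with prompt length plus `max_new_tokens`, not with the number of tokens actually produced. The sketch below only does that arithmetic. `cache_length`, `kv_cache_bytes` and the model dimensions are hypothetical. It also ignores cross-attention caches, activations, and JAX's default GPU memory preallocation.

```python
def cache_length(prompt_len: int, max_new_tokens: int) -> int:
    """Positions the cache is sized for (assumed: prompt length + max_new_tokens)."""
    return prompt_len + max_new_tokens


def kv_cache_bytes(num_layers: int, batch_size: int, max_length: int,
                   num_heads: int, head_dim: int, bytes_per_elem: int = 4) -> int:
    """Bytes for a self-attention key/value cache preallocated to max_length positions.

    Each layer holds one key and one value buffer of shape
    (batch_size, max_length, num_heads, head_dim); bytes_per_elem=4 means float32.
    """
    per_buffer = batch_size * max_length * num_heads * head_dim * bytes_per_elem
    return 2 * num_layers * per_buffer


# Hypothetical decoder: 12 layers, 12 heads of size 64, batch size 1, float32.
for new_tokens in (124, 1020):
    length = cache_length(prompt_len=4, max_new_tokens=new_tokens)
    mib = kv_cache_bytes(12, 1, length, 12, 64) / 2**20
    print(f"max_new_tokens={new_tokens}: cache length {length}, {mib:.0f} MiB")
```

With these made-up dimensions, the cache takes 9 MiB at 128 positions and 72 MiB at 1024. Because the buffers are allocated at full size, the larger figure is paid even if generation stops early.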
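Also, it doesn't make sense to run the Flax code inside a `torch.no_grad()` context. That context manager only disables PyTorch's autograd, and JAX computes gradients only when you explicitly ask for them (e.g. with `jax.grad`).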
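To illustrate the `torch.no_grad()` point: JAX has no global autograd mode to switch off. A plain function call only computes values, and gradients exist only where you explicitly apply a transformation such as `jax.grad`. The toy `loss` function and arrays below are made up for illustration and are not from the original benchmark.

```python
import jax
import jax.numpy as jnp


def loss(w, x):
    # Toy quadratic; stands in for any JAX/Flax forward computation.
    return jnp.sum((w * x) ** 2)


w = jnp.array([1.0, 2.0])
x = jnp.array([3.0, 4.0])

# A plain call just evaluates the function; there is no autograd graph
# for torch.no_grad() (or anything else) to disable.
value = loss(w, x)

# Gradients are computed only because we ask for them via jax.grad.
grads = jax.grad(loss)(w, x)

print(value, grads)
```

In a side-by-side benchmark, the `torch.no_grad()` block therefore only needs to wrap the PyTorch calls. Wrapping the Flax calls in it changes nothing.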