David Currie
Correct, I'm using that class with `block_sparse` attention. When the sequence enters the attention layer, its length is 1536.
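For context, a minimal configuration sketch. The thread doesn't name the class, so Hugging Face's `BigBirdModel` is assumed here, and the parameter values are illustrative:

```python
import torch
from transformers import BigBirdConfig, BigBirdModel

# Assumed setup: BigBirdModel with block-sparse attention; values illustrative.
config = BigBirdConfig(
    attention_type="block_sparse",  # sparse attention instead of full O(n^2) attention
    block_size=64,                  # 1536 tokens / 64 = 24 blocks per sequence
    num_random_blocks=3,
)
model = BigBirdModel(config)

# Sequence length from the thread: 1536 tokens entering the attention layer.
input_ids = torch.randint(0, config.vocab_size, (1, 1536))
outputs = model(input_ids)
```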
Yes, memory usage increases with sequence length. I'm not using XLA, and thanks for the tip!
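For anyone curious about the XLA tip: a minimal sketch of enabling it, assuming a TensorFlow setup (the thread doesn't say which framework the tip referred to); the dense layer is a stand-in for the actual model:

```python
import tensorflow as tf

# jit_compile=True asks TensorFlow to compile the wrapped function with XLA,
# which can reduce memory use and speed up long-sequence workloads.
layer = tf.keras.layers.Dense(8)  # stand-in for the real attention model

@tf.function(jit_compile=True)
def forward(x):
    return layer(x)

out = forward(tf.random.normal([1, 1536, 8]))  # sequence length from the thread
```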
You should be able to do that if you use the .ipynb file. In the final cell, you can test a user-given input.
To do something like that, you would need to change the structure of the notebook. It could be worth refactoring the final cell into a function, then in a different...
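Picking up that suggestion, a rough sketch of pulling the final cell into a reusable function; `predict`, `tokenizer`, and `model` are hypothetical names, not taken from the notebook:

```python
# Hypothetical refactor of the notebook's final cell into a function,
# so it can be imported and reused outside the notebook.
def predict(text, tokenizer, model):
    inputs = tokenizer(text, return_tensors="pt")  # tokenize the user-given input
    return model(**inputs)                         # run the model on it

# A different notebook or script could then reuse it, e.g.:
#   from notebook_module import predict
#   outputs = predict("some user input", tokenizer, model)
```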