I've added bigbird's attention to my model, but not seeing a decrease in memory
I've replaced the attention layers in Enformer with those from bigbird, but the memory usage reported by tf.config.experimental.get_memory_info is still basically the same (within 1%). I'm wondering whether I also need to include code from the encoder or decoder to see a decrease in memory usage?
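For reference, a minimal sketch of measuring peak memory this way, assuming a single GPU; the dense layer and random batch below are just stand-ins, not the actual Enformer setup:

```python
import tensorflow as tf

model = tf.keras.layers.Dense(256)           # stand-in for the real model
inputs = tf.random.normal([1, 1536, 1536])   # stand-in batch

# Reset the stats, run one forward pass, then read back the peak usage.
tf.config.experimental.reset_memory_stats('GPU:0')
_ = model(inputs)
peak = tf.config.experimental.get_memory_info('GPU:0')['peak']
print(f'peak memory: {peak / 2**30:.2f} GiB')
```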
Thanks!
To clarify, you are using https://github.com/google-research/bigbird/blob/5f2a5aa7fbab23e32e0e0b41c5f0192f0c023e05/bigbird/core/attention.py#L637 with attention_type = 'block_sparse'?
What's your sequence length?
Correct, I'm using that class with block_sparse attention.
When the sequence enters the attention layer, its length is 1536.
I see. Does the memory used change with sequence length?
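For a rough sense of the expected difference at this length, here's a back-of-envelope estimate. The block size and block counts are the library defaults (block_size = 64, a 3-block sliding window, 2 global blocks, 3 random blocks) and are assumptions, not values taken from this thread:

```python
# Rough attention-score buffer size per head (assumed defaults above).
n, block = 1536, 64
full_entries = n * n                           # dense softmax scores: ~2.4M
sparse_entries = n * block * (3 + 2 + 3)       # block-sparse scores: ~0.8M
print(full_entries, sparse_entries, full_entries / sparse_entries)  # ratio ~3x
```

So at n = 1536 the attention scores only shrink by a few x per head; if most of the memory goes to activations outside attention, the overall footprint can look nearly unchanged.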
I don't suppose you are using XLA? BigBird can be as much as 30% faster with tf.function(jit_compile=True). It also produces better memory profiles that make it easier to debug.
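A minimal sketch, with a dense layer and random batch standing in for the actual model and data:

```python
import tensorflow as tf

model = tf.keras.layers.Dense(256)          # stand-in for the real model
inputs = tf.random.normal([1, 1536, 128])   # stand-in batch

# XLA-compile the forward pass; the first call traces and compiles.
@tf.function(jit_compile=True)
def forward(x):
    return model(x)

out = forward(inputs)
```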
Yes, the memory used increases with sequence length.
I'm not using XLA, and thanks for the tip!
https://www.tensorflow.org/guide/profiler#memory_profile_tool may also be useful. The XLA memory viewer (https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#memory_viewer) is better, but both are worth trying.
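A minimal sketch of capturing a trace for the memory profile tool; the model, inputs, and log directory below are stand-ins:

```python
import tensorflow as tf

model = tf.keras.layers.Dense(256)          # stand-in for the real model
inputs = tf.random.normal([1, 1536, 128])   # stand-in batch

# Trace one forward pass, then open the log directory in TensorBoard
# and inspect allocations with the Memory Profile tool.
tf.profiler.experimental.start('/tmp/tb_logdir')
_ = model(inputs)
tf.profiler.experimental.stop()
```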