Philip Pham

6 comments by Philip Pham

To clarify, you are using https://github.com/google-research/bigbird/blob/5f2a5aa7fbab23e32e0e0b41c5f0192f0c023e05/bigbird/core/attention.py#L637 with `attention_type = 'block_sparse'`? What's your sequence length?

I see. Does the memory used change with sequence length? I don't suppose you are using XLA? BigBird can be as much as 30% faster with `tf.function(jit_compile=True)`. It also produces...
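For reference, a minimal sketch of what I mean by turning on XLA, where `model` and `features` stand in for your BigBird model and input batch (not names from this thread), and the memory check assumes TF 2.5+ with a GPU named `"GPU:0"`:

```python
import tensorflow as tf

# Compile the forward pass with XLA; `model` and `features` are placeholders.
@tf.function(jit_compile=True)
def forward(features):
    return model(features, training=False)

# Rough way to see whether memory scales with sequence length:
# check current/peak device memory after calls at different lengths.
_ = forward(features)
print(tf.config.experimental.get_memory_info("GPU:0"))
```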

https://www.tensorflow.org/guide/profiler#memory_profile_tool may also be useful. The XLA memory viewer (https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#memory_viewer) is better, but both are worth a look.
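Capturing a profile programmatically is just a start/stop pair around a few steps; the logdir path and `train_step` here are illustrative:

```python
import tensorflow as tf

tf.profiler.experimental.start("/tmp/bigbird_profile")
for step in range(5):
    train_step()  # stand-in for your existing training step
tf.profiler.experimental.stop()
# Then run `tensorboard --logdir /tmp/bigbird_profile` and open the
# Profile tab's memory_profile tool.
```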

A single Titan X doesn't have enough HBM. For our GPU setup, we had 8 V100s for a total of 128GB of HBM. For a single Titan X, I think...

I, too, would like to see type hints and argument names, but my preferred implementation would be different. I'd prefer something less intrusive in the minibuffer. I think what tide...

Maybe not quite the same thing, but similar in spirit: it would be nice if `pallas_call` could inherit replication rules for use with `shard_map`, so we don't have to...
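For context, this is roughly the explicit wrapping I mean today, as a sketch assuming a recent JAX with `jax.experimental.pallas` and `jax.experimental.shard_map`; the kernel, mesh axis name, and specs are illustrative, not from any particular issue:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl
from jax.experimental.shard_map import shard_map
from jax.sharding import Mesh, PartitionSpec as P

# Trivial Pallas kernel: add 1 to each element of its block.
def add_one_kernel(x_ref, o_ref):
    o_ref[...] = x_ref[...] + 1.0

def add_one(x):
    # interpret=True so the sketch also runs without a GPU/TPU backend.
    return pl.pallas_call(
        add_one_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        interpret=True,
    )(x)

# Today: wrap the pallas_call in shard_map and spell out the specs by hand.
mesh = Mesh(np.array(jax.devices()), ("i",))
sharded_add_one = shard_map(add_one, mesh=mesh, in_specs=P("i"), out_specs=P("i"))

x = jnp.arange(8 * 4, dtype=jnp.float32).reshape(8, 4)
print(jax.jit(sharded_add_one)(x))
```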