Enrico Shippole
CUDA DEVICE 2 `CUDA_VISIBLE_DEVICES=2 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 1.01x kernel: 0.25ms baseline: 0.24ms seq_len: 256 slower: 1.70x kernel: 0.50ms baseline:...
CUDA DEVICE 3 `CUDA_VISIBLE_DEVICES=3 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 1.00x kernel: 0.25ms baseline: 0.25ms seq_len: 256 slower: 1.22x kernel: 0.36ms baseline:...
CUDA DEVICE 4 `CUDA_VISIBLE_DEVICES=4 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 1.47x kernel: 0.36ms baseline: 0.25ms seq_len: 256 slower: 1.21x kernel: 0.36ms baseline:...
CUDA DEVICE 5 `CUDA_VISIBLE_DEVICES=5 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 0.96x kernel: 0.24ms baseline: 0.25ms seq_len: 256 slower: 1.17x kernel: 0.34ms baseline:...
CUDA DEVICE 6 `CUDA_VISIBLE_DEVICES=6 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 1.01x kernel: 0.24ms baseline: 0.24ms seq_len: 256 slower: 1.69x kernel: 0.49ms baseline:...
CUDA DEVICE 7 `CUDA_VISIBLE_DEVICES=7 python benchmark.py --only-forwards` float32 batch: 4 heads: 8 dim 64 ------------------------------------------------------------ seq_len: 128 slower: 2.33x kernel: 0.57ms baseline: 0.25ms seq_len: 256 slower: 1.26x kernel: 0.37ms baseline:...
@lucidrains The benchmarks for 8 different A100 (80 GB) devices are listed above. I made sure to try a different host, and that each GPU was idle and no memory was...
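To compare runs like these across devices, the reported numbers can be pulled out of the printed output with a small parser. This is only a rough sketch: the line format is assumed from the output pasted above, and `parse_benchmark_output` and `slowdown_spread` are hypothetical helper names, not part of `benchmark.py`. Truncated entries (those ending in `...`) are skipped rather than guessed.

```python
import re

# Pattern assumed from the pasted output, e.g.
# "seq_len: 128 slower: 1.01x kernel: 0.25ms baseline: 0.24ms"
ENTRY_RE = re.compile(
    r"seq_len:\s*(\d+)\s+slower:\s*([\d.]+)x\s+"
    r"kernel:\s*([\d.]+)ms\s+baseline:\s*([\d.]+)ms"
)


def parse_benchmark_output(text):
    """Return one dict per complete seq_len entry found in text."""
    rows = []
    for m in ENTRY_RE.finditer(text):
        rows.append({
            "seq_len": int(m.group(1)),
            "slower": float(m.group(2)),
            "kernel_ms": float(m.group(3)),
            "baseline_ms": float(m.group(4)),
        })
    return rows


def slowdown_spread(outputs_by_device, seq_len):
    """Return (min, max) of the reported slowdown across devices for one seq_len."""
    values = [
        row["slower"]
        for text in outputs_by_device.values()
        for row in parse_benchmark_output(text)
        if row["seq_len"] == seq_len
    ]
    return min(values), max(values)


# Two of the seq_len 128 entries from the runs above (devices 2 and 7).
outputs = {
    "2": "seq_len: 128 slower: 1.01x kernel: 0.25ms baseline: 0.24ms",
    "7": "seq_len: 128 slower: 2.33x kernel: 0.57ms baseline: 0.25ms",
}
print(slowdown_spread(outputs, 128))  # (1.01, 2.33)
```

A spread this wide at the same sequence length on identical hardware points at run-to-run noise or device state rather than the kernel itself, which is why the per-device comparison is worth making explicit.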
@qw1319 I will take a look at your suggestion as I was unable to resolve this issue previously.
> Hey @conceptofmind! #2367 will move the Community Examples to `docs/examples.rst` so they are visible in the new [Example](https://flax.readthedocs.io/en/latest/examples.html) section of the documentation. This should be ported there (depending on...
@atemaguer We have a group on Discord currently working on adding DSP to LangChain: https://discord.com/channels/1038097195422978059/1068175360648286248