cmathw
# Description

Resolves #378 by adding support for `torch.nn.functional.scaled_dot_product_attention` as documented [here](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html). This implementation includes FlashAttention-2, as well as two other alternative (potentially faster) attention implementations. PyTorch attempts to automatically...
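For context, here is a minimal sketch of the PyTorch API this PR wraps, assuming PyTorch ≥ 2.0 and a CUDA device with fp16 support (the `sdp_kernel` context manager shown is the 2.0/2.1-era way to force a specific backend; by default PyTorch selects one automatically):

```python
import torch
import torch.nn.functional as F

# Toy tensors in (batch, heads, seq_len, head_dim) layout.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Default call: PyTorch picks among the flash, memory-efficient,
# and math backends based on the inputs and hardware.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Force the FlashAttention backend explicitly.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```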
In `codebook_features/models.py`, I can see a method for attaching codebooks to each attention block's query, key, and value vectors: https://github.com/taufeeque9/codebook-features/blob/a37ea8fe7d4d39298aaea042a078d09401396edc/codebook_features/models.py#L1439C1-L1460C36 After training a model with these codebooks attached, though, it...
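For readers unfamiliar with the idea, here is a rough, hypothetical sketch of what "attaching a codebook" to an activation can look like; the class and method names below are illustrative only and are not the actual API in `codebook_features/models.py`:

```python
import torch
import torch.nn as nn

class CodebookHook(nn.Module):
    """Hypothetical sketch: snap each activation vector to its
    nearest entry in a learned codebook (names are illustrative)."""

    def __init__(self, dim: int, num_codes: int):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., dim); find the nearest code by L2 distance.
        flat = x.reshape(-1, x.shape[-1])
        idx = torch.cdist(flat, self.codebook).argmin(dim=-1)
        return self.codebook[idx].reshape(x.shape)

# Conceptually, one such hook would wrap each of the query, key,
# and value projections inside an attention block.
hook = CodebookHook(dim=64, num_codes=512)
q = torch.randn(2, 8, 16, 64)   # (batch, heads, seq, head_dim)
q_snapped = hook(q)             # same shape, quantized to codebook rows
```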