Phil Wang
memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences"
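The paper's central trick is to shorten the keys and values with a strided convolution before attention, cutting the cost of the attention matrix. A minimal PyTorch sketch of that idea, with illustrative names and hyperparameters rather than this repo's actual API:

```python
import torch
from torch import nn

class CompressedAttentionSketch(nn.Module):
    def __init__(self, dim, compression_factor = 3):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias = False)
        self.to_kv = nn.Linear(dim, dim * 2, bias = False)
        # strided conv shortens the key/value sequence by the compression factor
        self.compress = nn.Conv1d(dim, dim, compression_factor, stride = compression_factor)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x):
        q = self.to_q(x)
        k, v = self.to_kv(x).chunk(2, dim = -1)
        # (batch, seq, dim) -> (batch, dim, seq) for the conv, then back
        k, v = map(lambda t: self.compress(t.transpose(1, 2)).transpose(1, 2), (k, v))
        sim = torch.einsum('b i d, b j d -> b i j', q, k) * self.scale
        attn = sim.softmax(dim = -1)
        out = torch.einsum('b i j, b j d -> b i d', attn, v)
        return self.to_out(out)

x = torch.randn(1, 1024, 512)
out = CompressedAttentionSketch(512)(x)  # (1, 1024, 512), keys/values ~3x shorter internally
```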
omninet-pytorch
Implementation of OmniNet, Omnidirectional Representations from Transformers, in Pytorch
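As a rough reading of the paper's idea (letting representations attend across the hidden states of every layer, not just the last), here is a simplified sketch; the final pooled attention and all names are assumptions, not the repo's API:

```python
import torch
from torch import nn

dim, depth, seq = 256, 4, 64
layers = nn.ModuleList([
    nn.TransformerEncoderLayer(dim, nhead = 4, batch_first = True)
    for _ in range(depth)
])
omni_attn = nn.MultiheadAttention(dim, num_heads = 4, batch_first = True)

x = torch.randn(1, seq, dim)
all_states = []
for layer in layers:
    x = layer(x)
    all_states.append(x)  # keep every layer's hidden states, not just the last

# flatten (layer, position) into one axis of depth * seq tokens, then let the
# final representation attend omnidirectionally over all of them
context = torch.cat(all_states, dim = 1)   # (1, depth * seq, dim)
out, _ = omni_attn(x, context, context)    # queries come from the last layer
```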
memory-transformer-xl
A variant of Transformer-XL where the memory is updated not with a queue, but with attention
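A hedged sketch of the stated memory update: instead of enqueueing new hidden states and evicting the oldest (the Transformer-XL queue), fixed-size memory slots attend over the new hidden states and are updated residually. Names here are illustrative, not this repo's API:

```python
import torch
from torch import nn

def attention_memory_update(memory, hidden, to_q, to_kv, scale):
    # memory: (batch, num_mem, dim) stays fixed-size; hidden: (batch, seq, dim)
    q = to_q(memory)
    k, v = to_kv(hidden).chunk(2, dim = -1)
    sim = torch.einsum('b i d, b j d -> b i j', q, k) * scale
    out = torch.einsum('b i j, b j d -> b i d', sim.softmax(dim = -1), v)
    return memory + out  # residual update instead of enqueue/dequeue

dim, num_mem = 512, 16
to_q = nn.Linear(dim, dim, bias = False)
to_kv = nn.Linear(dim, dim * 2, bias = False)
memory = torch.zeros(1, num_mem, dim)
hidden = torch.randn(1, 128, dim)
memory = attention_memory_update(memory, hidden, to_q, to_kv, dim ** -0.5)
```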
panoptic-transformer
Another attempt at a long-context / efficient transformer by me
memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
tranception-pytorch
Implementation of Tranception, an attention network paired with retrieval that is SOTA for protein fitness prediction
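One of Tranception's distinctive pieces is splitting attention heads into groups and smoothing each group's inputs with depthwise convolutions of different kernel sizes, so different heads see different local context widths. A hedged sketch of that mechanism alone, with assumed kernel sizes and shapes:

```python
import torch
from torch import nn

dim, groups = 256, 4
kernel_sizes = (1, 3, 5, 7)  # one local context width per head group (assumed sizes)
convs = nn.ModuleList([
    nn.Conv1d(dim // groups, dim // groups, k, padding = k // 2, groups = dim // groups)
    for k in kernel_sizes
])

def grouped_depthwise(x):
    # x: (batch, seq, dim), features split evenly across the head groups
    chunks = x.chunk(groups, dim = -1)
    out = [conv(c.transpose(1, 2)).transpose(1, 2) for conv, c in zip(convs, chunks)]
    return torch.cat(out, dim = -1)

x = torch.randn(1, 100, dim)
out = grouped_depthwise(x)  # same shape; each group smoothed at a different scale
```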
local-attention-flax
Local Attention - Flax module for Jax
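The repo itself is a Flax module; to keep these sketches in one language, here is the underlying idea, non-overlapping windowed attention, in PyTorch (window size and shapes are illustrative):

```python
import torch

def local_attention(q, k, v, window = 64):
    # (batch, seq, dim) with seq divisible by window, for simplicity
    b, n, d = q.shape
    q, k, v = map(lambda t: t.reshape(b, n // window, window, d), (q, k, v))
    sim = torch.einsum('b w i d, b w j d -> b w i j', q, k) * d ** -0.5
    out = torch.einsum('b w i j, b w j d -> b w i d', sim.softmax(dim = -1), v)
    return out.reshape(b, n, d)

q = k = v = torch.randn(1, 256, 128)
out = local_attention(q, k, v)  # each position attends only within its window
```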
genetic-algorithm-pytorch
Toy genetic algorithm in Pytorch
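Since the description names the technique outright, here is a self-contained toy GA in the same spirit, vectorized with torch tensors; the bitstring target, population size, and mutation rate are all illustrative assumptions, not this repo's code:

```python
import torch

target = torch.randint(0, 2, (32,)).float()      # bitstring to evolve toward
pop = torch.randint(0, 2, (128, 32)).float()     # random initial population

for generation in range(200):
    fitness = (pop == target).float().sum(dim = -1)
    if fitness.max() == len(target):
        break
    # truncation selection: the fittest half become parents
    parents = pop[fitness.topk(64).indices]
    # one-point crossover between randomly paired parents
    pairs = parents[torch.randint(0, 64, (128, 2))]
    point = torch.randint(1, 32, (128, 1))
    mask = torch.arange(32).unsqueeze(0) < point
    children = torch.where(mask, pairs[:, 0], pairs[:, 1])
    # bit-flip mutation with small probability
    flips = torch.rand(128, 32) < 0.02
    pop = torch.where(flips, 1 - children, children)
```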
speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
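A hedged sketch of the classic draft-and-verify loop (in the style of Leviathan et al.), not this repo's implementation: a small draft model proposes a few tokens, the target model scores them in one pass, and each proposal is accepted with probability min(1, p/q). The stand-in models and all names are assumptions:

```python
import torch
from torch import nn

vocab, dim = 100, 32
draft = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))   # small stand-in
target = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))  # large stand-in

def probs(model, seq):
    return model(seq).softmax(dim = -1)  # per-position next-token distributions

@torch.no_grad()
def speculative_step(seq, gamma = 4):
    # 1. the draft model proposes gamma tokens, one at a time
    for _ in range(gamma):
        q = probs(draft, seq)[-1]
        seq = torch.cat([seq, torch.multinomial(q, 1)])
    # 2. the target model scores the extended sequence in a single pass
    p, q = probs(target, seq), probs(draft, seq)
    n = len(seq) - gamma
    for i in range(n, len(seq)):
        token = seq[i]
        accept = min(1.0, (p[i - 1, token] / q[i - 1, token]).item())
        if torch.rand(()) > accept:
            # rejected: resample from the residual distribution max(0, p - q)
            residual = (p[i - 1] - q[i - 1]).clamp(min = 0)
            token = torch.multinomial(residual / residual.sum(), 1)
            return torch.cat([seq[:i], token])
    return seq  # all proposals accepted (the bonus target sample is omitted here)

seq = speculative_step(torch.tensor([1, 2, 3]))
```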
zorro-pytorch
Implementation of Zorro, Masked Multimodal Transformer, in Pytorch
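Zorro's key mechanism is an attention mask that keeps each modality's tokens attending only within their own modality, while dedicated fusion tokens may attend to everything. A small sketch of building such a boolean mask (token counts are illustrative):

```python
import torch

num_audio, num_video, num_fusion = 4, 6, 2

# token type ids: 0 = audio, 1 = video, 2 = fusion
types = torch.tensor([0] * num_audio + [1] * num_video + [2] * num_fusion)

same_modality = types.unsqueeze(0) == types.unsqueeze(1)
query_is_fusion = (types == 2).unsqueeze(1)

# True where attention is allowed: within a modality, or from any fusion query
mask = same_modality | query_is_fusion  # (total_tokens, total_tokens)
```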