swift-models
swift-models copied to clipboard
GPT2 and BERT alignment
Refactor the GPT-2 and BERT models to make it clear which concepts are shared and which are distinct. For example, transformers and multi-head attention are shared concepts, but naming collisions have resulted in files such as TransformerBERT.swift and structs such as MultiHeadAttentionGPT2. As our collection of transformer-based language models increases, this becomes important for code reuse and maintenance.
Investigate alignment with HuggingFace APIs, which do a great job of unifying a wide variety of transformer-based models.