
Add Arcee model support

Crystalcareai opened this issue 6 months ago

Summary

This PR adds support for the Arcee model architecture, laying the groundwork for the upcoming Arcee Foundation Model (AFM) release. Arcee is a decoder-only transformer model based on the Llama architecture with a key modification: it uses ReLU² (ReLU-squared) activation in the MLP blocks instead of SiLU, following recent research showing improved training efficiency with squared activations.

Model Description

Arcee is architecturally similar to Llama but with the following distinctions:

  • ReLU² activation: Uses x * relu(x) (equivalent to relu(x)²) in the MLP layers for improved gradient flow; see the sketch after this list
  • Efficiency-focused design: Targets both training and inference efficiency
  • Extended context: Supports longer context lengths via RoPE scaling
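
A minimal sketch of the ReLU² activation described above (illustrative only, not the PR's actual code). Note that x * relu(x) and relu(x)² coincide, since both equal x² for positive inputs and 0 otherwise:

```python
import torch
import torch.nn.functional as F


def relu_squared(x: torch.Tensor) -> torch.Tensor:
    # ReLU²: square the ReLU output; equals x * relu(x) for all real x
    return torch.square(F.relu(x))
```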

Implementation Details

  • Modular implementation inheriting from Llama components where applicable
  • Custom ArceeMLP class implementing the ReLU² activation
  • Full support for all standard transformers features (a usage sketch follows this list):
    • Flash Attention 2, SDPA, and other attention backends
    • Gradient checkpointing
    • Quantization support (including quantized caches)
    • All standard model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
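
As a rough usage sketch of the variants and features listed above, assuming the PR branch is installed. ArceeConfig is referenced later in this thread; ArceeForCausalLM and the config fields are assumptions based on standard transformers/Llama naming, not confirmed details of the PR:

```python
import torch
from transformers import ArceeConfig, ArceeForCausalLM  # names assumed; available only with this PR's branch

# Tiny, hypothetical configuration purely for illustration
config = ArceeConfig(
    vocab_size=32000,
    hidden_size=256,
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=4,
)

model = ArceeForCausalLM(config)
model.gradient_checkpointing_enable()  # standard PreTrainedModel feature
# The attention backend ("eager", "sdpa", "flash_attention_2") is normally selected
# via the attn_implementation argument of from_pretrained()/from_config(),
# as for other Llama-like models.

input_ids = torch.randint(0, config.vocab_size, (1, 16))
outputs = model(input_ids=input_ids, labels=input_ids)
print(outputs.loss)
```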

Testing

  • Added comprehensive test suite following standard transformers test patterns
  • Tests for all model variants and core functionality
  • Specific test for ReLU² activation verification
  • RoPE scaling tests, including YaRN support (see the config sketch after this list)
  • Verified model loading and forward/backward passes
  • Confirmed compatibility with the existing transformers infrastructure
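
For the RoPE scaling tests mentioned above, a YaRN setup would presumably follow the usual transformers rope_scaling convention. A hedged example; the exact keys and values depend on the final implementation and are illustrative only:

```python
from transformers import ArceeConfig  # available only with this PR's branch

# Hypothetical numbers: extend a 4k-token base context by 4x with YaRN
config = ArceeConfig(
    max_position_embeddings=4096,
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 4096,
    },
)
```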

Crystalcareai avatar Jun 05 '25 19:06 Crystalcareai

Looks good @Crystalcareai! Feel free to ping us whenever you're ready for review. You can also resolve the code style errors with pip install -e .[quality] followed by make style or make fixup.

Rocketknight1 avatar Jun 06 '25 11:06 Rocketknight1

@Rocketknight1 Hey, I think I'm ready for a review. I've got a lot of the tests passing, though I'm still getting some failures that don't seem to be related to my code. Let me know how best I can get this ready for merging.

Crystalcareai avatar Jun 11 '25 15:06 Crystalcareai

Hi @Cyrilvallez, thanks for the feedback; I've made the requested refactoring changes. Also, after removing the __init__ from the modular implementation as suggested, the generated modeling code no longer contains the self.config_class = ArceeConfig line from the previous version. Is that redundant as well?

pranav4501 avatar Jun 16 '25 23:06 pranav4501

Also, after removing the __init__ from the modular implementation as suggested, the generated modeling code no longer contains the self.config_class = ArceeConfig line from the previous version. Is that redundant as well?

Yes, it's already in the PreTrainedModel!
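
For reference, the config class is normally declared once as a class attribute on the PreTrainedModel subclass, which is why setting it again in __init__ is redundant. A schematic sketch of that standard pattern, with the import path assumed from the usual model layout rather than taken from this PR:

```python
from transformers import PreTrainedModel
from transformers.models.arcee.configuration_arcee import ArceeConfig  # path assumed, not confirmed


class ArceePreTrainedModel(PreTrainedModel):
    # Declared once at class level; no self.config_class = ArceeConfig needed in __init__
    config_class = ArceeConfig
    base_model_prefix = "model"
```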

Cyrilvallez avatar Jun 19 '25 09:06 Cyrilvallez


@Cyrilvallez Thanks for the feedback; I removed the pretraining TP setting from the configuration and added scaffolding for generation integration testing. We will add more robust integration tests and update the checkpoints with the release.

pranav4501 avatar Jun 24 '25 03:06 pranav4501