Reorganize Transformers Module by Model Family

Open • DrJesseGlass opened this issue 2 weeks ago • 1 comment

Summary

The candle-transformers/src/models/ directory has grown to contain 70+ flat module entries, mixing full and quantized implementations of the same model families. This makes the codebase harder to navigate and maintain.

Proposal: Group related models into family subdirectories, similar to the pattern demonstrated in SmolLM3 (#3180).

Current State

The models/mod.rs currently has a flat structure:

pub mod llama;
pub mod llama2_c;
pub mod llama2_c_weights;
pub mod quantized_llama;
pub mod quantized_llama2_c;
pub mod mistral;
pub mod quantized_mistral;
pub mod mixtral;
pub mod phi;
pub mod phi3;
pub mod quantized_phi;
pub mod quantized_phi3;
pub mod qwen2;
pub mod qwen2_moe;
pub mod qwen3;
pub mod qwen3_moe;
pub mod qwen3_vl;
pub mod quantized_qwen2;
pub mod quantized_qwen3;
// ... 50+ more entries

Problems:

  • 70+ flat modules in a single directory
  • Full and quantized versions scattered
  • No clear model family grouping
  • Harder to navigate and discover related implementations
  • Difficult to see which models have quantized versions
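The last point can be made concrete with a small helper. The following is only a sketch: `quantized_coverage` is a hypothetical function (not part of candle), and it assumes the `quantized_` prefix convention visible in the listing above is the only signal linking a quantized module to its full-precision counterpart.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: maps each base module name to
/// `(has full-precision module, has quantized module)`.
/// Assumes quantized modules are named `quantized_<base>`.
pub fn quantized_coverage(modules: &[&str]) -> BTreeMap<String, (bool, bool)> {
    let mut coverage: BTreeMap<String, (bool, bool)> = BTreeMap::new();
    for &name in modules {
        match name.strip_prefix("quantized_") {
            // `quantized_llama` counts as the quantized variant of `llama`.
            Some(base) => coverage.entry(base.to_string()).or_default().1 = true,
            // Any other name counts as a full-precision module.
            None => coverage.entry(name.to_string()).or_default().0 = true,
        }
    }
    coverage
}
```

For the input `["llama", "quantized_llama", "mixtral"]`, the map reports `llama` as `(true, true)` and `mixtral` as `(true, false)`. With family subdirectories, the same information is visible from the directory listing alone.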

Proposed Structure

Group models by family in subdirectories, similar to SmolLM3 (#3180):

models/
├── llama/
│   ├── mod.rs              # Re-exports for backward compatibility
│   ├── llama.rs            # Full precision
│   ├── llama2_c.rs         # Llama2.c variant
│   ├── quantized_llama.rs
│   └── quantized_llama2_c.rs
├── mistral/
│   ├── mod.rs
│   ├── mistral.rs
│   ├── mixtral.rs
│   └── quantized_mistral.rs
├── phi/
│   ├── mod.rs
│   ├── phi.rs
│   ├── phi3.rs
│   ├── quantized_phi.rs
│   └── quantized_phi3.rs
├── qwen/
│   ├── mod.rs
│   ├── qwen2.rs
│   ├── qwen2_moe.rs
│   ├── qwen3.rs
│   ├── qwen3_moe.rs
│   ├── qwen3_vl.rs
│   ├── quantized_qwen2.rs
│   └── quantized_qwen3.rs
├── smol/                   # Already implemented in #3180
│   ├── mod.rs
│   ├── smollm3.rs
│   └── quantized_smollm3.rs
└── ... other families

Benefits

Better Organization

  • Related implementations grouped together
  • Easy to see all variants of a model family
  • Clear separation between families
  • Easier to navigate codebase

Better Discoverability

  • Users can find all Llama variants in one place
  • Clear which models have quantized versions
  • Easier to compare implementations within family
  • Better for documentation generation

Backward Compatibility

  • Re-exports from models/mod.rs keep existing import paths working
  • No breaking changes intended for users
  • Can migrate incrementally

Backward Compatibility Strategy

The reorganization maintains backward compatibility through re-exports. Using the Llama family as an example:

New Directory Structure

models/llama/
├── mod.rs
├── llama.rs
├── quantized_llama.rs
└── llama2_c.rs

Re-export Pattern

In models/llama/mod.rs:

// Declare submodules
pub mod llama;
pub mod quantized_llama;
pub mod llama2_c;

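// Re-export the items of llama.rs so that old paths such as
// `models::llama::Llama` keep resolving to the same items.
pub use llama::*;
// The other submodules are not glob re-exported here: any name they share
// with llama.rs (for example a `Config` type) would be ambiguous when used.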
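One detail of glob re-exports is worth spelling out: if two submodules export the same name, that name is ambiguous when reached through the globs, while an explicit `pub use` takes precedence over them. The sketch below uses inline modules as stand-ins for the files under models/llama/; the `Config` types and their `SOURCE` constants are placeholders for illustration and are not candle's actual definitions.

```rust
// Inline modules stand in for the files under models/llama/.
pub mod models {
    pub mod llama {
        // Stand-in for models/llama/llama.rs
        pub mod llama {
            pub struct Config;
            impl Config {
                pub const SOURCE: &'static str = "llama.rs";
            }
        }
        // Stand-in for models/llama/llama2_c.rs
        pub mod llama2_c {
            pub struct Config;
            impl Config {
                pub const SOURCE: &'static str = "llama2_c.rs";
            }
        }

        // Both globs bring in a `Config`; on their own, using
        // `models::llama::Config` would be rejected as ambiguous.
        pub use self::llama::*;
        pub use self::llama2_c::*;

        // An explicit re-export shadows the globs and settles which one wins.
        pub use self::llama::Config;
    }
}

/// Reports which file the family-level `Config` comes from.
pub fn family_config_source() -> &'static str {
    models::llama::Config::SOURCE
}
```

Here `family_config_source()` returns `"llama.rs"`, and the other type stays reachable through its full path, `models::llama::llama2_c::Config`. Whether to glob every submodule or only one is a design choice for the migration; the sketch only shows how the compiler resolves the overlap.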

In models/mod.rs:

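// New: expose the family module
pub mod llama;

// For backward compatibility: re-export submodules at top level
pub use llama::quantized_llama;
pub use llama::llama2_c;
// `pub use llama::llama;` cannot be added here: the name `llama` is already
// taken by the family module above, and the compiler rejects a name that is
// defined multiple times. Old `models::llama::...` paths resolve through the
// glob re-export in llama/mod.rs instead.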

Three Import Patterns (All Work!)

Pattern 1: Legacy (backward compatible)

use candle_transformers::models::llama;              // Old path; now the family module, items reached via its glob re-export
use candle_transformers::models::quantized_llama;    // Old path; kept by the top-level re-export

Pattern 2: New nested (explicit)

use candle_transformers::models::llama::llama;       // New explicit way
use candle_transformers::models::llama::quantized_llama;

Pattern 3: Import whole family

use candle_transformers::models::llama::*;           // Import entire family
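The three patterns can be checked against each other in a self-contained sketch. Inline modules stand in for `candle_transformers::models`, and the `Llama` and `ModelWeights` types with their `kind()` functions are placeholders rather than the real implementations. The sketch assumes the family module and the model module share the name `llama`, so the legacy path `models::llama::Llama` resolves through a glob re-export rather than a top-level `pub use`.

```rust
// Stand-in for candle_transformers::models; all items are placeholders.
pub mod models {
    pub mod llama {
        // Stand-in for models/llama/llama.rs
        pub mod llama {
            pub struct Llama;
            impl Llama {
                pub fn kind() -> &'static str {
                    "full precision"
                }
            }
        }
        // Stand-in for models/llama/quantized_llama.rs
        pub mod quantized_llama {
            pub struct ModelWeights;
            impl ModelWeights {
                pub fn kind() -> &'static str {
                    "quantized"
                }
            }
        }
        // Keeps the legacy path `models::llama::Llama` valid.
        pub use self::llama::*;
    }
    // Keeps the legacy path `models::quantized_llama` valid.
    pub use self::llama::quantized_llama;
}

/// Pattern 1: legacy paths.
pub fn legacy_paths() -> (&'static str, &'static str) {
    (
        models::llama::Llama::kind(),
        models::quantized_llama::ModelWeights::kind(),
    )
}

/// Pattern 2: new nested paths.
pub fn nested_paths() -> (&'static str, &'static str) {
    (
        models::llama::llama::Llama::kind(),
        models::llama::quantized_llama::ModelWeights::kind(),
    )
}

/// Pattern 3: glob import of the whole family.
pub fn family_glob() -> &'static str {
    use self::models::llama::*;
    Llama::kind()
}
```

All three functions reach the same placeholder types, which is the property the migration needs to preserve for each family.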

SmolLM3 Example

SmolLM3 (#3180) demonstrates this pattern:

Structure:

models/smol/
├── mod.rs
├── smollm3.rs
└── quantized_smollm3.rs

Current models/smol/mod.rs:

pub mod smollm3;
pub mod quantized_smollm3;

In models/mod.rs:

pub mod smol;

Migration Decision

Suggested Model Families

Based on the current modules, these natural groupings exist:

Core LLM Families:

  • llama/ - llama, llama2_c, quantized variants
  • mistral/ - mistral, mixtral, quantized_mistral
  • phi/ - phi, phi3, quantized variants
  • qwen/ - qwen2, qwen3, MoE variants, VL, quantized versions
  • gemma/ - quantized_gemma3, quantized_recurrent_gemma, paligemma
  • mpt/ - mpt, quantized_mpt
  • stablelm/ - quantized_stable_lm (if more variants are added)
  • t5/ - t5, quantized_t5
  • olmo/ - olmo, olmo2

Vision/Multimodal:

  • llava/ - llava variants
  • blip/ - blip, quantized_blip, quantized_blip_text
  • clip/ - openclip, mobileclip
  • moondream/ - moondream, quantized_moondream
  • pixtral/ - pixtral variants

Specialized Architectures:

  • mamba/ - mamba variants
  • rwkv/ - quantized_rwkv_v5, quantized_rwkv_v6
  • mimi/ - mimi variants

Audio/Speech:

  • parler_tts/ - parler_tts variants
  • metavoice/ - metavoice, quantized_metavoice

Keep Standalone (for now):

  • Single-model families or unique architectures that don't fit groups

References

  • SmolLM3 PR: #3180 (demonstrates pattern)

DrJesseGlass • Nov 12 '25 21:11