Reorganize Transformers Module by Model Family

Open · DrJesseGlass opened this issue 2 weeks ago · 1 comment

Summary

The candle-transformers/src/models/ directory has grown to contain 70+ flat module entries, mixing full and quantized implementations of the same model families. This makes the codebase harder to navigate and maintain.

Proposal: Group related models into family subdirectories, similar to the pattern demonstrated in SmolLM3 (#3180).

Current State

The models/mod.rs currently has a flat structure:

pub mod llama;
pub mod llama2_c;
pub mod llama2_c_weights;
pub mod quantized_llama;
pub mod quantized_llama2_c;
pub mod mistral;
pub mod quantized_mistral;
pub mod mixtral;
pub mod phi;
pub mod phi3;
pub mod quantized_phi;
pub mod quantized_phi3;
pub mod qwen2;
pub mod qwen2_moe;
pub mod qwen3;
pub mod qwen3_moe;
pub mod qwen3_vl;
pub mod quantized_qwen2;
pub mod quantized_qwen3;
// ... 50+ more entries

Problems:

70+ flat modules in a single directory
Full and quantized versions of the same model are scattered across that directory
No grouping by model family
Related implementations are hard to find and to navigate between
No quick way to see which models have quantized versions
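
The last point can be checked mechanically today. Here is a minimal sketch, assuming the `quantized_<name>` naming convention seen in the listing above holds for every module. The function `quantized_coverage` is hypothetical and not part of candle:

```rust
use std::collections::BTreeSet;

/// Splits flat module names into (modules with a `quantized_` twin, modules without one).
/// Hypothetical helper; assumes quantized modules are always named `quantized_<name>`.
pub fn quantized_coverage(modules: &[&str]) -> (Vec<String>, Vec<String>) {
    // Base names that have a `quantized_` counterpart.
    let quantized: BTreeSet<&str> = modules
        .iter()
        .filter_map(|m| m.strip_prefix("quantized_"))
        .collect();

    let mut paired = Vec::new();
    let mut unpaired = Vec::new();
    // Walk the non-quantized modules in their original order.
    for m in modules.iter().filter(|m| !m.starts_with("quantized_")) {
        if quantized.contains(m) {
            paired.push(m.to_string());
        } else {
            unpaired.push(m.to_string());
        }
    }
    (paired, unpaired)
}
```

Run on the excerpt above, this reports llama2_c_weights, mixtral, qwen2_moe, qwen3_moe and qwen3_vl as having no `quantized_` counterpart. With family subdirectories the same answer would be visible from a directory listing.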

Proposed Structure

Group models by family in subdirectories, similar to SmolLM3 (#3180):

models/
├── llama/
│   ├── mod.rs              # Re-exports for backward compatibility
│   ├── llama.rs            # Full precision
│   ├── llama2_c.rs         # Llama2.c variant
│   ├── quantized_llama.rs
│   └── quantized_llama2_c.rs
├── mistral/
│   ├── mod.rs
│   ├── mistral.rs
│   ├── mixtral.rs
│   └── quantized_mistral.rs
├── phi/
│   ├── mod.rs
│   ├── phi.rs
│   ├── phi3.rs
│   ├── quantized_phi.rs
│   └── quantized_phi3.rs
├── qwen/
│   ├── mod.rs
│   ├── qwen2.rs
│   ├── qwen2_moe.rs
│   ├── qwen3.rs
│   ├── qwen3_moe.rs
│   ├── qwen3_vl.rs
│   ├── quantized_qwen2.rs
│   └── quantized_qwen3.rs
├── smol/                   # Already implemented in #3180
│   ├── mod.rs
│   ├── smollm3.rs
│   └── quantized_smollm3.rs
└── ... other families
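
For planning the move, the tree above can be written down as a lookup table. This is only a sketch: `FAMILIES` and `proposed_path` are hypothetical names, and the table covers just the four families spelled out in the tree:

```rust
/// Hypothetical table mirroring the tree above: (family directory, member modules).
const FAMILIES: &[(&str, &[&str])] = &[
    ("llama", &["llama", "llama2_c", "quantized_llama", "quantized_llama2_c"]),
    ("mistral", &["mistral", "mixtral", "quantized_mistral"]),
    ("phi", &["phi", "phi3", "quantized_phi", "quantized_phi3"]),
    (
        "qwen",
        &[
            "qwen2", "qwen2_moe", "qwen3", "qwen3_moe", "qwen3_vl",
            "quantized_qwen2", "quantized_qwen3",
        ],
    ),
];

/// Returns the proposed file path for a flat module name,
/// or `None` when the tree above does not place that module.
pub fn proposed_path(module: &str) -> Option<String> {
    FAMILIES
        .iter()
        .find(|(_, members)| members.contains(&module))
        .map(|(family, _)| format!("models/{family}/{module}.rs"))
}
```

Modules that the tree does not place return `None`. One example is llama2_c_weights: it appears in the current listing but not in the proposed llama/ directory, so its destination still needs a decision.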

Benefits

Better Organization

Related implementations grouped together
Easy to see all variants of a model family
Clear separation between families
Easier to navigate codebase

Better Discoverability

Users can find all Llama variants in one place
Clear which models have quantized versions
Easier to compare implementations within family
Better for documentation generation

Backward Compatibility

Family modules re-export their submodules, so existing imports keep resolving
No breaking changes intended for users
Families can be migrated one at a time

Backward Compatibility Strategy

The reorganization maintains backward compatibility through re-exports. Using the Llama family as an example:

New Directory Structure

models/llama/
├── mod.rs
├── llama.rs
├── quantized_llama.rs
└── llama2_c.rs

Re-export Pattern

In models/llama/mod.rs:

// Declare submodules
pub mod llama;
pub mod quantized_llama;
pub mod llama2_c;

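// Re-export the full-precision items so legacy paths such as
// `models::llama::Llama` keep resolving through the family module.
pub use llama::*;

// Optional: glob re-exports of the other submodules. Enable them only if their
// item names do not clash with `llama::*`; a name exported by two globs is
// ambiguous at the use site.
// pub use quantized_llama::*;
// pub use llama2_c::*;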

In models/mod.rs:

// New: expose the family module
pub mod llama;

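// For backward compatibility: re-export the other submodules at top level
pub use llama::quantized_llama;
pub use llama::llama2_c;

// No `pub use llama::llama;` here: `models::llama` is already the family module,
// so that re-export would define the name `llama` twice (E0255). Legacy paths
// under `models::llama` rely on the glob re-export in llama/mod.rs instead.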

Three Import Patterns (All Work!)

Pattern 1: Legacy (backward compatible)

use candle_transformers::models::llama;              // Old path; now names the family module, which re-exports the old items
use candle_transformers::models::quantized_llama;    // Old path; kept via the top-level re-export

Pattern 2: New nested (explicit)

use candle_transformers::models::llama::llama;       // New explicit way
use candle_transformers::models::llama::quantized_llama;

Pattern 3: Import whole family

use candle_transformers::models::llama::*;           // Import entire family
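
To check that the legacy and nested paths name the same items, here is a self-contained sketch using inline modules as stand-ins for the real files. The struct bodies are placeholders and `paths_agree` is a hypothetical helper. The sketch leaves out a top-level `pub use llama::llama;` because the family module already occupies the name `models::llama`:

```rust
use std::any::TypeId;

// Stand-in for the candle-transformers `models` tree; item bodies are placeholders.
pub mod models {
    pub mod llama {
        pub mod llama {
            pub struct Llama;
        }
        pub mod quantized_llama {
            pub struct ModelWeights;
        }
        // Lets `models::llama::Llama` resolve as it did before the move.
        pub use self::llama::*;
    }
    // Legacy top-level path for the quantized module.
    pub use self::llama::quantized_llama;
}

/// Returns true when the legacy paths and the new nested paths name the same types.
pub fn paths_agree() -> bool {
    // Pattern 1 (legacy) against Pattern 2 (nested) for the full-precision model.
    let full = TypeId::of::<models::llama::Llama>()
        == TypeId::of::<models::llama::llama::Llama>();
    // The same comparison for the quantized module.
    let quantized = TypeId::of::<models::quantized_llama::ModelWeights>()
        == TypeId::of::<models::llama::quantized_llama::ModelWeights>();
    full && quantized
}
```

Because re-exports are aliases rather than copies, values built through one path are accepted wherever the other path is used in a signature.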

SmolLM3 Example

SmolLM3 (#3180) demonstrates this pattern:

Structure:

models/smol/
├── mod.rs
├── smollm3.rs
└── quantized_smollm3.rs

Current models/smol/mod.rs:

pub mod smollm3;
pub mod quantized_smollm3;

In models/mod.rs:

pub mod smol;

Migration Decision

Suggested Model Families

Based on the current modules, these natural groupings exist:

Core LLM Families:

llama/ - llama, llama2_c, quantized variants
mistral/ - mistral, mixtral, quantized_mistral
phi/ - phi, phi3, quantized variants
qwen/ - qwen2, qwen3, MoE variants, VL, quantized versions
gemma/ - quantized_gemma3, quantized_recurrent_gemma, paligemma
mpt/ - mpt, quantized_mpt
stablelm/ - quantized_stable_lm (if more variants added)
t5/ - t5, quantized_t5
olmo/ - olmo, olmo2

Vision/Multimodal:

llava/ - llava variants
blip/ - blip, quantized_blip, quantized_blip_text
clip/ - openclip, mobileclip
moondream/ - moondream, quantized_moondream
pixtral/ - pixtral variants

Specialized Architectures:

mamba/ - mamba variants
rwkv/ - quantized_rwkv_v5, quantized_rwkv_v6
mimi/ - mimi variants

Audio/Speech:

parler_tts/ - parler_tts variants
metavoice/ - metavoice, quantized_metavoice

Keep Standalone (for now):

Single-model families or unique architectures that don't fit groups

References

SmolLM3 PR: #3180 (demonstrates pattern)

Nov 12 '25 21:11 DrJesseGlass
