
Provide quantized versions in model library

Open aurelienizl opened this issue 5 months ago • 0 comments

Feature Request: TabbyML Model Library with Version Access

Description

Implement a model library for TabbyML, similar to Ollama's (https://ollama.com/library), to improve usability across diverse hardware. Offering quantized versions of large models would make it possible to run 7B and larger models on lower-end GPUs, extending TabbyML's reach to mid-range GPU users.

Proposed Functionality

  1. Model Library: a centralized repository of LLM models and their published versions (a rough sketch of how name/tag resolution could work follows this list).

  2. Version Access: direct access to specific model versions by tag, for example:

    llama3.1-latest     42182419e950 • 4.7GB • Updated 14 hours ago
    llama3.1-8b         42182419e950 • 4.7GB • Updated 14 hours ago
    llama3.1-70b        c0df3564cfe8 • 40GB  • Updated 14 hours ago
    llama3.1-405b       65fa6b82bfda • 229GB • Updated 14 hours ago
    
  3. Quantization Options: multiple quantization levels to match different GPU capabilities, for example:

    8b-instruct-fp16    4aacac419454 • 16GB  • Updated 14 hours ago
    8b-instruct-q2_K    44a139eeb344 • 3.2GB • Updated 14 hours ago
    8b-instruct-q3_K_S  16268e519444 • 3.7GB • Updated 14 hours ago
    8b-instruct-q3_K_M  4faa21fca5a2 • 4.0GB • Updated 14 hours ago
    8b-instruct-q3_K_L  04a2f1e44de7 • 4.3GB • Updated 14 hours ago
    

    (Additional quantization options as per Ollama's page)
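
To make the proposal concrete, here is a minimal sketch (in Python) of what a library manifest and name/tag resolution could look like. Field names, URLs, and the "latest" convention are illustrative assumptions, not TabbyML's actual registry format; the digests are simply copied from the listing above.

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    tag: str        # e.g. "8b-instruct-q2_K"
    digest: str     # content hash identifying the artifact
    size_gb: float  # download size of the weights
    url: str        # hypothetical hosting location (placeholder)

# A "library" maps a model name to its published variants.
LIBRARY: dict[str, list[ModelVariant]] = {
    "llama3.1": [
        ModelVariant("8b-instruct-fp16",   "4aacac419454", 16.0, "https://example.com/llama3.1/8b-instruct-fp16.gguf"),
        ModelVariant("8b-instruct-q3_K_M", "4faa21fca5a2",  4.0, "https://example.com/llama3.1/8b-instruct-q3_K_M.gguf"),
        ModelVariant("8b-instruct-q2_K",   "44a139eeb344",  3.2, "https://example.com/llama3.1/8b-instruct-q2_K.gguf"),
    ],
}

def resolve(name: str, tag: str = "latest") -> ModelVariant:
    """Resolve 'name:tag' to a concrete artifact, the way `ollama pull` does."""
    variants = LIBRARY[name]
    if tag == "latest":
        return variants[0]  # assumed convention: first entry is the default
    for variant in variants:
        if variant.tag == tag:
            return variant
    raise KeyError(f"unknown tag {tag!r} for model {name!r}")

print(resolve("llama3.1", "8b-instruct-q2_K").url)
```

With something like this in place, choosing a quantization level becomes a matter of picking a tag rather than hand-writing a custom model definition.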

Rationale

GPU Compatibility: allows TabbyML to be used effectively across a wide range of GPUs.

Examples:

  • RTX 3080 (10GB): Currently unable to load 6.7B or larger models due to CudaMalloc failure.
  • RTX 4060 laptop (8GB): Limited to starcoder2:3b, while Ollama allows running 13B quantized models with performance close to fp16.
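
The sizes involved follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, plus runtime overhead. A rough sketch below illustrates this; the bits-per-weight figures are approximate llama.cpp values, and the 1.2x overhead factor (KV cache, activations, CUDA context) is an assumption, not a measured TabbyML number.

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Rule-of-thumb VRAM estimate: weights plus ~20% runtime overhead."""
    weights_gb = params_billion * bits_per_weight / 8  # 1B params @ 8 bits ~ 1 GB
    return weights_gb * overhead

# Approximate bits per weight for common llama.cpp quantizations.
for label, bits in [("fp16", 16.0), ("q8_0", 8.5), ("q4_K_M", 4.85), ("q2_K", 2.63)]:
    print(f"7B {label:<7} ~ {approx_vram_gb(7, bits):.1f} GB")

# fp16   ~ 16.8 GB -> exceeds a 10GB RTX 3080
# q4_K_M ~  5.1 GB -> fits comfortably on an 8GB RTX 4060 laptop
```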

A built-in feature like this would avoid having to set up custom models manually.

👍 Please react with a thumbs up if you support this feature request.

aurelienizl · Sep 10 '24 15:09