
Create Fallback Model Download Link

Open noahho opened this issue 8 months ago • 0 comments

Describe the workflow you want to enable

HuggingFace is down at times, which means users can't install and download our models.

Describe your proposed solution

We can have a fallback download link through a PriorLabs S3 bucket @Jabb0

#!/bin/bash

# Script to download all 14 TabPFN model files.

# --- Configuration ---
# Base URL where the model files are stored
BASE_URL="https://storage.googleapis.com/tabpfn-v2-model-files/05152025"

# Directory to download the files into.
# Default is the current directory (".").
# You can change this to a specific path, e.g., "tabpfn_custom_models"
DOWNLOAD_DIR="."

# List of all 14 model files to download
MODEL_FILES=(
  "tabpfn-v2-classification-gn2p4bpt.ckpt"
  "tabpfn-v2-classification-vutqq28w.ckpt"
  "tabpfn-v2-classification-znskzxi4.ckpt"
  "tabpfn-v2-classifier-gn2p4bpt.ckpt"
  "tabpfn-v2-classifier-llderlii.ckpt"
  "tabpfn-v2-classifier-od3j1g5m.ckpt"
  "tabpfn-v2-classifier-vutqq28w.ckpt"
  "tabpfn-v2-classifier-znskzxi4.ckpt"
  "tabpfn-v2-classifier.ckpt"
  "tabpfn-v2-regressor-09gpqh39.ckpt"
  "tabpfn-v2-regressor-2noar4o2.ckpt"
  "tabpfn-v2-regressor-5wof9ojf.ckpt"
  "tabpfn-v2-regressor-wyl4o83o.ckpt"
  "tabpfn-v2-regressor.ckpt"
)
# --- End Configuration ---

# Create download directory if it doesn't exist
if [ ! -d "$DOWNLOAD_DIR" ]; then
  echo "Creating download directory: $DOWNLOAD_DIR"
  mkdir -p "$DOWNLOAD_DIR"
fi

echo "Starting download for all 14 model files using curl..."
echo "Target directory: $(cd "$DOWNLOAD_DIR" && pwd)" # Show absolute path if DOWNLOAD_DIR is relative
echo "------------------------------------"

SUCCESS_COUNT=0
FAIL_COUNT=0

for FILE in "${MODEL_FILES[@]}"; do
  FILE_URL="$BASE_URL/$FILE"
  DEST_PATH="$DOWNLOAD_DIR/$FILE"

  echo "Processing $FILE..."

  # Check if file exists at URL using curl's HEAD request capabilities
  # curl -L: Follow redirects
  # curl -s: Silent mode
  # curl -f: Fail silently (no output) on HTTP errors, but return error code
  # curl -I: Show document info only (HEAD request)
  # curl -o /dev/null: Discard body
  if curl -LsfI -o /dev/null "$FILE_URL"; then
    echo "URL accessible. Downloading $FILE to $DEST_PATH..."
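    # -f: return a non-zero exit code on HTTP errors instead of saving the error page
    if curl -fL -o "$DEST_PATH" "$FILE_URL"; then
      echo "Successfully downloaded $FILE"
      SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
    else
      CURL_EXIT=$?  # capture curl's exit code before another command overwrites $?
      echo "ERROR: Failed to download $FILE with curl (exit code $CURL_EXIT)"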
      FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
  else
    echo "ERROR: File not found at $FILE_URL or URL is not accessible (checked with curl)."
    FAIL_COUNT=$((FAIL_COUNT + 1))
  fi
  echo "------------------------------------"
done

echo ""
echo "Model download process complete!"
echo "Successfully downloaded: $SUCCESS_COUNT file(s)."
echo "Failed to download: $FAIL_COUNT file(s)."

if [ "$DOWNLOAD_DIR" == "." ]; then
  echo "Files are saved in the current directory: $(pwd)"
else
  # Resolve to absolute path for clarity
  ABS_DOWNLOAD_DIR=$(cd "$DOWNLOAD_DIR" && pwd)
  echo "Files are saved in: $ABS_DOWNLOAD_DIR"
fi

How to use the script:

  1. Save the script above to a file, for example, download_all_tabpfn_models.sh.
  2. Make it executable: chmod +x download_all_tabpfn_models.sh
  3. Run the script: ./download_all_tabpfn_models.sh
    • All 14 specified .ckpt files will be downloaded into the current directory (or the directory specified in DOWNLOAD_DIR within the script).
  4. Point tabpfn at the downloaded files, using one of the following:
    • Specify the path directly: TabPFNClassifier(model_path="/path/to/model.ckpt")
    • Set the environment variable: os.environ["TABPFN_MODEL_CACHE_DIR"] = "/path/to/dir"
    • Or place the files in the default OS cache directory:
      • Windows: %APPDATA%\tabpfn\
      • macOS: ~/Library/Caches/tabpfn/
      • Linux: ~/.cache/tabpfn/
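
The environment-variable option in step 4 could be sketched in Python roughly as follows. This is a minimal sketch, not part of tabpfn: `missing_checkpoints` and `use_local_models` are hypothetical helpers, and the list of expected files is just two names taken from the download list above. The sketch checks that the files are present and non-empty before setting TABPFN_MODEL_CACHE_DIR.

```python
import os
from pathlib import Path

# Two checkpoint names taken from the download list above (adjust as needed).
EXPECTED = ["tabpfn-v2-classifier.ckpt", "tabpfn-v2-regressor.ckpt"]


def missing_checkpoints(model_dir, expected=EXPECTED):
    """Return the expected checkpoint names that are absent or empty in model_dir.

    Hypothetical helper for illustration; not part of the tabpfn package.
    """
    model_dir = Path(model_dir)
    return [
        name
        for name in expected
        if not (model_dir / name).is_file() or (model_dir / name).stat().st_size == 0
    ]


def use_local_models(model_dir):
    """Set TABPFN_MODEL_CACHE_DIR to model_dir if all expected files are present."""
    missing = missing_checkpoints(model_dir)
    if missing:
        raise FileNotFoundError(f"Missing checkpoints in {model_dir}: {missing}")
    # Assumption: the variable should be set before a TabPFN estimator is created.
    os.environ["TABPFN_MODEL_CACHE_DIR"] = str(Path(model_dir).resolve())
```

Calling `use_local_models("/path/to/dir")` before constructing the estimator would then fail early with a clear message if a download was incomplete, rather than falling through to a network download.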

Longer term

Change the public fallback link in loading.py:

https://github.com/PriorLabs/TabPFN/blob/main/src/tabpfn/model/loading.py#L85

The change could be made around get_fallback_urls, which currently returns only HuggingFace URLs:

def get_fallback_urls(self) -> list[str]:
    return [
        f"https://huggingface.co/{self.repo_id}/resolve/main/{filename}?download=true"
        for filename in self.filenames
    ]
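
One possible shape for this change is sketched below. Everything beyond `repo_id` and `filenames` is an assumption: `ModelSource` here is a stand-in dataclass rather than the real class in loading.py, `mirror_base_url`, `candidate_urls`, and `download_first_available` are hypothetical names, and the mirror URL is simply the one from the download script above, used for illustration. The idea is to list the HuggingFace URL first and a mirror URL second for each file, then try them in order.

```python
from __future__ import annotations

import shutil
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ModelSource:
    """Stand-in for the class in loading.py (illustrative only)."""

    repo_id: str
    filenames: list[str] = field(default_factory=list)
    # Hypothetical field; the default is the URL from the download script above.
    mirror_base_url: str = "https://storage.googleapis.com/tabpfn-v2-model-files/05152025"

    def candidate_urls(self, filename: str) -> list[str]:
        """Return the URLs for one file, in the order they should be tried."""
        return [
            f"https://huggingface.co/{self.repo_id}/resolve/main/{filename}?download=true",
            f"{self.mirror_base_url}/{filename}",
        ]


def download_first_available(urls: list[str], dest: Path, timeout: float = 30.0) -> str:
    """Try each URL in order; return the URL that worked, or raise if all fail."""
    errors = []
    tmp = dest.with_name(dest.name + ".part")
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
                shutil.copyfileobj(resp, out)
            tmp.replace(dest)  # expose the file only once it is fully written
            return url
        except OSError as exc:  # URLError and HTTPError are subclasses of OSError
            errors.append(f"{url}: {exc}")
            tmp.unlink(missing_ok=True)  # drop any partial download
    raise RuntimeError("All download URLs failed:\n" + "\n".join(errors))
```

With this layout, a HuggingFace outage would only cost one failed request per file before the mirror is tried. Writing to a `.part` file first means an interrupted download is never mistaken for a valid checkpoint.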

Describe alternatives you've considered, if relevant

No response

Additional context

No response

Impact

None

noahho · Apr 14 '25 09:04