support running via Cog

Open bghira opened this issue 1 month ago • 2 comments

This pull request adds Cog support for training with SimpleTuner by introducing a new Predictor entrypoint, a helper orchestration layer, and configuration files. These changes enable users to launch training jobs through Cog, handling dataset staging, configuration, and output packaging automatically.

The most important changes are:

Cog integration and configuration:

  • Added a new cog.yaml file with build settings (CUDA version, Python version, system packages) and specified the predictor entrypoint as predict.py:Predictor.
  • Created a comprehensive .dockerignore to exclude unnecessary files from the Docker build context, improving build performance and security.

Predictor and orchestration logic:

  • Added predict.py, which defines a Predictor class for Cog, handling user inputs, launching SimpleTuner training jobs, and returning zipped output directories.
  • Introduced simpletuner/cog.py, a utility module that stages datasets, prepares config files, runs training, and packages results for Cog, including safe archive extraction and Hugging Face token handling.
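
As a rough illustration of the "safe archive extraction" and output packaging steps mentioned above, here is a minimal sketch using only the Python standard library. It is not the code from simpletuner/cog.py; the function names `safe_extract_zip` and `zip_directory` are hypothetical, and the sketch assumes datasets arrive as zip archives. The idea is to reject any archive member whose resolved path would land outside the destination directory before extracting anything.

```python
import zipfile
from pathlib import Path


def safe_extract_zip(archive_path, dest_dir):
    """Extract a zip archive, refusing members that would land outside dest_dir.

    Hypothetical helper for illustration; not SimpleTuner's actual API.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        # Validate every member first so a bad archive extracts nothing.
        for member in zf.infolist():
            target = (dest / member.filename).resolve()
            # Reject absolute paths and "../" traversal out of dest.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.filename}")
        zf.extractall(dest)
    return dest


def zip_directory(src_dir, zip_path):
    """Package every file under src_dir into a zip, storing relative paths.

    Hypothetical helper; the caller should keep zip_path outside src_dir.
    """
    src = Path(src_dir).resolve()
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src))
    return zip_path
```

Python's `zipfile.extractall` already strips leading slashes and `..` components on its own; the explicit check here fails loudly on a suspicious archive instead of silently rewriting its paths, which may be preferable when the archive comes from user input.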

bghira • Nov 26 '25 22:11

Could we push the trained model checkpoint or model to Replicate?

cc: @bghira

ParagEkbote • Nov 28 '25 07:11

@ParagEkbote see this doc

bghira • Nov 28 '25 15:11