kserve Add CatBoost Model Serving support

…nfigs

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #

Type of changes Please delete options that are not relevant.

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced. Please also list any relevant details for your test configuration.

[ ] Test A
[ ] Test B
Logs

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

[ ] Have you added unit/e2e tests that prove your fix is effective or that this feature works?
[ ] Has code been commented, particularly in hard-to-understand areas?
[ ] Have you made corresponding changes to the documentation?

Release note:

Re-running failed tests

/rerun-all - rerun all failed workflows.
/rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Sep 06 '25 14:09 chethanuk

Aren't these two related https://github.com/kserve/kserve/pull/4603 ?

Sep 16 '25 15:09 spolti

Aren't these two related #4603 ?

Yes, both are trying to solve the same thing.

Sep 16 '25 21:09 chethanuk

Hi kittywaresz, can you please work together with @chethanuk? It seems both PRs are related.

Sep 22 '25 17:09 spolti

@spolti I will try, but I need to know what the desired approach is. Do we really need a separate runtime for CatBoost, or would it be enough to add a CatBoost backend implementation to the MLServer runtime?

If we need a separate runtime, it makes no sense to keep my contribution, because @chethanuk's implementation already covers that.

If we don't need a separate runtime, @chethanuk can contribute to my branch with the CatBoost runtime tests adapted for the MLServer runtime.

Sep 22 '25 19:09 kittywaresz

We should have a separate CatBoost runtime that does the same thing internally. The main reason is that CatBoost uses OpenMP parallelism, and we don't want MLServer's CPU-based scheduling to misplace CatBoost models on GPU pods that are already full or nearly full. Also note that CatBoost manages threading internally without exposing nthread parameters.

In more detail:

File Format Handling Architecture Mismatch

Critical difference:

SKlearn: MODEL_EXTENSIONS = (".joblib", ".pkl", ".pickle") + strict single-file policy (sklearnserver/model.py:47-51)
XGBoost: BOOSTER_FILE_EXTENSIONS = (".bst", ".json", ".ubj") + fails on multiple files (xgbserver/model.py:52-56)
CatBoost: MODEL_EXTENSIONS = (".cbm", ".bin") + intelligent preference logic (catboostserver/model.py:49-54)

Multi-Model File Discovery Logic Incompatibility

  # SKlearn/XGBoost/LightGBM pattern:
  elif len(model_files) > 1:
      raise RuntimeError("More than one model file detected")

  # CatBoost pattern:
  if len(model_files) > 1:
      cbm_files = [f for f in model_files if f.endswith(".cbm")]
      if cbm_files:
          model_files = [cbm_files[0]]  # Prefer .cbm over .bin

I think adding CatBoost's flexible file handling to MLServer would break the strict single-file expectation of other frameworks
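To make the difference between the two policies concrete, here is a minimal, self-contained sketch. The function names (`pick_strict`, `pick_catboost`) and the sorting step are illustrative assumptions, not the actual code in either PR; only the extension tuple and the ".cbm over .bin" preference come from the comment above.

```python
from typing import Sequence

# Extensions quoted in the comment above for the CatBoost runtime.
CATBOOST_EXTENSIONS = (".cbm", ".bin")


def pick_strict(model_files: Sequence[str]) -> str:
    """Strict single-file policy: exactly one model file is allowed."""
    if len(model_files) == 0:
        raise RuntimeError("Missing model file")
    if len(model_files) > 1:
        raise RuntimeError("More than one model file detected")
    return model_files[0]


def pick_catboost(model_files: Sequence[str]) -> str:
    """Preference policy: accept several files, prefer .cbm over .bin.

    Sorting is an assumption added here so the choice is deterministic;
    the snippet quoted in the thread simply takes the first .cbm file.
    """
    candidates = sorted(f for f in model_files if f.endswith(CATBOOST_EXTENSIONS))
    if not candidates:
        raise RuntimeError("Missing model file")
    cbm_files = [f for f in candidates if f.endswith(".cbm")]
    return cbm_files[0] if cbm_files else candidates[0]
```

With the same directory listing, `pick_strict(["model.bin", "model.cbm"])` raises, while `pick_catboost(["model.bin", "model.cbm"])` returns `"model.cbm"`.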

CatBoost-Specific Container Resource and Security Requirements:

Dedicated memory allocation patterns for categorical feature encoding
Specific security context optimisations for OpenMP threading
Framework-specific volume mount requirements different from HuggingFace's /dev/shm needs

Protocol Version Support Matrix (we would like to support v1 as well)

MLServer: Only supports v2 protocol (kserve-mlserver.yaml:48)
CatBoost Runtime: Supports both v1 and v2 (kserve-catboostserver.yaml:14-16)

Using MLServer would force CatBoost users onto the v2 protocol only, breaking backward compatibility with our existing v1 deployments. Ideally the runtime should support both v1 and v2.
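For readers less familiar with the two protocols, the sketch below shows how the request bodies differ. It is a rough illustration only: the helper names, the input name `input-0`, and the `FP32` datatype are assumptions chosen for the example, and nothing here is taken from either PR. In KServe, v1 requests are posted to `/v1/models/<name>:predict` and v2 requests to `/v2/models/<name>/infer`.

```python
import json
from typing import Any, Dict, List


def build_v1_request(rows: List[List[float]]) -> Dict[str, Any]:
    """v1 protocol body: one entry in "instances" per input row."""
    return {"instances": rows}


def build_v2_request(rows: List[List[float]], input_name: str = "input-0") -> Dict[str, Any]:
    """v2 (Open Inference Protocol) body: named tensors with explicit shape.

    The input name and the FP32 datatype are assumptions for this sketch.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    flat = [value for row in rows for value in row]  # row-major flattening
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [n_rows, n_cols],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }


if __name__ == "__main__":
    rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(json.dumps(build_v1_request(rows)))
    print(json.dumps(build_v2_request(rows)))
```

A client written against the v1 body cannot talk to a v2-only endpoint without being changed, which is the backward-compatibility concern raised above.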

Threading Model: CatBoost uses OpenMP threading with specific CPU affinity requirements, while MLServer uses a generic threading model optimised for multiple frameworks coexisting.
For performance tuning or customisation, it's best to have a separate runtime, so that CatBoost models can benefit from loading strategies specific to categorical features and from memory pre-allocation patterns optimised for boosted tree structures.
Every major ML framework (sklearn, xgboost, lightgbm, huggingface, tensorflow, pytorch) has its own dedicated runtime, and we should follow the same pattern for CatBoost.

Sep 23 '25 22:09 chethanuk

@chethanuk thank you for the detailed explanation! In that case I will close https://github.com/kserve/kserve/pull/4603

If it's required, I could help you with the documentation. I have already created https://github.com/kserve/website/pull/526 for my changes, but I can adapt it to the changes in your PR.

Sep 29 '25 11:09 kittywaresz