
[FEATURE] 3D model support for volumetric / medical imaging in timm

Open CowboyH opened this issue 1 week ago • 3 comments

Is your feature request related to a problem? Please describe. I’m working on 3D volumetric data (medical OCT volumes) and would like to use timm as a unified model zoo across my projects. However, as far as I can tell from the documentation and model list, timm currently focuses on 2D image models and I don’t see any officially supported 3D CNN backbones or APIs for volumetric data (e.g. tensors shaped like [B, C, D, H, W] or [B, C, T, H, W]).

This leaves a gap for users who would like to use timm-style APIs and architectures (ResNet, ConvNeXt, EfficientNet, ViT, etc.) directly on 3D data (volumes or videos) instead of only 2D images. If I’m missing existing 3D support, I would really appreciate clarification on the current status and recommended usage patterns.

Describe the solution you'd like There are two closely related things I’d like to ask for:

Clarification of current 3D support

Is there any existing or experimental 3D model support in timm (e.g. 3D variants of ResNet/ConvNeXt, or a recommended way to use timm backbones for 3D inputs)?

If yes, could this be documented more explicitly (which models, expected input shapes, example usage)?

Official 3D model support (if it does not exist yet)

Provide a small but representative set of 3D backbones with the same API style as timm, for example:

3D ResNet family (e.g. R3D-18 style),

3D ConvNeXt / EfficientNet style backbones,

Possibly a generic “dimension-agnostic” implementation where spatial_dims=2/3 could be chosen.

Add helper functions or flags, e.g. create_model(model_name, spatial_dims=3, in_chans=1, num_classes=...) or clearly separate 2D vs 3D model names (e.g. "resnet50_3d", "convnext_tiny_3d").

Include at least one simple example in the docs or notebooks showing how to train a 3D classifier on [B, C, D, H, W] inputs.

This would allow users working on 3D medical imaging, video classification, and other volumetric tasks to rely on timm as a single, consistent model zoo.
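To make the request above concrete, here is a rough sketch of what a dimension-agnostic building block could look like. This is not timm's actual API; `conv_nd` and `SimpleBlock` are hypothetical names used purely for illustration.

```python
# Hypothetical sketch of a spatial_dims=2/3 parameterized block.
# None of these names exist in timm; this only illustrates the idea.
import torch
import torch.nn as nn

def conv_nd(spatial_dims: int, in_chans: int, out_chans: int, **kwargs) -> nn.Module:
    """Return a 2D or 3D convolution depending on spatial_dims."""
    if spatial_dims == 2:
        return nn.Conv2d(in_chans, out_chans, **kwargs)
    if spatial_dims == 3:
        return nn.Conv3d(in_chans, out_chans, **kwargs)
    raise ValueError(f"unsupported spatial_dims={spatial_dims}")

class SimpleBlock(nn.Module):
    """Tiny conv -> norm -> act block parameterized by spatial_dims."""
    def __init__(self, spatial_dims: int, in_chans: int, out_chans: int):
        super().__init__()
        norm_cls = nn.BatchNorm2d if spatial_dims == 2 else nn.BatchNorm3d
        self.conv = conv_nd(spatial_dims, in_chans, out_chans,
                            kernel_size=3, padding=1)
        self.norm = norm_cls(out_chans)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.norm(self.conv(x)))

# A [B, C, D, H, W] volume passes through the 3D variant with only the
# channel dimension changing.
block3d = SimpleBlock(spatial_dims=3, in_chans=1, out_chans=8)
out = block3d(torch.randn(2, 1, 4, 16, 16))
```

A hypothetical `create_model(name, spatial_dims=3, in_chans=1, ...)` entry point could then route construction through factories like this.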

Describe alternatives you've considered Using 2D timm models on slices/frames: Treat the depth/time dimension as extra slices, feed each slice/frame into a 2D timm backbone, and then aggregate features. This works but:

Loses native 3D spatial modeling,

Requires custom aggregation code and is less elegant than having proper 3D convolutional backbones,

Makes it harder to reuse pre-defined training scripts or configs.
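For reference, the slice-wise workaround described above can be sketched roughly as follows. The `backbone` here is a tiny stand-in module to keep the snippet self-contained; in practice it could be any 2D timm backbone used as a feature extractor.

```python
# Slice-wise workaround: fold the depth axis into the batch, run a 2D
# feature extractor per slice, then aggregate across depth.
import torch
import torch.nn as nn

backbone = nn.Sequential(                      # stand-in for a 2D backbone
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                              # -> [N, 8] feature vectors
)

def encode_volume(vol: torch.Tensor) -> torch.Tensor:
    """vol: [B, C, D, H, W] -> depth-pooled features [B, F]."""
    b, c, d, h, w = vol.shape
    slices = vol.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w)  # fold D into batch
    feats = backbone(slices).reshape(b, d, -1)                   # [B, D, F]
    return feats.mean(dim=1)                                     # aggregate over depth

feats = encode_volume(torch.randn(2, 1, 4, 16, 16))
```

Mean pooling is the simplest aggregation; attention or a small temporal model over the `[B, D, F]` features are common alternatives, which is exactly the custom glue code the request is trying to avoid.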

Using external libraries/frameworks for 3D models (e.g. medical imaging frameworks or community 3D forks built “on top of timm”):

These often have their own APIs and may not stay fully in sync with the latest timm architectures and weights.

It fragments the workflow: 2D tasks use timm, 3D tasks use a separate ecosystem, instead of a unified interface.

Because timm has become a de facto standard for image backbones in PyTorch, having first-class 3D support (or at least a clearly documented position on it) would be extremely valuable.

Additional context

CowboyH avatar Dec 10 '25 08:12 CowboyH

@CowboyH This implementation from ZFTurbo is closer to what you are looking for - although it is isotropic and does not account for heavily anisotropic 3D data (where in-plane resolution is much higher than slice spacing), which is common in many modalities in medical imaging.

Pradecki0 avatar Dec 10 '25 22:12 Pradecki0

@Pradecki0 Thank you, I’m already using this open-source library.

CowboyH avatar Dec 11 '25 02:12 CowboyH

I've been thinking about directions to push timm for quite a while, actually. It isn't trivial to find a direction that has a user base in need, is a sensible jump from here, and is somewhat future-proof.

3D is definitely one of the thoughts, video also. Though once you dig in there, there are so many sub-tasks and variants that it becomes challenging to find a starting point. Also, there's domain knowledge that helps guide this, and in 3D I feel I could use some pointers / guidance re: what specific applications are underserved and in need.

Case in point: isotropic vs anisotropic. What does that entail in terms of objects, data handling, modelling specifics? Are there enough reference datasets for me to operate with, test models, pretrain models, etc.?

Open to some data dumps here, pointers / references. Links to other requests from people that may be out there in other repos...

EDIT: even for video there are many approaches, and it'd be hard to tackle them all. '3D' is one approach to video, but it seems to have fallen to the side relative to more specific transformer approaches that handle the T axis more specifically than the spatial ones...

rwightman avatar Dec 11 '25 15:12 rwightman

@rwightman Hi, happy to help if you decide to move timm toward 3D / medical-imaging use cases. I’ve worked extensively with ophthalmology and brain volumetric data, and can share practical pointers (what applications feel underserved, common data handling gotchas, and what tends to work in training).

If useful, I can also provide example data preprocessing notes around spacing/geometry, since isotropic vs anisotropic spacing is often a key consideration in medical volumes and it can affect preprocessing choices (e.g., resampling vs native spacing), augmentation, patch sampling strategy, and some modeling details.
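As a toy illustration of the spacing point: an anisotropic volume can be resampled to isotropic voxels with trilinear interpolation. The spacing numbers below are made-up examples, and real pipelines typically use dedicated tools (e.g. TorchIO or SimpleITK) rather than raw interpolation.

```python
# Resample an anisotropic volume (coarse slice spacing, fine in-plane
# resolution) to isotropic voxels. Spacing values are illustrative only.
import torch
import torch.nn.functional as F

vol = torch.randn(1, 1, 20, 128, 128)        # [B, C, D, H, W]
spacing = (2.0, 0.5, 0.5)                    # (D, H, W) voxel size in mm
target = 1.0                                 # desired isotropic spacing in mm

# New grid size = physical extent / target spacing, per axis.
new_size = [max(1, round(s * sp / target))
            for s, sp in zip(vol.shape[2:], spacing)]
iso = F.interpolate(vol, size=new_size, mode="trilinear", align_corners=False)
```

Whether to resample like this or train at native spacing (with spacing-aware patch sampling) is one of the key design choices anisotropic data forces on a 3D pipeline.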

Public 3D/volumetric datasets that might be useful (ophthalmology + brain + other body parts): (Note: I haven’t personally used every dataset listed below; some are sourced from papers/blog posts, so there may be inaccuracies.)

Ophthalmology (OCT / OCTA volumes)

  • OCTA-500 (paired 3D OCT & OCTA volumes, plus projections/labels): https://ieee-dataport.org/open-access/octa-500
  • Duke OCT datasets (OCT volumes used in multiple retinal OCT works): https://people.duke.edu/~sf59/software.html (see individual dataset pages linked there)
  • RETOUCH challenge (OCT volumes with fluid annotations): https://retouch.grand-challenge.org/
  • CAVRI datasets (annotated 3D SD-OCT scans; access via request form): https://dsp.put.poznan.pl/cavri_database-191/

Brain (MRI volumes)

  • BraTS (brain tumor MRI, multi-modal): https://braintumorsegmentation.org/ (recent editions are often distributed via Synapse)
  • OASIS (open brain MRI datasets): https://sites.wustl.edu/oasisbrains/
  • ADNI (Alzheimer’s neuroimaging; requires registration): https://adni.loni.usc.edu/
  • IXI (healthy brain MRI, ~600 subjects): https://brain-development.org/ixi-dataset/
  • OpenNeuro (many public BIDS MRI datasets): https://openneuro.org/

Other 3D medical imaging benchmarks

  • Medical Segmentation Decathlon (10 heterogeneous 3D tasks): https://medicaldecathlon.com/
  • KiTS21 (kidney/kidney tumor CT segmentation): https://kits-challenge.org/kits21/
  • LUNA16 (lung nodule CT): https://luna16.grand-challenge.org/
  • LiTS (liver tumor CT segmentation): https://competitions.codalab.org/competitions/17094
  • TCIA collections (large public cancer imaging archive): https://www.cancerimagingarchive.net/browse-collections/

Other libraries/frameworks that already “collect” 3D models (beyond timm-3d), in case you want references or interoperability targets:

  • MONAI (3D networks + model zoo): https://monai.io/ and model zoo: https://monai.org.cn/model-zoo.html
  • nnU-Net (strong 3D segmentation baseline framework): https://github.com/MIC-DKFZ/nnUNet
  • MedicalNet (3D-ResNet pretraining for medical volumes): https://github.com/Tencent/MedicalNet
  • TorchIO (3D medical image IO/augmentation/sampling utilities): https://github.com/TorchIO-project/torchio
  • For “video 3D conv” references: torchvision video models + PyTorchVideo model zoo: https://docs.pytorch.org/vision/main/models.html https://pytorchvideo.readthedocs.io/en/latest/model_zoo.html

If you share what direction you’re leaning toward (segmentation vs classification, CT/MRI vs OCT/OCTA), I can suggest a compact starter set of tasks + datasets without exploding scope.

CowboyH avatar Dec 16 '25 01:12 CowboyH