SegFormer3D

Non-cubic data preprocessing

Open maberrospi opened this issue 4 months ago • 5 comments

Hi,

Thanks for sharing your research and the model weights for future research!

I noticed that the model input has to have a cubic shape (e.g. 128x128x128), as was the case for the BraTS experiment. Could you please share how you dealt with non-cubic data? For example, looking at the ACDC dataset, I noticed that the data is not cubic: the number of slices is much smaller than the height and width. How did you preprocess this type of data to make it compatible with your model architecture?

Looking forward to your response!

maberrospi avatar Aug 28 '25 10:08 maberrospi

Thanks for using our repo! We’re currently developing on the dev branch of this repository. The code there supports variable-sized cubes, but you’ll still need to tune certain hyperparameters (e.g., kernel size) to best fit your problem.

Our main goal is to publish the finalized model on a Hugging Face model card. I’ll keep this issue open until the model implementation is released. In the meantime, please don’t hesitate to reach out if you have any questions.

bnavard avatar Aug 28 '25 20:08 bnavard

Thanks for the prompt reply! That is good to hear and again thank you for your efforts! Looking forward to that release.

You mention this will support variable-sized cubes. My question mostly relates to how you deal with non-cubic inputs. For example, the ACDC dataset, which you report on in your paper, has a much smaller depth than height and width. If possible, I would like to know what approach you used to preprocess this data. I can think of interpolation along the depth axis, padding, or downscaling the height and width, but I believe all of these methods have trade-offs, especially for medical images.
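To make the options above concrete, here is a minimal sketch (not the authors' pipeline) of two of the strategies mentioned: trilinear-style interpolation along all axes, and zero-padding the short depth axis before resizing height and width. The function name, target size, and the ACDC-like example shape are illustrative assumptions.

```python
# Hedged sketch of bringing a non-cubic (D, H, W) volume to a cubic target.
# Not the SegFormer3D authors' preprocessing; shapes here are illustrative.
import numpy as np
from scipy.ndimage import zoom

def to_cubic(vol, target=128, method="interp"):
    """Resample a (D, H, W) volume to (target, target, target)."""
    d, h, w = vol.shape
    if method == "interp":
        # Linear interpolation along every axis (order=1).
        return zoom(vol, (target / d, target / h, target / w), order=1)
    if method == "pad":
        # Zero-pad the short depth axis, then resize H and W only.
        pad_d = max(target - d, 0)
        padded = np.pad(vol, ((pad_d // 2, pad_d - pad_d // 2),
                              (0, 0), (0, 0)))
        return zoom(padded, (target / padded.shape[0],
                             target / h, target / w), order=1)
    raise ValueError(f"unknown method: {method}")

# ACDC-like shape: few slices, larger in-plane resolution.
cube = to_cubic(np.zeros((10, 256, 216)), method="pad")
assert cube.shape == (128, 128, 128)
```

Interpolating a 10-slice depth up to 128 synthesizes a lot of data between slices, while padding keeps the original slices intact but wastes model capacity on empty voxels, which is exactly the trade-off raised above.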

Looking forward to your input!

maberrospi avatar Aug 29 '25 05:08 maberrospi

I see. The short answer is that we used the same augmentation pipeline as nnFormer for data processing, which we’ve also noted in our repository. The longer answer has two parts. First, to ensure a fair comparison against baseline models such as nnFormer, we adopted their training pipeline in order to take advantage of the various data processing and augmentation strategies it provides. For example, their repository shows that they applied multiple augmentation techniques to expand the sample size of the ACDC dataset. Second, the nnFormer codebase is structured in a way that makes it difficult to separate the augmentation code, since it is deeply nested. As a result, we were unable to use their augmentation as a standalone module in our codebase.

TL;DR: Please run your model using the nnFormer codebase.

bnavard avatar Aug 29 '25 13:08 bnavard

Thanks for this information! I must have missed the mention of nnFormer for data processing in the repo. Also, thank you for the further explanation! I will have a look at the nnFormer codebase.

maberrospi avatar Sep 01 '25 05:09 maberrospi

@bnavard To address this issue, I think you mainly relied on augmentation or preprocessing methods. I've noticed similar functionality in MONAI's transformation pipeline, where interpolation is often applied under the hood in the dataloader. Instead of depending on augmentation, another approach could be to design the model architecture itself to support arbitrary input shapes. Have you explored this direction?

At the moment, I'm implementing this model in Keras 3 (supporting both 2D and 3D), similar to the official code, and plan to add it to a repo (medic-ai) I'm building: a MONAI-like medical imaging toolkit fully written in Keras 3 with support for all backends. I haven't tested the arbitrary-input-shape option yet, but knowing your thoughts on it would provide some useful intuition moving forward. Thanks.
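One practical constraint on "arbitrary input shapes" in hierarchical encoders like SegFormer is that each stage downsamples by a fixed stride, so every spatial dimension must be divisible by the cumulative stride rather than truly arbitrary. The sketch below illustrates that check; the stage strides are assumptions for illustration, not the published SegFormer3D configuration.

```python
# Hypothetical sketch: a hierarchical encoder with fixed per-stage strides
# can accept non-cubic inputs, but each spatial dim must divide evenly by
# the cumulative downsampling factor. Strides below are illustrative only.
from math import prod

def compatible(shape, stage_strides=(4, 2, 2, 2)):
    """True if every spatial dim survives all downsampling stages evenly."""
    total = prod(stage_strides)  # cumulative downsampling factor (32 here)
    return all(dim % total == 0 for dim in shape)

assert compatible((128, 128, 128))      # cubic BraTS-style crop works
assert compatible((32, 160, 160))       # non-cubic but divisible also works
assert not compatible((10, 256, 216))   # raw ACDC-like shape does not
```

In other words, a non-cubic shape like (32, 160, 160) can pass through such an encoder without any cubic resampling, which may be a lighter-touch alternative to interpolating thin volumes up to a cube.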

innat avatar Sep 06 '25 21:09 innat