Multimodal input substitution for RGB

Open · 25benjaminli opened this issue 1 year ago · 1 comment

Hi, I was wondering if it would be feasible to replace the traditional RGB channels with three different medical modalities during SAM training and inference. For example, instead of having red, green, and blue channels, the input would have three slices of data, each representing a different medical imaging modality (each is 224x224).

Would SAM reasonably be able to learn how to use information from each of these slices without too much fine-tuning?

25benjaminli · Feb 05 '24 13:02

Hey, as far as I can tell, directly replacing the RGB channels with three medical modalities during SAM training and inference is technically possible, but it might not be the most efficient or effective approach, because each modality has its own value range and interpretation. Alternatively, you could train a separate model for each modality and fuse their features before classification, which would allow efficient feature extraction for each one. As for fine-tuning, you will likely need some to get better results.
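To make the value-range concern concrete, here is a minimal sketch of one common way to build a three-channel input from three modalities: rescale each slice independently to 0-255 and stack the results as an H x W x 3 `uint8` array, the same layout as an ordinary RGB image. It uses only NumPy. The function names, the percentile clipping bounds, and the T1/T2/FLAIR labels on the synthetic arrays are illustrative assumptions, not anything from the SAM codebase, and whether SAM makes good use of such an input without fine-tuning is untested here.

```python
import numpy as np


def normalize_modality(slice_2d, lower_pct=0.5, upper_pct=99.5):
    """Rescale one 2D slice to uint8 [0, 255] using its own percentile range.

    The percentile bounds are an assumed choice to limit the effect of outliers.
    """
    x = np.asarray(slice_2d, dtype=np.float32)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    if hi <= lo:  # constant slice: avoid division by zero
        return np.zeros(x.shape, dtype=np.uint8)
    x = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return (x * 255.0).round().astype(np.uint8)


def stack_modalities(mod_a, mod_b, mod_c):
    """Normalize three same-sized slices separately and stack them as H x W x 3."""
    channels = [normalize_modality(m) for m in (mod_a, mod_b, mod_c)]
    return np.stack(channels, axis=-1)


# Synthetic stand-ins for three modalities with very different value ranges
# (the names are hypothetical placeholders).
rng = np.random.default_rng(0)
t1 = rng.normal(500.0, 100.0, size=(224, 224))
t2 = rng.uniform(0.0, 1.0, size=(224, 224))
flair = rng.integers(0, 4096, size=(224, 224))

pseudo_rgb = stack_modalities(t1, t2, flair)
print(pseudo_rgb.shape, pseudo_rgb.dtype)
```

Normalizing each modality separately keeps one channel with a large numeric range from dominating the others. It does not address the fact that the pretrained weights were learned on natural-image color statistics, which is why some fine-tuning is still likely to be needed.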

imcoza · Feb 07 '24 14:02