segment-anything
Multimodal input substitution for RGB
Hi, I was wondering if it would be feasible to substitute three different medical modalities for the usual RGB channels during SAM training and inference. For example, instead of having red, green, and blue channels, the input would have three slices of data, each representing a different medical imaging modality (each is 224x224).
Would SAM reasonably be able to learn how to use information from each of these slices without too much fine-tuning?
Hey, as far as I understand, directly replacing the RGB channels with three medical modalities in SAM training and inference is technically possible, but it might not be the most efficient or effective approach, because each modality has a different value range and interpretation. Alternatively, training a separate model for each modality and then fusing their features before classification would allow more efficient feature extraction. As for fine-tuning, you will likely need some to get better results.
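If you do want to try the direct channel substitution, here is a minimal sketch of how the differing value ranges could be handled first. It rescales each modality independently to 0-255 and stacks the results into an HxWx3 uint8 array, which is the same layout as an ordinary RGB image. The function names, the percentile clipping, and the 1/99 percentile defaults are my own assumptions for illustration; none of this comes from the SAM codebase, and it uses only NumPy.

```python
import numpy as np


def normalize_modality(slice_2d, low_pct=1.0, high_pct=99.0):
    """Rescale one 2D modality slice to uint8 in [0, 255].

    Hypothetical helper: percentile clipping is an assumed choice to limit
    the influence of outliers; it is not something SAM prescribes.
    """
    data = np.asarray(slice_2d, dtype=np.float32)
    lo, hi = np.percentile(data, [low_pct, high_pct])
    if hi <= lo:
        # Constant (or near-constant) slice: nothing to rescale.
        return np.zeros(data.shape, dtype=np.uint8)
    scaled = (np.clip(data, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


def stack_modalities(mod_a, mod_b, mod_c):
    """Stack three co-registered slices into an HxWx3 pseudo-RGB image."""
    shapes = {np.shape(m) for m in (mod_a, mod_b, mod_c)}
    if len(shapes) != 1:
        raise ValueError(f"modalities must share one shape, got {shapes}")
    channels = [normalize_modality(m) for m in (mod_a, mod_b, mod_c)]
    return np.stack(channels, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for three modalities with very different ranges.
    mod_a = rng.normal(1000.0, 300.0, size=(224, 224))
    mod_b = rng.uniform(0.0, 1.0, size=(224, 224))
    mod_c = rng.normal(-50.0, 5.0, size=(224, 224))
    image = stack_modalities(mod_a, mod_b, mod_c)
    print(image.shape, image.dtype)
```

This only puts the three modalities on a common scale. The pretrained weights still expect natural-image statistics, so fine-tuning would likely still be needed, and the slices are assumed to be spatially aligned already.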