Medical-SAM2

[3D Medical Images as Video] Code Implementation Differs from Paper for 3D CT Processing

Open · tal-grossman opened this issue 10 months ago · 1 comment

Hi,
First of all, thank you for the great and thorough work!

While running and examining the code, I noticed some differences between the implementation and what is described in the paper regarding 3D CT dataset handling.

Observed Differences

1. Self-Sorting Memory Bank Usage in 3D

  • The paper emphasizes that the self-sorting memory bank (including resampling) is applied to 3D medical scans[^1].
  • However, the actual training script (train_3d.py[^2]) and function file (function.py[^3]) seem to use the out-of-the-box SAM2 video propagation instead: only a fraction of the frames are prompted, and there is no full memory-based embedding selection.
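
To make the distinction concrete, here is a minimal sketch of what I mean by "memory-based embedding selection". It is not the paper's algorithm and not code from this repository: the class `ToyMemoryBank`, its `add` and `resample` methods, and the choice of confidence ranking plus cosine-similarity weighting are all my own assumptions, used only to illustrate a bank that re-sorts itself and resamples by similarity to the current frame.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemoryBank:
    """Hypothetical fixed-size bank that keeps the highest-confidence entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # list of (confidence, embedding), best first

    def add(self, embedding, confidence):
        self.entries.append((confidence, embedding))
        # "Self-sorting" (assumed meaning): re-rank by confidence, drop the weakest.
        self.entries.sort(key=lambda e: e[0], reverse=True)
        del self.entries[self.capacity:]

    def resample(self, query, k, rng):
        """Draw k stored embeddings, weighted by similarity to the query."""
        weights = [max(cosine(query, emb), 0.0) + 1e-6 for _, emb in self.entries]
        return rng.choices([emb for _, emb in self.entries], weights=weights, k=k)


bank = ToyMemoryBank(capacity=2)
bank.add([1.0, 0.0], confidence=0.9)
bank.add([0.0, 1.0], confidence=0.4)
bank.add([0.7, 0.7], confidence=0.8)  # evicts the 0.4 entry
picked = bank.resample([1.0, 0.1], k=3, rng=random.Random(0))
```

In plain video propagation, by contrast, the memory is simply the most recent (and prompted) frames in temporal order, with no ranking or resampling step of this kind.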

2. Multi-Axis Processing of 3D Volumes

  • The paper states that 3D volumes should be processed along six orientations (axial, coronal, sagittal + reverse for each)[^1].
  • Yet, when running the BTCV dataset example, the implementation does not appear to incorporate this multi-directional approach explicitly. Instead, it runs inference similarly to SAM2’s standard frame-wise propagation.
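
For reference, this is roughly what I expected to find somewhere in the data pipeline. It is only a sketch with NumPy: the function name `six_orientation_sequences` is hypothetical, and the mapping of array axes 0/1/2 to axial/coronal/sagittal is an assumption that depends on how the volume is stored.

```python
import numpy as np


def six_orientation_sequences(volume):
    """Return six frame stacks from a 3D array, keyed by a hypothetical name."""
    sequences = {}
    for axis, name in enumerate(("axial", "coronal", "sagittal")):
        # Move the slicing axis to the front so frames[i] is the i-th slice.
        frames = np.moveaxis(volume, axis, 0)
        sequences[name] = frames
        # Same slices in the opposite traversal order.
        sequences[name + "_reversed"] = frames[::-1]
    return sequences


volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # toy stand-in for a CT volume
seqs = six_orientation_sequences(volume)
```

Each of the six stacks could then be fed to the model as a separate "video", whereas the BTCV example appears to traverse the volume along a single axis only.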

Questions

  1. Am I misunderstanding how the self-sorting memory bank is applied in the 3D inference pipeline?
  2. If the current BTCV example runs inference more like standard video propagation, would processing the 3D dataset using the memory bank methodology (as done for 2D images) yield better results?

Would love to hear your insights. Thanks again for your excellent work!


[^1]: MedSAM-2 Paper (2408.00874v2.pdf)
[^2]: train_3d.py Implementation
[^3]: function.py Implementation

tal-grossman · Feb 19 '25 15:02

@tal-grossman, I also came across your "1. Self-Sorting Memory Bank Usage in 3D" question while going through the paper and code. I had the same confusion, especially since I’m trying to apply the method to a video-based use case. Were you able to sort it out? I would appreciate any insights if you have.

LokeshaRasanjalee · Apr 06 '25 08:04