
High memory and runtime requirements when adding scRNA-seq data

Open asmariyaz23 opened this issue 4 months ago • 3 comments

Hello,

I’m trying to run Segger with the tutorial data. The pipeline runs fine (with reasonable memory and runtime) when I exclude the scRNA-seq data, but when I include it, the run requires significantly more disk space, memory, and time.

I extracted the scRNA-seq data from the Xenium example dataset. Could you clarify why this might be happening or how best to handle it?
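For context, one common way to keep an scRNA-seq reference small before feeding it into a pipeline like this is a stratified subsample: cap the number of cells per annotated type. The sketch below is illustrative only (toy data, standard library only, not Segger's API); the cell-type names and sizes are made up.

```python
import random

# Toy reference: cell_id -> (cell_type, expression vector).
# The names and numbers here are made up for illustration.
cells = {f"cell{i}": ("TypeA" if i % 2 else "TypeB", [float(i), float(i + 1)])
         for i in range(1000)}

def subsample_per_type(cells, max_per_type=100, seed=0):
    """Keep at most `max_per_type` cells of each cell type (stratified subsample)."""
    rng = random.Random(seed)
    by_type = {}
    for cid, (ctype, _expr) in cells.items():
        by_type.setdefault(ctype, []).append(cid)
    keep = set()
    for ids in by_type.values():
        ids = sorted(ids)  # deterministic order before shuffling
        rng.shuffle(ids)
        keep.update(ids[:max_per_type])
    return {cid: cells[cid] for cid in keep}

slim = subsample_per_type(cells, max_per_type=100)
print(len(slim))  # 200: at most 100 cells from each of the two types
```

A smaller, type-balanced reference usually cuts both the disk footprint of the intermediate dataset and the per-epoch cost, at the price of noisier per-type mean expression.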

Thank you!

asmariyaz23 avatar Aug 28 '25 15:08 asmariyaz23

Hi @asmariyaz23, I assume you're running the pipeline on the 5K panel? In that case the tokenized version runs slowly. We will fix this, but for now it's recommended to use the scRNA-seq embedding, as it runs much faster.
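For readers unfamiliar with the distinction: a common form of "scRNA-seq embedding" is a genes-by-cell-types matrix of mean expression, so each gene is represented by a short fixed vector instead of a learned token. The snippet below is a toy, stdlib-only illustration of that idea, not Segger's actual implementation; all gene and cell-type names are invented.

```python
from collections import defaultdict

genes = ["G1", "G2"]
# (cell_type, expression vector over `genes`) per cell -- made-up numbers.
cells = [
    ("TypeA", [1.0, 0.0]),
    ("TypeA", [3.0, 0.0]),
    ("TypeB", [0.0, 2.0]),
]

def gene_embedding(cells, genes):
    """Return {gene: [mean expression per cell type]}, cell types sorted."""
    sums = defaultdict(lambda: [0.0] * len(genes))
    counts = defaultdict(int)
    for ctype, expr in cells:
        counts[ctype] += 1
        for gi, value in enumerate(expr):
            sums[ctype][gi] += value
    types = sorted(sums)
    return {
        gene: [sums[t][gi] / counts[t] for t in types]
        for gi, gene in enumerate(genes)
    }

emb = gene_embedding(cells, genes)
print(emb["G1"])  # [2.0, 0.0]: G1 averages 2.0 in TypeA, 0.0 in TypeB
```

Because the embedding dimension equals the number of cell types rather than the panel size, this representation stays cheap even for large (e.g. 5K) panels, which is consistent with the speed difference described above.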

EliHei2 avatar Sep 09 '25 06:09 EliHei2

Hi @EliHei2, I am running the pipeline on the dataset used in the Segger tutorial here. When running the 3 steps through the pipeline, does it use the scRNA-seq embedding or the tokenized version? I have attached the config I use to run the pipeline.

config.pdf

asmariyaz23 avatar Oct 10 '25 15:10 asmariyaz23

Hello @EliHei2, I tried another dataset with a slimmed-down, annotated scRNA-seq reference, and the training module is taking longer than 8 hours. Is this expected? I have pasted the two commands I am running below:

apptainer exec \
  --bind /project/def-dcook/bin/segger/segger_dev:/project/def-dcook/bin/segger/segger_dev \
  --pwd /project/def-dcook/active/cell_segmentation_segger \
  /project/def-dcook/bin/segger/segger_dev_cuda121.sif \
  python3 /project/def-dcook/bin/segger/segger_dev/src/segger/cli/create_dataset_fast.py \
    --base_dir /project/def-dcook/active/xenium_hgsc/HGSC_SP24_248

apptainer exec \
  --bind /project/def-dcook/bin/segger/segger_dev:/project/def-dcook/bin/segger/segger_dev \
  --pwd /project/def-dcook/active/cell_segmentation_segger \
  --nv /project/def-dcook/bin/segger/segger_dev_cuda121.sif \
  python3 /project/def-dcook/bin/segger/segger_dev/src/segger/cli/train_model.py \
    --dataset_dir data_segger \
    --models_dir model_dir \
    --sample_tag first_training \
    --init_emb 8 \
    --hidden_channels 32 \
    --num_tx_tokens 500 \
    --out_channels 8 \
    --heads 2 \
    --num_mid_layers 2 \
    --batch_size 4 \
    --num_workers 4 \
    --accelerator cuda \
    --max_epochs 200 \
    --save_best_model True \
    --learning_rate 1e-3 \
    --devices 1 \
    --strategy auto \
    --precision 16-mixed \
    --pretrained_model_version 0
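As a sanity check on whether 8+ hours is plausible with `--max_epochs 200`, a back-of-envelope estimate helps: total time is roughly (batches per epoch) x (seconds per batch) x (epochs). The tile count and per-batch time below are assumptions for illustration; substitute the numbers from your own training logs.

```python
def training_hours(n_tiles, batch_size, sec_per_batch, epochs):
    """Rough wall-clock estimate for training, in hours."""
    batches_per_epoch = -(-n_tiles // batch_size)  # ceiling division
    return batches_per_epoch * sec_per_batch * epochs / 3600

# Hypothetical numbers: 2000 training tiles, batch_size 4,
# 0.3 s per batch, 200 epochs (as in the command above).
print(round(training_hours(2000, 4, 0.3, 200), 1))  # 8.3 hours
```

With small batches and 200 epochs, a multi-hour run is not surprising; if per-batch time is much higher than expected, it may be worth confirming the job is actually using the GPU rather than falling back to CPU.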

asmariyaz23 avatar Oct 16 '25 13:10 asmariyaz23