Proseg cosmx is taking more than 12 hours (and counting)

Open jai-mathur opened this issue 2 months ago • 4 comments

Hi @dcjones, I was running the public lymph node 6k dataset with Proseg, and it has been running for more than 12 hours and counting. Is this normal? I am using a machine with 1 TB of RAM, and it has used about 750 GB so far. I have not specified any other flags, so it's using all the threads on my machine (128). I am also using the latest version.

Please let me know if this is expected.

jai-mathur avatar Oct 21 '25 12:10 jai-mathur

Something definitely seems off, unless it is a truly massive dataset. I've never seen it take this long or use this much memory.

My best guess is that the coordinate scale is being misinterpreted and proseg is being run at an extremely high resolution. I'd have to see what the data looks like though. It would make sense if the format is different than what proseg expects, either because it's old data or because it's very new and the format changed. Could you paste the first few lines of the transcript table?
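A minimal sketch of one way to gather that information, assuming the transcript table is a plain CSV file. The function name `summarize_transcripts` and the default column names `x_global_px` and `y_global_px` are assumptions for illustration, not part of proseg; adjust them to match the actual header. It returns the first few rows along with the coordinate extent, which hints at the units the positions are stored in.

```python
import csv


def summarize_transcripts(path, x_col="x_global_px", y_col="y_global_px", n_head=5):
    """Return the first n_head rows and the min/max of the x and y columns.

    Column names are assumptions; pass the ones used in your file.
    """
    head = []
    x_min = y_min = float("inf")
    x_max = y_max = float("-inf")
    n_rows = 0
    with open(path, newline="") as f:
        # Stream the file row by row so large tables do not need to fit in memory.
        for row in csv.DictReader(f):
            if n_rows < n_head:
                head.append(row)
            x, y = float(row[x_col]), float(row[y_col])
            x_min, x_max = min(x_min, x), max(x_max, x)
            y_min, y_max = min(y_min, y), max(y_max, y)
            n_rows += 1
    return {
        "head": head,
        "n_rows": n_rows,
        "x_range": (x_min, x_max),
        "y_range": (y_min, y_max),
    }
```

Calling `summarize_transcripts("transcripts.csv")` (a hypothetical file name) and posting the `head`, `x_range`, and `y_range` values would show both the column layout and the span of the coordinates.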

dcjones avatar Oct 21 '25 17:10 dcjones

https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/cosmx-human-lymph-node-ffpe-dataset/

It's this public dataset.

fov,cell_ID,cell,x_local_px,y_local_px,x_global_px,y_global_px,z,target,CellComp
1,0,c_1_1_0,4256,2904,18089.7931257884,86674.2849349976,0,PNKP,None
1,0,c_1_1_0,4256,3266,18089.9540583293,86311.3562266032,0,NDUFA3,None
1,0,c_1_1_0,4256,3257,18089.9024009705,86320.86912779094,1,GRK6,None
1,0,c_1_1_0,4256,3295,18089.7732575735,86282.9367319743,1,LYZ,None
1,0,c_1_1_0,4256,3274,18089.9123350779,86303.8698832194,2,COX1,None
1,0,c_1_1_0,4256,3201,18089.9341901143,86376.5319188436,2,TRBC2,None
1,0,c_1_1_0,4256,3328,18089.8328622182,86249.4945526123,1,SQSTM1,None
1,0,c_1_1_0,4256,3617,18089.7533893585,85959.6411387126,1,FAP,None
1,0,c_1_1_0,4256,3628,18089.8328622182,85948.6103057862,1,PTPRCAP,None

jai-mathur avatar Oct 21 '25 19:10 jai-mathur

Hi @dcjones, so this dataset didn't run, but I am able to run other CosMx 6k datasets successfully, which have fewer FOVs (~150). The public dataset is definitely old, and it also has 400 FOVs, so maybe it's related to that?
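To compare the datasets directly, a rough sketch like the following could count FOVs and transcripts per FOV in each transcript table. It assumes a plain CSV with an `fov` column; the function name `count_transcripts_per_fov` is hypothetical and not part of proseg.

```python
import csv
from collections import Counter


def count_transcripts_per_fov(path, fov_col="fov"):
    """Return a Counter mapping each FOV identifier to its transcript count.

    The column name is an assumption; pass the one used in your file.
    """
    counts = Counter()
    with open(path, newline="") as f:
        # One CSV row is treated as one transcript.
        for row in csv.DictReader(f):
            counts[row[fov_col]] += 1
    return counts
```

Here `len(counts)` gives the number of FOVs and `sum(counts.values())` the total number of transcripts, so the two figures can be reported side by side for the dataset that runs and the one that does not.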

jai-mathur avatar Oct 27 '25 15:10 jai-mathur

I can confirm that it does seem to use quite a lot of memory. Part of this is just that it's a large dataset and CosMx 6k has a lot of transcripts per cell in general, but it's still higher than I would have expected. I've pushed some changes today to reduce memory usage, but I don't think I've entirely solved the issue yet.

dcjones avatar Oct 28 '25 00:10 dcjones