Proseg on CosMx is taking more than 12 hours (and counting)
Hi @dcjones, I am running Proseg on the public CosMx lymph node 6k dataset, and it has been going for more than 12 hours. Is this normal? I am using a machine with 1 TB of RAM, and it has used about 750 GB so far. I have not specified any other flags, so it's using all 128 threads on my machine. I am also using the latest version.
Please let me know if this is expected.
Something definitely seems off, unless it is a truly massive dataset. I've never seen it take this long or use this much memory.
My best guess is that the coordinate scale is being misinterpreted and proseg is being run at an extremely high resolution. I'd have to see what the data looks like, though. That would make sense if the format is different than what proseg expects, either because it's old data or because it's very new and the format changed. Could you paste the first few lines of the transcript table?
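For anyone wanting to check the coordinate scale themselves, here is a minimal sketch that prints the header and the x/y extent of a transcript table. It assumes the table is a plain CSV and that the coordinate columns are named `x_global_px` and `y_global_px`; the function name `summarize_transcripts` is hypothetical and not part of proseg, so adjust the column names to whatever the file's header actually contains.

```python
import csv
import io


def summarize_transcripts(handle, x_col="x_global_px", y_col="y_global_px", n_preview=5):
    """Return the header, a few rows, and the x/y extent of a transcript CSV.

    The default column names are assumptions; pass the names from your file.
    """
    reader = csv.DictReader(handle)
    preview = []
    x_min = y_min = float("inf")
    x_max = y_max = float("-inf")
    n_rows = 0
    for row in reader:
        if n_rows < n_preview:
            preview.append(row)
        x, y = float(row[x_col]), float(row[y_col])
        x_min, x_max = min(x_min, x), max(x_max, x)
        y_min, y_max = min(y_min, y), max(y_max, y)
        n_rows += 1
    return {
        "columns": reader.fieldnames,
        "preview": preview,
        "n_rows": n_rows,
        "x_range": (x_min, x_max),
        "y_range": (y_min, y_max),
    }


# Toy in-memory table with made-up values, only to show the call.
toy = io.StringIO("fov,x_global_px,y_global_px,target\n1,10.0,20.0,A\n1,30.0,5.0,B\n")
summary = summarize_transcripts(toy)
print(summary["columns"])
print(summary["n_rows"], summary["x_range"], summary["y_range"])
```

For a real file, pass `open(path, newline="")` (or `gzip.open(path, "rt")` for a gzipped table) instead of the toy buffer. An extent in the tens of thousands suggests pixel units rather than microns, which is the kind of mismatch that would inflate the resolution.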
https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/cosmx-human-lymph-node-ffpe-dataset/
It's this public dataset.
fov,cell_ID,cell,x_local_px,y_local_px,x_global_px,y_global_px,z,target,CellComp
1,0,c_1_1_0,4256,2904,18089.7931257884,86674.2849349976,0,PNKP,None
1,0,c_1_1_0,4256,3266,18089.9540583293,86311.3562266032,0,NDUFA3,None
1,0,c_1_1_0,4256,3257,18089.9024009705,86320.86912779094,1,GRK6,None
1,0,c_1_1_0,4256,3295,18089.7732575735,86282.9367319743,1,LYZ,None
1,0,c_1_1_0,4256,3274,18089.9123350779,86303.8698832194,2,COX1,None
1,0,c_1_1_0,4256,3201,18089.9341901143,86376.5319188436,2,TRBC2,None
1,0,c_1_1_0,4256,3328,18089.8328622182,86249.4945526123,1,SQSTM1,None
1,0,c_1_1_0,4256,3617,18089.7533893585,85959.6411387126,1,FAP,None
1,0,c_1_1_0,4256,3628,18089.8328622182,85948.6103057862,1,PTPRCAP,None
Hi @dcjones, this dataset didn't finish running, but I am able to run other CosMx 6k datasets successfully. Those have fewer FOVs (~150). The public dataset is definitely old and also has 400 FOVs, so maybe it's related to that?
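To compare the datasets that run against the one that does not, it may help to count the FOVs and the transcripts per FOV in each transcript table. The sketch below assumes a plain CSV with a column named `fov`; the function name `transcripts_per_fov` is hypothetical and not part of proseg.

```python
import csv
import io
from collections import Counter


def transcripts_per_fov(handle, fov_col="fov"):
    """Count transcript rows per FOV; the column name is an assumption."""
    reader = csv.DictReader(handle)
    return Counter(row[fov_col] for row in reader)


# Toy in-memory table with made-up values, only to show the call.
toy = io.StringIO("fov,target\n1,A\n1,B\n2,C\n")
counts = transcripts_per_fov(toy)
print(len(counts), "FOVs,", sum(counts.values()), "transcripts")
print(counts.most_common(3))  # the FOVs with the most transcripts
```

Running this on both a dataset that finishes and the one that does not would show whether the difference is mainly in the number of FOVs or in the transcript density per FOV.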
I can confirm that it does seem to use quite a lot of memory. Part of this is just that it's a large dataset and CosMx 6k has a lot of transcripts per cell in general, but it's still higher than I would have expected. I've pushed some changes today to reduce memory usage, but I don't think I've entirely solved the issue yet.