Possible performance issue on CosMx data

Open yihming opened this issue 8 months ago • 4 comments

Hello,

First, thank you for developing such a great tool!

When trying your package on our CosMx data, I found that its runtime and memory usage are considerably worse than in your benchmark (Figure 11 in your paper).

Specifically, our CosMx slide contains ~6 × 10^7 transcripts (I simply counted the rows of the transcripts.csv.gz produced by the stitch_fov jl script). On data of this scale, I ran proseg with the following command (using 30 threads):

proseg --cosmx-micron --nthreads 30 transcripts.csv.gz

and it took 11 hours 38 minutes to finish, and the peak memory usage was around 20 GB.
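
A count like the one mentioned above could be reproduced with a sketch along these lines. It assumes the file has exactly one header row and one transcript per remaining row. The function name is only illustrative, and this is not necessarily how the count was originally obtained.

```python
import csv
import gzip


def count_transcripts(path):
    """Count data rows in a gzipped CSV, assuming exactly one header row."""
    # Open in text mode so csv.reader receives decoded lines.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for _ in reader)
```

Calling count_transcripts("transcripts.csv.gz") streams the file row by row, so memory use stays small even for tens of millions of rows.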

However, based on your benchmark shown in Figure 11 of your paper, proseg takes only ~17 minutes to finish for ~10^7 transcripts, and the memory usage is below 9.7 GB.
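
To put the gap in numbers, here is a back-of-the-envelope comparison. It assumes runtime and memory would scale roughly linearly with transcript count, which is only a working assumption and not something the paper states. It also ignores any differences in hardware or thread count between the two runs.

```python
# Figures taken from this report and from the benchmark quoted above.
observed_minutes = 11 * 60 + 38      # 11 h 38 min
benchmark_minutes = 17
observed_transcripts = 6e7
benchmark_transcripts = 1e7
observed_mem_gb = 20
benchmark_mem_gb = 9.7

size_ratio = observed_transcripts / benchmark_transcripts
runtime_ratio = observed_minutes / benchmark_minutes
memory_ratio = observed_mem_gb / benchmark_mem_gb

# Under the linear-scaling assumption, runtime_ratio would equal size_ratio.
excess_over_linear = runtime_ratio / size_ratio

print(f"data size ratio:    {size_ratio:.1f}x")
print(f"runtime ratio:      {runtime_ratio:.1f}x")
print(f"memory ratio:       {memory_ratio:.1f}x")
print(f"runtime vs. linear: {excess_over_linear:.1f}x slower")
```

With these numbers the data is 6x larger but the runtime is about 41x longer, that is, roughly 7x slower than linear scaling would suggest. Memory grows by only about 2x, so the discrepancy is mainly in runtime.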

Given this discrepancy, I wonder whether I missed a step or option that would improve performance.

The machine I used has 32 vCPUs and 249 GB memory, and the OS/software info is below:

  • OS: Ubuntu 22.04.4 LTS
  • rustc + cargo: v1.79.0
  • proseg: v1.0.5
  • julia: v1.10.4

Thanks!

Sincerely, Yiming

yihming, Jun 26 '24 16:06