
Read processing gets cripplingly slow over time

Open: TCLamnidis opened this issue 2 years ago

When running DamageProfiler (0.4.9, within nf-core/eager 2.4.6) on larger bam files, the rate at which reads are processed becomes slower and slower.

At first:

2023-07-28 12:12:04 INFO  StartCalculations:115 - 100 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 200 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 300 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 400 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 500 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 600 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 700 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 800 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 900 Reads processed.
2023-07-28 12:12:04 INFO  StartCalculations:115 - 1000 Reads processed.

Later:

2023-08-08 19:42:30 INFO  StartCalculations:115 - 302400600 Reads processed.
2023-08-08 21:13:01 INFO  StartCalculations:115 - 302400700 Reads processed.
2023-08-08 22:41:12 INFO  StartCalculations:115 - 302400800 Reads processed.
2023-08-09 00:11:56 INFO  StartCalculations:115 - 302400900 Reads processed.
2023-08-09 01:38:03 INFO  StartCalculations:115 - 302401000 Reads processed.
2023-08-09 03:06:05 INFO  StartCalculations:115 - 302401100 Reads processed.
2023-08-09 04:34:23 INFO  StartCalculations:115 - 302401200 Reads processed.
2023-08-09 06:01:06 INFO  StartCalculations:115 - 302401300 Reads processed.
2023-08-09 07:31:30 INFO  StartCalculations:115 - 302401400 Reads processed.
2023-08-09 09:02:19 INFO  StartCalculations:115 - 302401500 Reads processed.

While at first it takes less than a second to process 1000+ reads, the rate eventually drops to roughly 100 reads per 1.5 hours. It may get worse still from there, though I did not have the patience to find out.
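For reference, the throughput implied by the two excerpts can be computed directly from the logged timestamps (a quick sketch; the numbers below are copied from the log lines above):

```python
from datetime import datetime

# (timestamp, cumulative reads) pairs taken from the first and last
# lines of each log excerpt above.
early = [("2023-07-28 12:12:04", 100), ("2023-07-28 12:12:04", 1000)]
late = [("2023-08-08 19:42:30", 302400600), ("2023-08-09 09:02:19", 302401500)]

def rate(samples):
    """Reads processed per second between two log lines."""
    (t0, n0), (t1, n1) = samples
    fmt = "%Y-%m-%d %H:%M:%S"
    secs = (datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)).total_seconds()
    return (n1 - n0) / secs if secs else float("inf")

print(rate(late))  # ~0.019 reads/s, i.e. roughly 100 reads per 1.5 h
```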

Conceptually, I do not understand why this slowdown would happen in terms of the actual computations. Any ideas why this might be? My current guess is that there is a memory leak somewhere that forces the program into an endless cycle of garbage collection <==> processing a few more reads, without ever being able to free enough memory.

TCLamnidis, Aug 09 '23 10:08