alphapept

Long Runtime for converted mzML files

Open · straussmaximilian opened this issue on Jun 3, 2022 · 0 comments

There seems to be a bug that makes processing converted mzML files take extremely long.

Example log:

2022-06-01 14:13:44> Hill extraction with centroid_tol 8 and max_gap 2
2022-06-01 14:14:14> Number of hills 6,629,187, len = 103.08
2022-06-01 14:14:14> Repeating hill extraction with centroid_tol 6.05
2022-06-01 14:14:35> Number of hills 4,099,238, len = 29.02
2022-06-01 14:14:42> After duplicate removal of hills 3,001,610
2022-06-01 14:16:12> After split hill_ptrs 5,460,682
2022-06-01 14:16:15> After filter hill_ptrs 5,433,458
2022-06-01 14:26:47> Extracting hill stats complete
2022-06-01 14:36:19> Found 163,753 pre isotope patterns.
2022-06-03 04:29:51> Extracted 164,306 isotope patterns.
2022-06-03 04:30:06> Report complete.
2022-06-03 04:30:06> Matching features to query data.
2022-06-03 04:30:07> Saving feature table.

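Note how long the algorithm was stuck at isotope pattern extraction: almost 38 hours passed between finding the pre-isotope patterns (2022-06-01 14:36:19) and extracting the isotope patterns (2022-06-03 04:29:51). This is despite the number of final isotope patterns (164,306) being almost the same as the number of pre-isotope patterns (163,753).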
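To see which step dominates, you can take the difference between consecutive timestamps in the log above. Here is a minimal Python sketch. The function name `step_durations` is hypothetical and not part of alphapept. It assumes every log line has the form `YYYY-MM-DD HH:MM:SS> message` and attributes each interval to the message that closes it.

```python
from datetime import datetime, timedelta

LOG = """\
2022-06-01 14:13:44> Hill extraction with centroid_tol 8 and max_gap 2
2022-06-01 14:14:14> Number of hills 6,629,187, len = 103.08
2022-06-01 14:14:14> Repeating hill extraction with centroid_tol 6.05
2022-06-01 14:14:35> Number of hills 4,099,238, len = 29.02
2022-06-01 14:14:42> After duplicate removal of hills 3,001,610
2022-06-01 14:16:12> After split hill_ptrs 5,460,682
2022-06-01 14:16:15> After filter hill_ptrs 5,433,458
2022-06-01 14:26:47> Extracting hill stats complete
2022-06-01 14:36:19> Found 163,753 pre isotope patterns.
2022-06-03 04:29:51> Extracted 164,306 isotope patterns.
2022-06-03 04:30:06> Report complete.
2022-06-03 04:30:06> Matching features to query data.
2022-06-03 04:30:07> Saving feature table.
"""


def step_durations(log_text: str) -> list[tuple[timedelta, str]]:
    """Return (elapsed, message) pairs; elapsed is measured from the previous log line."""
    entries = []
    for line in log_text.strip().splitlines():
        # Split "2022-06-01 14:13:44> message" into timestamp and message.
        stamp, _, message = line.partition("> ")
        entries.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message))
    # Pair each entry with its predecessor and take the time difference.
    return [
        (curr_time - prev_time, message)
        for (prev_time, _), (curr_time, message) in zip(entries, entries[1:])
    ]


durations = step_durations(LOG)
longest, message = max(durations)  # tuples compare by timedelta first
print(f"{longest} before: {message}")
# prints "1 day, 13:53:32 before: Extracted 164,306 isotope patterns."
```

The next two longest intervals (hill stats and pre-isotope pattern search) each take about 10 minutes. The isotope pattern step is therefore the clear outlier.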

Observations:

  • Some m/z arrays were not sorted.
  • Isotope pattern extraction can take a very long time when clusters are large.
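For the first observation, one option is to detect and repair unsorted m/z arrays before feature finding. Steps that match centroids by binary search, such as `np.searchsorted`, assume ascending order. Below is a minimal NumPy sketch. It assumes each spectrum is available as a pair of `(mz, intensity)` arrays. The helper names `is_sorted`, `sort_spectrum`, and `sort_unsorted_spectra` are hypothetical and not alphapept functions.

```python
import numpy as np


def is_sorted(mz: np.ndarray) -> bool:
    """True if the m/z values are in non-decreasing order (empty arrays count as sorted)."""
    return bool(np.all(mz[:-1] <= mz[1:]))


def sort_spectrum(mz: np.ndarray, intensity: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Sort one spectrum by m/z and permute its intensities with the same order."""
    order = np.argsort(mz, kind="stable")
    return mz[order], intensity[order]


def sort_unsorted_spectra(spectra):
    """Return spectra with every m/z array sorted, plus the indices that needed fixing."""
    fixed, unsorted_idx = [], []
    for i, (mz, intensity) in enumerate(spectra):
        if is_sorted(mz):
            fixed.append((mz, intensity))
        else:
            unsorted_idx.append(i)
            fixed.append(sort_spectrum(mz, intensity))
    return fixed, unsorted_idx


# Toy input: the second spectrum is out of order.
spectra = [
    (np.array([100.0, 200.0, 300.0]), np.array([1.0, 2.0, 3.0])),
    (np.array([300.0, 100.0, 200.0]), np.array([3.0, 1.0, 2.0])),
]
fixed, unsorted_idx = sort_unsorted_spectra(spectra)
print(unsorted_idx)  # prints [1]
```

Logging `unsorted_idx` for a converted mzML file would also show how widespread the problem is. That helps tell whether the unsorted arrays come from the conversion step.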

straussmaximilian, Jun 03 '22 16:06