
Duplicate modifications cause crashes

Open · paretje opened this issue 5 years ago · 0 comments

In Python, we use the modification name as the unique identifier of a modification. When multiple modifications share the same name, we simply overwrite the previous one. However, this causes a couple of issues:

  1. Modifications that share a name but target a different amino acid are silently dropped in Python: only the last definition survives.
  2. Each modification with a duplicate name still causes a new mod_id to be picked. We write these modifications to disk and then parse the files in C. The first line of the file is the number of modifications; the C code assumes the mod_ids form a contiguous range and allocates memory according to that number. Because duplicates leave gaps in the mod_ids, some mod_ids can fall outside the allocated range, and since there are no checks on the mod_id, we end up writing to arbitrary memory. This is undefined behavior and may cause all kinds of issues, including crashes.
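The failure mode above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `register_mods`, `serialize`, `out_of_range_ids`, the `(name, mass, amino_acid)` tuple layout, mod_ids starting at 0, and the comma-separated file layout are all hypothetical illustrations, not ms2pip's actual code or file format.

```python
"""Hypothetical reproduction of the duplicate-name bug (not ms2pip's code)."""
from typing import Dict, List, Tuple

# Hypothetical layout: (name, mass shift, amino acid)
Modification = Tuple[str, float, str]


def register_mods(mods: List[Modification]) -> Dict[str, Tuple[int, float, str]]:
    """Key modifications by name, handing out a fresh mod_id for every entry.

    A duplicate name overwrites the earlier entry, but the counter still
    advances, so the surviving mod_ids are no longer contiguous.
    """
    registry: Dict[str, Tuple[int, float, str]] = {}
    next_id = 0
    for name, mass, aa in mods:
        registry[name] = (next_id, mass, aa)  # silently overwrites duplicates
        next_id += 1
    return registry


def serialize(registry: Dict[str, Tuple[int, float, str]]) -> List[str]:
    """Hypothetical file layout: a count line, then one line per modification."""
    lines = [str(len(registry))]
    for name, (mod_id, mass, aa) in registry.items():
        lines.append(f"{mod_id},{name},{mass},{aa}")
    return lines


def out_of_range_ids(lines: List[str]) -> List[int]:
    """Return mod_ids a reader allocating `count` slots (0..count-1) cannot hold."""
    count = int(lines[0])
    ids = [int(line.split(",")[0]) for line in lines[1:]]
    return [i for i in ids if not 0 <= i < count]
```

With `[("Oxidation", 15.994915, "M"), ("Phospho", 79.966331, "S"), ("Phospho", 79.966331, "T")]`, the serine definition of Phospho is lost (issue 1), the count line says 2, and Phospho carries mod_id 2, one past the last allocated slot (issue 2).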

So, we need to:

  1. validate mod_id in C, rejecting any value outside the range allocated from the modification count
  2. prevent a new mod_id from being generated for duplicate modifications. Options: warn and ignore the duplicate, warn and update the existing entry, or raise an exception and stop further processing.
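One way the Python side of the second fix could look is sketched below, with the three options exposed as a policy parameter. This is a hedged sketch, not the project's implementation: `DuplicatePolicy`, `DuplicateModificationError`, `register_mods`, and the tuple layout are hypothetical names chosen for illustration. The key idea is that a new mod_id is only ever `len(registry)`, so the ids always form a contiguous range starting at 0.

```python
"""Hypothetical duplicate-handling policies (names are illustrative only)."""
import warnings
from enum import Enum
from typing import Dict, List, Tuple

Modification = Tuple[str, float, str]  # hypothetical: (name, mass shift, amino acid)


class DuplicatePolicy(Enum):
    IGNORE = "ignore"  # warn, keep the first definition
    UPDATE = "update"  # warn, replace the definition but keep its mod_id
    RAISE = "raise"    # stop further processing


class DuplicateModificationError(ValueError):
    """Raised when a modification name is defined more than once."""


def register_mods(
    mods: List[Modification], policy: DuplicatePolicy = DuplicatePolicy.RAISE
) -> Dict[str, Tuple[int, float, str]]:
    """Assign mod_ids so they always form the range 0..len(result)-1."""
    registry: Dict[str, Tuple[int, float, str]] = {}
    for name, mass, aa in mods:
        if name in registry:
            if policy is DuplicatePolicy.RAISE:
                raise DuplicateModificationError(f"duplicate modification name: {name!r}")
            warnings.warn(f"duplicate modification name: {name!r}")
            if policy is DuplicatePolicy.UPDATE:
                mod_id = registry[name][0]  # reuse the existing mod_id
                registry[name] = (mod_id, mass, aa)
            continue  # never allocate a new mod_id for a duplicate
        registry[name] = (len(registry), mass, aa)
    return registry
```

Defaulting to `RAISE` is one possible choice: it surfaces bad input early, whereas `IGNORE` and `UPDATE` keep processing but still leave a trace through the warning.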

Optionally, we can use this opportunity to support having the same modification name for multiple amino acids.
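Supporting one name for several amino acids mostly comes down to changing the registry key. The sketch below assumes a hypothetical `(name, amino_acid)` key and the same hypothetical tuple layout as before; it is an illustration of the idea, not a proposed API. Repeats of the exact same pair reuse their first mod_id, and a repeat with a conflicting mass is rejected, so mod_ids stay contiguous either way.

```python
"""Hypothetical registry keyed by (name, amino acid); illustrative only."""
from typing import Dict, List, Tuple

Modification = Tuple[str, float, str]  # hypothetical: (name, mass shift, amino acid)
ModKey = Tuple[str, str]               # (name, amino acid)


def register_mods(mods: List[Modification]) -> Dict[ModKey, Tuple[int, float]]:
    """Key by (name, amino acid) so one name can cover several residues."""
    registry: Dict[ModKey, Tuple[int, float]] = {}
    for name, mass, aa in mods:
        key = (name, aa)
        if key in registry:
            if registry[key][1] != mass:
                # Same name and residue but a different mass: refuse to guess.
                raise ValueError(f"conflicting definitions for {name!r} on {aa!r}")
            continue  # exact repeat: do not allocate a new mod_id
        registry[key] = (len(registry), mass)
    return registry
```

With this key, `("Phospho", "S")` and `("Phospho", "T")` receive distinct, consecutive mod_ids instead of one overwriting the other.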

paretje, Apr 24 '20 13:04