ms2pip
Duplicate modifications cause crashes
In Python, we use the modification name as the unique identifier of a modification. When multiple modifications share the same name, we just overwrite the previous modification. However, this causes a couple of issues:
- Modifications with the same name but a different amino acid are silently ignored in Python.
- Any modification with a duplicate name causes a new `mod_id` to be picked. We write these modifications to disk and then parse the files in C. The first line is the number of modifications. C assumes the `mod_id` values form a continuous range and allocates memory according to that number. There are no checks on the `mod_id`, so we end up writing to random memory. This may cause all kinds of issues.
So, we need to:
- validate `mod_id` in C
- prevent the generation of a new `mod_id` for duplicate modifications. Either just warn and ignore, warn and update the existing entry, or raise an exception and stop further processing.
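A minimal sketch of the three duplicate-handling options on the Python side, to make the trade-off concrete. It does not reflect the actual ms2pip code: the function name `register_modifications`, the `on_duplicate` parameter, and the `(name, amino_acid, mass_shift)` tuple layout are all assumptions made for illustration. The key point is that `mod_id` is derived from the number of stored entries, so a duplicate can never produce a gap in the range.

```python
import warnings


def register_modifications(mods, on_duplicate="raise"):
    """Build a name -> entry mapping with contiguous mod_id values.

    Hypothetical helper: `mods` is an iterable of
    (name, amino_acid, mass_shift) tuples, and `on_duplicate` is one of
    "ignore", "update" or "raise".
    """
    if on_duplicate not in {"ignore", "update", "raise"}:
        raise ValueError(f"Unknown on_duplicate policy: {on_duplicate!r}")

    registry = {}
    for name, amino_acid, mass_shift in mods:
        if name in registry:
            if on_duplicate == "raise":
                raise ValueError(f"Duplicate modification name: {name!r}")
            warnings.warn(f"Duplicate modification name: {name!r}")
            if on_duplicate == "update":
                # Overwrite the values but keep the existing mod_id.
                registry[name]["amino_acid"] = amino_acid
                registry[name]["mass_shift"] = mass_shift
            continue  # never allocate a new mod_id for a duplicate
        registry[name] = {
            "mod_id": len(registry),  # always 0..n-1, no gaps
            "amino_acid": amino_acid,
            "mass_shift": mass_shift,
        }
    return registry
```

With any of the three policies, the number of entries written to disk equals the largest `mod_id` plus one, which is the invariant the C parser relies on.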
Optionally, we can use this opportunity to support having the same modification name for multiple amino acids.
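One possible way to support this, sketched under the assumption that modifications are available as `(name, amino_acid, mass_shift)` tuples: use the pair of name and amino acid as the unique key instead of the name alone. The function name `build_modification_table` is hypothetical and is not part of ms2pip.

```python
def build_modification_table(mods):
    """Map (name, amino_acid) -> (mod_id, mass_shift).

    Hypothetical helper: the same name may appear for several amino
    acids, but an exact (name, amino_acid) repeat is still an error.
    """
    table = {}
    for name, amino_acid, mass_shift in mods:
        key = (name, amino_acid)
        if key in table:
            raise ValueError(
                f"Duplicate modification {name!r} on amino acid {amino_acid!r}"
            )
        # mod_id follows the number of stored entries, so ids stay contiguous.
        table[key] = (len(table), mass_shift)
    return table
```

Here an oxidation defined for both `M` and `W` would get two distinct, consecutive `mod_id` values instead of one entry silently replacing the other.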