Improve assign peptide type 243
closes #243
Let me know if I should review this.
@jpquast I'm getting an error from vroom on macOS and Windows. Have you seen this error before? (It's not related to the changes I made.)
Hi Elena,
I had a quick look at the function. I think it is necessary to introduce the protein_id column as you do, in order to know which peptides belong to the same protein. However, as far as I can see, you don't have a way to know about the initiator methionine. I think maybe I was not very clear about that. The point was to assign a peptide as fully-tryptic if it starts at position 2 of the protein and there is no other peptide of that protein that starts at position 1. In those cases the initiator methionine is likely completely absent for most of the copies of this protein in the cell. As far as I can tell, you check whether any of the peptides of the protein don't have a preceding methionine (which could also be in the middle of the protein).
What I would suggest is to also require the start and end columns for each peptide and then just check, per protein, whether any peptide starts at position 1. If yes, keep the original annotation as it is. If not, then any peptide starting at position 2 is considered fully-tryptic if it fulfils its C-terminal criterion.
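To make the suggestion concrete, here is a rough base R sketch of the logic I have in mind. It is not the actual protti implementation: the function name `assign_peptide_type_sketch`, the `start` and `end` column names, and the example values are just assumptions for illustration, and it leaves out special handling of peptides at the protein C-terminus.

```r
# Hypothetical sketch, not the protti implementation.
# Assumes columns: aa_before, last_aa, protein_id, start, end.
assign_peptide_type_sketch <- function(data) {
  # N-terminal side is tryptic if the preceding residue is K/R
  # or the peptide starts at position 1 of the protein.
  n_term <- data$aa_before %in% c("K", "R") | data$start == 1

  # C-terminal side is tryptic if the last residue is K/R.
  # (Peptides at the protein C-terminus are ignored in this sketch.)
  c_term <- data$last_aa %in% c("K", "R")

  # Per protein: is there any peptide that starts at position 1?
  starts_at_1 <- tapply(data$start == 1, data$protein_id, any)
  has_pos1 <- as.vector(starts_at_1[as.character(data$protein_id)])

  # If no peptide of the protein starts at position 1, a peptide starting
  # at position 2 is treated as N-terminally tryptic (initiator methionine
  # assumed to be removed).
  n_term <- n_term | (!has_pos1 & data$start == 2)

  data$pep_type <- ifelse(
    n_term & c_term, "fully-tryptic",
    ifelse(n_term | c_term, "semi-tryptic", "non-tryptic")
  )
  data
}

# Made-up example data; positions are illustrative only.
example <- data.frame(
  aa_before  = c("K", "S", "T", "M", "", "M"),
  last_aa    = c("R", "K", "Y", "K", "K", "K"),
  protein_id = c("P1", "P1", "P2", "P2", "P3", "P3"),
  start      = c(10, 25, 5, 2, 1, 2),
  end        = c(20, 30, 12, 9, 8, 8),
  stringsAsFactors = FALSE
)

result <- assign_peptide_type_sketch(example)
print(result)
```

In this sketch the P2 peptide starting at position 2 becomes fully-tryptic because no P2 peptide starts at position 1, while the equivalent P3 peptide stays semi-tryptic because P3 also has a peptide starting at position 1.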
As far as I can tell, the output of the function is currently wrong even for standard cases:
assign_peptide_type(data, aa_before, last_aa, aa_after, protein_id)
aa_before last_aa aa_after protein_id pep_type
1 K R T P1 fully-tryptic
2 S K R P1 fully-tryptic
3 T Y T P2 non-tryptic
4 M K R P2 semi-tryptic
Row 2 should be semi-tryptic since the aa_before is not K/R. Not sure why the function now gets those standard cases wrong. Row 4 works of course as you expected based on your code, but is conceptually wrong as explained above.
Not sure why vroom fails, but if it still does we can have a look at that.
@elena-krismer maybe you can have a look at this. The r-devel version on Ubuntu seems to fail. The problem is that the adjusted p-value has different values than in the current version. This seems like a more complicated problem. It would be great if you could have a look at it.
I also created this PR in ggrepel: https://github.com/slowkow/ggrepel/pull/263. It is related to all the warnings we get for functions that use ggrepel in plots. There is currently no way of silencing them. I hope they accept my PR or find another way.
Note: the issue was reported to the R developers.