mzID Report Output

Open xLinkKnight opened this issue 3 years ago • 4 comments

A suggestion to add the ability to generate mzIdentML reports through the FragPipe interface.

Issue with Philosopher: I was unable to generate mzID files using Philosopher in pipeline mode. Error message: Cannot read file:open .meta\ev.param.bin: The system cannot find the file specified. The .meta folder is created, but no ev. files are ever generated.

I've attached the philosopher.yml parameter file. I am attempting to process timsTOF data, using the _calibrated MGF files as input. I am not sure whether the pipeline supports Bruker raw data.

philosopher.yml.txt

Odd behavior involving "/" and "\": (On Windows)

```
Unknown file type. No file loaded.C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mgf
WARNING: cannot open data file C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mgf in msms_run_summary tag... unrecognized extension .mgf, trying .mzML ...
WARNING: CANNOT correct data file C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mzML in msms_run_summary tag...
Unknown file type. No file loaded.C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mgf
WARNING: cannot open data file C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mgf in msms_run_summary tag... unrecognized extension .mgf, trying .mzML ...
WARNING: CANNOT correct data file C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mzML in msms_run_summary tag...
```
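The paths in the log mix `\` and `/`. Whether that mix causes the warnings is not established in this thread, but it is easy to rule out by normalizing the path before passing it to a tool. A minimal sketch, assuming Python is available; the helper name `normalize_windows_path` is hypothetical and is not part of FragPipe or Philosopher:

```python
from pathlib import PureWindowsPath


def normalize_windows_path(raw: str) -> str:
    """Render a Windows path with backslashes as the only separator.

    PureWindowsPath treats both '/' and '\\' as separators and does not
    touch the file system, so this works on any operating system.
    """
    return str(PureWindowsPath(raw))


# Path taken from the log output above.
mixed = r"C:\MSFragger\Espinosa/negative_COV0006941_41_S1-F5_1_2427_calibrated.mgf"
print(normalize_windows_path(mixed))
```

The same object exposes `.suffix`, which helps confirm that the input really is an `.mgf` file and not an `.mzML` file before it reaches a downstream tool.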

xLinkKnight avatar Nov 27 '20 23:11 xLinkKnight

I was able to create the .mzid file by following the Simple-Data-Analysis tutorial: https://github.com/Nesvilab/philosopher/wiki/Simple-Data-Analysis

This route works well for one raw file. Is there a way to do this analysis for a larger data set? I'm still verifying that the mzIdentML file will submit to PRIDE without error.

Edit: The PRIDE submission tool reports "File could not be converted to mzTab format." I am not sure what the source of the error is.

xLinkKnight avatar Nov 28 '20 22:11 xLinkKnight

So far, the program generates one file per data set or project.

prvst avatar Nov 30 '20 15:11 prvst

Felipe, does the mzID represent the filtered PSM.tsv tables and the razor protein assignment for each shared PSM? I think (?) the mzID files need to represent the final analysis results (and not the MSFragger pepXML or the PeptideProphet pep.xml). What about precursor m/z? Do we need to write the original m/z or the calibrated value that was used for identification? And is that before or after monoisotopic correction? If we offer an option to write an mzID that someone may submit to a repository or use in another tool, we need to make sure the right values are written to it. I myself do not remember much about mzID and what should be written in it.

anesvi avatar Nov 30 '20 15:11 anesvi

The mzID is like a report for each data set or experiment folder. If you have 10 data sets, you will have 10 files. If the experiment is processed together, it is theoretically possible to merge them all into one file, but I don't think that will be feasible because of the size of the files. Each mzID contains all the data and information about PSMs, peptides, proteins, PTMs, the protein database, and more, so the files tend to be considerably larger than the individual reports. If you merge 10 data sets into one single XML file, I don't think you will even be able to open it in a text editor or similar. The schema is pretty flexible and allows us to add custom data to it, so we can add everything to the files.

prvst avatar Nov 30 '20 15:11 prvst
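
For mzIdentML files too large to open in a text editor, as described in the last comment, a streaming parser can still inspect the content. The following is a minimal sketch using only the Python standard library; the function `count_elements` and its default element list are illustrative choices, not part of Philosopher or FragPipe. It matches elements by local name, so it does not depend on which mzIdentML schema version the namespace declares.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def count_elements(source, names=("SpectrumIdentificationItem",
                                  "Peptide", "DBSequence")):
    """Count selected mzIdentML elements without loading the whole file.

    `source` is a file path or a binary file object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name in names:
            counts[name] += 1
        # Drop text, attributes, and children of finished elements
        # so memory use stays small on very large files.
        elem.clear()
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_mzid.py path/to/report.mzid
    for name, n in count_elements(sys.argv[1]).items():
        print(f"{name}: {n}")
```

Comparing the `SpectrumIdentificationItem` count with the number of rows in the filtered PSM table is one quick way to check whether a file holds filtered or unfiltered results.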