File Extension Mismatch in ThermoRawFileParser Conversion Using SearchGUI

Open jacobfh1 opened this issue 1 year ago • 1 comment

Hi team!

When utilizing ThermoRawFileParser within SearchGUI to convert .raw files to either .mzML or .mgf formats, there appears to be a discrepancy in the output file extensions.

  1. Conversion from .raw to .mzML results in an output file with the extension .mzML (uppercase 'M' and 'L'). This causes detection issues, as the expected extension is .mzml (all lowercase) according to the PeptideShaker.log file.
  2. Conversion from .raw to .mgf produces files with an unexpected extension .mzml.mgf, thereby preventing the detection of the intended [filename].mgf output.

Software Versions Tested: SearchGUI v.4.2.7 and v.4.3.1; PeptideShaker v.2.22.2 and v.3.0.1; Platform: Ubuntu

This behavior suggests a file naming bug that might persist across multiple versions, thereby impacting the post-processing workflow in tools such as PeptideShaker. However, the converted files can be manually utilized with PeptideShaker.
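As a stopgap for the manual route, the converted files could be renamed to the extensions the downstream tools look for. The following is only a minimal sketch, not part of SearchGUI, PeptideShaker, or ThermoRawFileParser: the function names `expected_name` and `normalize_outputs` are hypothetical, and it assumes a case-sensitive file system (such as the Ubuntu setup above) and that the only two problem patterns are the ones listed in this report.

```python
from pathlib import Path


def expected_name(path: Path) -> Path:
    """Map a converted file to the name downstream tools are assumed to expect."""
    name = path.name
    lower = name.lower()
    if lower.endswith(".mzml.mgf"):
        # e.g. sample.mzml.mgf -> sample.mgf
        return path.with_name(name[: -len(".mzml.mgf")] + ".mgf")
    if lower.endswith(".mzml"):
        # e.g. sample.mzML -> sample.mzml
        return path.with_name(name[: -len(".mzml")] + ".mzml")
    return path


def normalize_outputs(folder: Path) -> list[tuple[Path, Path]]:
    """Rename mismatched files in `folder`; return the (old, new) pairs."""
    renamed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        target = expected_name(path)
        if target == path:
            continue  # already has the expected name
        if target.exists():
            continue  # skip rather than overwrite an existing file
        path.rename(target)
        renamed.append((path, target))
    return renamed
```

Calling `normalize_outputs(Path("converted"))` on a hypothetical output folder would rename the files in place and report what changed. Existing targets are skipped rather than overwritten, so the script can be rerun safely.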

Suggested Remediation:

  1. Ensure output file extensions match the expected formats to facilitate downstream processing.
  2. Review the file naming conventions within ThermoRawFileParser and correct any inconsistencies.

Best, Jacob

jacobfh1, Oct 12 '23 14:10

Hi Jacob,

Thanks for the suggestion. However, all we are doing is using ThermoRawFileParser command lines of the following types:

ThermoRawFileParser.exe -i=C:\[...]\qExactive01819.raw -b=C:\[...]\qExactive01819.mzml -f=2 -e -x

or

ThermoRawFileParser.exe -i=C:\[...]\qExactive01819.raw -b=C:\[...]\qExactive01819.mgf -f=2 -e -x

The file names we provide seem to be ignored by ThermoRawFileParser. Here is an excerpt from their GitHub pages: "Output file extension is determined by the used output format and (optional) gzip compression, for example, if format is MGF without gzip compression, the output file will receive .mgf extension, if format is mzML with gzip compression the output file will have .mzML.gz extension. All user input will be standardized to fulfill abovementioned requirements."

Note the uppercase mzML extension. In other words, the changes you suggest would require changes to the ThermoRawFileParser code, as in SearchGUI we already (try to) use the conventions you are suggesting. I would recommend that you make the same suggestion in the ThermoRawFileParser issue tracker.

With regard to the mgf file not being found in SearchGUI, you can simply stick to the default mzml output from ThermoRawFileParser, as we will convert to mgf internally for the search engines that do not support mzml.

Best regards, Harald

hbarsnes, Oct 12 '23 18:10