Support for timsTOF data
Description of feature
Last week I tested the identification workflow on a timsTOF dataset. The first step is to run the tdf2mzml module; after that, the same analyses are run as for other data (Comet and MSGF+). I then compared the results with MaxQuant, with the PSM FDR set to 0.01 and the MaxQuant results taken from evidence.txt. The overlap of identified peptides is above 90% of the MaxQuant identifications, but the PSM overlap is 0. After comparing precursor m/z, the reason is that the scan number (index) differs between quantms and MQ.
compare_results.csv
compare_results_pep.zip
Some questions:
- For example, this peptide is identified in three quantms scans whose scan numbers differ from the one in MQ. How can we compare the results and check the differences when the scan numbers (or indexes) are different? I manually checked the identifications and they all appear to match well.
| sequence | exp_mass_to_charge | quantms scan_number | MaxQuant MS/MS scan number |
|---|---|---|---|
| AAAAAAMAEQESAR | 695.3256725319 | 356222 | 65719 |
| AAAAAAMAEQESAR | 695.3256725319 | 356647 | 65719 |
| AAAAAAMAEQESAR | 695.3256725319 | 356088 | 65719 |
- I was surprised that quantms identified so many peptides! I also manually checked the identifications found only by quantms, and they all look like reasonable matches. Further assessment is needed here.
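One way to compare PSMs when the scan numbers do not agree is to pair them by peptide sequence and precursor m/z instead. The following is only a minimal sketch under stated assumptions: the dictionary keys (`sequence`, `mz`, `scan`), the function names and the 10 ppm tolerance are hypothetical, and the MaxQuant m/z is assumed to equal the value in the table above, which lists a single m/z column.

```python
from collections import defaultdict


def ppm_diff(mz_a, mz_b):
    """Return the absolute difference between two m/z values in ppm."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def match_psms(quantms_psms, mq_psms, ppm_tol=10.0):
    """Pair PSMs that share a sequence and agree in precursor m/z within ppm_tol.

    Returns a list of (quantms_scan, maxquant_scan) tuples.
    """
    mq_by_seq = defaultdict(list)
    for psm in mq_psms:
        mq_by_seq[psm["sequence"]].append(psm)

    pairs = []
    for q in quantms_psms:
        for m in mq_by_seq.get(q["sequence"], []):
            if ppm_diff(q["mz"], m["mz"]) <= ppm_tol:
                pairs.append((q["scan"], m["scan"]))
    return pairs


# Rows taken from the table above; the MaxQuant m/z is an assumption.
quantms_psms = [
    {"sequence": "AAAAAAMAEQESAR", "mz": 695.3256725319, "scan": 356222},
    {"sequence": "AAAAAAMAEQESAR", "mz": 695.3256725319, "scan": 356647},
    {"sequence": "AAAAAAMAEQESAR", "mz": 695.3256725319, "scan": 356088},
]
mq_psms = [
    {"sequence": "AAAAAAMAEQESAR", "mz": 695.3256725319, "scan": 65719},
]

pairs = match_psms(quantms_psms, mq_psms)
print(pairs)
```

In practice, retention time (and ion mobility, if both tools report it) would probably be needed as additional matching criteria, since sequence and m/z alone cannot separate repeated fragmentations of the same precursor.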
ping @wfondrie @jspaezp
Regarding scan number mismatch: do we not have a scan ID that we could use? Did you run MQ on the tdf or on the converted mzML?
I ran MQ on the tdf, so there are differences. But I don't know how to map scan numbers between MQ and quantms.
This is the converted mzML in quantms.
Well I cannot say anything about how MQ deals with the numbers ... BUT ... I think it is very normal for a "scan" to mean very different things in different software dealing with PASEF data. The main reason is that a real scan in the .d has very little and noisy information. Most of the real information comes from the series of scans that encompass a single elution of the tims funnel (called a frame).
So when converting frames -> scans, the naive approach of just splitting each scan is kind of useless (because it would lead to blocks of ~700 ms1 scans that don't share information with each other but share retention time, followed by a bunch of blocks of ms2 scans that look horrendous).
The more standard approach is to use sections of the frame and squash them into a single new scan. So when tdf2mzml says "scan=1", it could, DEPENDING ON THE OPTIONS USED FOR EXPORTING, actually mean "frame 1, scans 200-250" (similar to how some qtofs have micro-scans that get aggregated) OR actually "scan 234234" in the run.
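To make the "squash a section of the frame into a single scan" idea concrete, here is a toy sketch. It is not what tdf2mzml actually does; the function name, the tuple layout and the m/z rounding used to merge peaks are all assumptions chosen for illustration.

```python
from collections import defaultdict


def squash_frame(peaks, scan_start, scan_end, mz_decimals=2):
    """Sum peaks from mobility scans scan_start..scan_end (inclusive) into one spectrum.

    peaks: iterable of (mobility_scan, mz, intensity) tuples from a single frame.
    Peaks whose m/z round to the same value (mz_decimals places) are merged.
    Returns a list of (mz, summed_intensity) sorted by m/z.
    """
    summed = defaultdict(float)
    for scan, mz, intensity in peaks:
        if scan_start <= scan <= scan_end:
            summed[round(mz, mz_decimals)] += intensity
    return sorted(summed.items())


# Made-up peaks from one frame: (mobility scan, m/z, intensity).
frame_peaks = [
    (200, 500.001, 10.0),
    (210, 500.002, 5.0),
    (249, 650.5, 3.0),
    (400, 500.001, 7.0),  # outside the 200-250 window, so it is ignored
]

combined = squash_frame(frame_peaks, 200, 250)
print(combined)  # [(500.0, 15.0), (650.5, 3.0)]
```

The output "scan" here corresponds to "frame N, scans 200-250", which is why its number need not match any scan number in the original .d file.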
so .... I have no idea :P I would need to explore a bit more what the numbers are. Some things I would like to know:
- Are the index numbers contiguous? (in mq/qms, do all scans from 1-N exist? or are there steps like 1,53,134,...N)
- What settings were used in tdf2mzml?
- What were your acquisition parameters?
- How many scans do you have between every ms1 scan in the derived mzml?
- How does the IMS section look in the .mzml ?
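Two of the questions above (are the indices contiguous, and how many ms2 scans sit between consecutive ms1 scans) can be answered directly from the mzML. The sketch below uses only the Python standard library; the function name is hypothetical, it assumes the standard mzML namespace and the PSI-MS accession MS:1000511 ("ms level"), and the embedded document is a toy fragment, not a schema-valid file.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def summarize_spectra(source):
    """Return (indices_contiguous, ms2_counts) for an mzML path or file object.

    ms2_counts holds, for each MS1 spectrum, the number of MS2 spectra
    that follow it before the next MS1 spectrum.
    """
    indices = []
    ms2_counts = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        indices.append(int(elem.get("index")))
        ms_level = None
        for cv in elem.iter(MZML_NS + "cvParam"):
            if cv.get("accession") == "MS:1000511":  # PSI-MS "ms level"
                ms_level = int(cv.get("value"))
                break
        if ms_level == 1:
            ms2_counts.append(0)
        elif ms_level == 2 and ms2_counts:
            ms2_counts[-1] += 1
        elem.clear()  # drop the parsed spectrum to limit memory use

    contiguous = True
    if indices:
        first = indices[0]
        contiguous = indices == list(range(first, first + len(indices)))
    return contiguous, ms2_counts


# Toy fragment for illustration only.
demo_mzml = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml"><run id="r"><spectrumList count="4">
<spectrum index="0" id="scan=1"><cvParam accession="MS:1000511" name="ms level" value="1"/></spectrum>
<spectrum index="1" id="scan=2"><cvParam accession="MS:1000511" name="ms level" value="2"/></spectrum>
<spectrum index="2" id="scan=3"><cvParam accession="MS:1000511" name="ms level" value="2"/></spectrum>
<spectrum index="3" id="scan=4"><cvParam accession="MS:1000511" name="ms level" value="1"/></spectrum>
</spectrumList></run></mzML>"""

result = summarize_spectra(io.BytesIO(demo_mzml))
print(result)  # (True, [2, 0])
```

Passing the path of the converted mzML instead of the `io.BytesIO` object should work the same way, since `ET.iterparse` accepts either.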
Can you use the mzML from tdf2mzml and run it with MQ? Then compare and you should see if it's a tdf-handling problem or a pipeline problem.
I don't think there is a real problem, there is just a lack of consensus on what the "scan number" should mean, because a scan in the mzML is not the same as a scan in the .d.
Having said that ... I do think it's a good idea to run the same MQ run with .d and .mzML to get some idea of how the mappings compare.
I tried running MQ on the mzML converted by tdf2mzml, but it reported an error. I also set the instrument to Bruker TIMS. The field MS:1000505 does not appear to be recognised.
start 22/11/2024 19:58:44
title Assemble_run_info (1/1)
description E:\MSNet\Bat\Bat_MBat_20fraction_DDA_Data\P0031_TOF1_DDA_20241010_Bat10_30_200ng_149min_RA1_1_8030.mzML
error E:\MSNet\Bat\Bat_MBat_20fraction_DDA_Data\P0031_TOF1_DDA_20241010_Bat10_30_200ng_149min_RA1_1_8030.mzML_The given key 'MS:1000505' was not present in the dictionary._ at System.Collections.Generic.Dictionary`2.get_Item(TKey key)__ at PluginRawMzMl.MzMLRawFile.GetInfoForScanNumber(Int32 scanNumber) in C:\Users\bi\source\repos\net7\net\PluginRawMzMl\MzMLRawFile.cs:line 392__ at MqUtil.Ms.Raw.RawFile.InitFromRawFileImpl()__ at MqUtil.Ms.Raw.RawFile.InitFromRawFile()__ at MqUtil.Ms.Raw.RawFile.Init(String path1)__ at MqUtil.Ms.Raw.RawFileUtil.CreateRawFile(String path)__ at MaxQuantLibS.Domains.Peptides.Features.RunInfo.AssembleRunInfo(String mqparFile, Int32 fileIndex) in C:\Users\bi\source\repos\net7\net\MaxQuantLibS\Domains\Peptides\Features\RunInfo.cs:line 42__ at MaxQuantLibS.Domains.Peptides.Work.AssembleRunInfo.Calculation(String[] args, Responder responder) in C:\Users\bi\source\repos\net7\net\MaxQuantLibS\Domains\Peptides\Work\AssembleRunInfo.cs:line 17__ at MaxQuantLibS.Domains.Peptides.Work.MaxQuantWorkDispatcherUtil.PerformTask(Int32 taskType, String[] args, Responder responder) in C:\Users\bi\source\repos\net7\net\MaxQuantLibS\Domains\Peptides\Work\MaxQuantWorkDispatcherUtil.cs:line 7__ at MaxQuantLibS.Base.MaxQuantUtils.Run(Int32 softwareId, Int32 taskType, String[] args, Responder responder) in C:\Users\bi\source\repos\net7\net\MaxQuantLibS\Base\MaxQuantUtils.cs:line 275__ at MaxQuantTask.Program.Function(String[] args, Responder responder) in C:\Users\bi\source\repos\net7\net\MaxQuantTask\Program.cs:line 17__ at MqUtil.Util.ExternalProcess.Run(String[] args, Boolean debug)
end 22/11/2024 19:58:46
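The log above complains that the key 'MS:1000505' is missing. As far as I know this is the PSI-MS accession for "base peak intensity"; that MaxQuant expects it on every spectrum is only an inference from the stack trace. A quick way to check whether the converted mzML actually carries that cvParam is sketched below (hypothetical function name, standard library only, toy input rather than a valid mzML file).

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def spectra_missing_cv(source, accession="MS:1000505"):
    """Return the ids of spectra that have no cvParam with the given accession."""
    missing = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        accessions = {cv.get("accession") for cv in elem.iter(MZML_NS + "cvParam")}
        if accession not in accessions:
            missing.append(elem.get("id"))
        elem.clear()  # drop the parsed spectrum to limit memory use
    return missing


# Toy fragment: the second spectrum lacks the base peak intensity cvParam.
demo_mzml = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml"><run id="r"><spectrumList count="2">
<spectrum index="0" id="scan=1">
<cvParam accession="MS:1000511" name="ms level" value="1"/>
<cvParam accession="MS:1000505" name="base peak intensity" value="1234.5"/>
</spectrum>
<spectrum index="1" id="scan=2">
<cvParam accession="MS:1000511" name="ms level" value="2"/>
</spectrum>
</spectrumList></run></mzML>"""

missing_ids = spectra_missing_cv(io.BytesIO(demo_mzml))
print(missing_ids)  # ['scan=2']
```

If every spectrum in the converted file is reported as missing, the converter simply does not write that cvParam, which would point to the conversion options rather than to the data.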
@daichengxin can you try with the newest msconvert and with the option "combine ion mobility scans"?