Spectra
Bug in filterPrecursorScan()
> library(rpx)
> PXD022816 <- PXDataset("PXD022816")
> (mzmls <- pxget(PXD022816, grep("mzML", pxfiles(PXD022816))[1:2]))
Loading QEP2LC6_HeLa_50ng_251120_01-calib.mzML from cache.
Loading QEP2LC6_HeLa_50ng_251120_02-calib.mzML from cache.
[1] "~/.cache/rpx/207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML"
[2] "~/.cache/rpx/207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML"
> sp <- Spectra(mzmls)
> sp2 <- filterPrecursorScan(sp, 2490)
> length(sp2)
[1] 12
> spectraData(sp2)[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel | acquisitionNum | precScanNum | dataOrigin |
---|---|---|---|
1 | 2479 | NA | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
2 | 2480 | 2479 | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
1 | 2482 | NA | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
2 | 2485 | 2482 | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
1 | 2486 | NA | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
2 | 2490 | 2486 | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
1 | 2479 | NA | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
1 | 2480 | NA | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
2 | 2482 | 2480 | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
1 | 2485 | NA | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
2 | 2486 | 2485 | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
2 | 2490 | 2485 | 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
Issues
- First, the acquisition number is matched in all files, which is probably not what the user wanted. This could be handled explicitly with a call to filterDataOrigin() if I wanted to extract the set from a single file.
- The result is wrong. We get acquisition number 2490 from both files and its precursor scans 2485 and 2486 (again from both files), but these then recursively pull in each other's precursors across files. The correct result should contain 4 scans:
> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[1]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel | acquisitionNum | precScanNum | dataOrigin |
---|---|---|---|
1 | 2486 | NA | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
2 | 2490 | 2486 | 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML |
and
> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[2]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel | acquisitionNum | precScanNum | dataOrigin |
---|---|---|---|
1 | 2485 | NA | 13707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
2 | 2490 | 2485 | 13707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML |
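The cross-file recursion can be reproduced without the raw files. The sketch below is base R only and makes a few assumptions: `scans` is a toy data frame built from the 12 rows of the first table, `dataOrigin` is abbreviated, and `related_scans()` is a hypothetical helper that follows precursor/product links by scan number alone. It is a simplified illustration of the behaviour described here, not the package's actual code.

```r
# Toy stand-in for spectraData(sp2): the 12 rows of the first table,
# with dataOrigin abbreviated to the file suffix (assumption, for brevity).
scans <- data.frame(
  msLevel        = c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2),
  acquisitionNum = rep(c(2479, 2480, 2482, 2485, 2486, 2490), 2),
  precScanNum    = c(NA, 2479, NA, 2482, NA, 2486, NA, NA, 2480, NA, 2485, 2485),
  dataOrigin     = rep(c("01-calib.mzML", "02-calib.mzML"), each = 6)
)

# Hypothetical helper: row indices of the scans in `acq` plus their
# precursor (ancestor) and product (descendant) scans. Matching uses
# scan numbers only, so rows from different files can link to each other.
related_scans <- function(x, acq) {
  up <- down <- x$acquisitionNum %in% acq
  repeat {  # walk up to precursor scans
    nxt <- up | x$acquisitionNum %in% x$precScanNum[up]
    if (identical(nxt, up)) break
    up <- nxt
  }
  repeat {  # walk down to product scans
    nxt <- down | x$precScanNum %in% x$acquisitionNum[down]
    if (identical(nxt, down)) break
    down <- nxt
  }
  which(up | down)
}

# File-agnostic matching: scan numbers chain across the two files.
naive <- related_scans(scans, 2490)

# Same helper applied within each data origin separately.
per_file <- unlist(
  lapply(split(seq_len(nrow(scans)), scans$dataOrigin),
         function(i) i[related_scans(scans[i, ], 2490)]),
  use.names = FALSE
)

length(naive)                   # 12 rows, as in the report
scans$acquisitionNum[per_file]  # 2486 2490 2485 2490
```

On this toy data, matching within each file yields the 4 scans expected above, while matching across files pulls in all 12 rows.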
Suggested workarounds
- We make filterPrecursorScan() work only with data from a single data origin ( 👎🏻 )
- We run it on each file separately to avoid the wrong results (recursive scan selection across files), even though an acquisition number should be file-specific (not ideal)
- We add a dataOrigin argument (that can be omitted when there's only a single data origin) that needs to be the same length as the acquisition number ( 👍🏻 )
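As a rough sketch of what the last option could look like from the user's side, here is a wrapper built only from filterDataOrigin() and filterPrecursorScan(). The name `filterPrecursorScanByOrigin()` is hypothetical, and I'm assuming that `c()` concatenates Spectra objects; this is not a proposal for the actual implementation.

```r
# Hypothetical wrapper, not part of Spectra: pair each acquisition number
# with the data origin it belongs to and filter within that origin only.
filterPrecursorScanByOrigin <- function(object, acquisitionNum,
                                        dataOrigin = NULL) {
  if (is.null(dataOrigin)) {
    # dataOrigin may be omitted only when the object has a single origin.
    origins <- unique(Spectra::dataOrigin(object))
    if (length(origins) > 1L)
      stop("'dataOrigin' is required when 'object' has several data origins")
    dataOrigin <- rep(origins, length(acquisitionNum))
  }
  if (length(dataOrigin) != length(acquisitionNum))
    stop("'dataOrigin' and 'acquisitionNum' must have the same length")

  # Group acquisition numbers by origin (note: split() orders the groups
  # by sorted origin name) and filter each origin on its own subset.
  by_origin <- split(acquisitionNum, dataOrigin)
  res <- lapply(names(by_origin), function(origin) {
    Spectra::filterPrecursorScan(
      Spectra::filterDataOrigin(object, origin),
      by_origin[[origin]]
    )
  })
  # Assumption: c() combines Spectra objects into one.
  if (length(res) == 1L) res[[1L]] else do.call(c, res)
}
```

With the objects from the session above, `filterPrecursorScanByOrigin(sp, c(2490, 2490), unique(dataOrigin(sp)))` should then give the 4 scans listed under Issues, since each file is filtered independently.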
@sgibb @jorainer - any comments/suggestions?