Spectra

Bug in filterPrecursorScan()

Open • lgatto opened this issue 3 years ago • 4 comments

> library(rpx)
> PXD022816 <- PXDataset("PXD022816")
> (mzmls <- pxget(PXD022816, grep("mzML", pxfiles(PXD022816))[1:2]))
Loading QEP2LC6_HeLa_50ng_251120_01-calib.mzML from cache.
Loading QEP2LC6_HeLa_50ng_251120_02-calib.mzML from cache.
[1] "~/.cache/rpx/207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML"
[2] "~/.cache/rpx/207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML"
> sp <- Spectra(mzmls)
> sp2 <- filterPrecursorScan(sp, 2490)
> length(sp2)
[1] 12
> spectraData(sp2)[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
      1           2479          NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      2           2480        2479 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      1           2482          NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      2           2485        2482 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      1           2486          NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      2           2490        2486 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      1           2479          NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      1           2480          NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      2           2482        2480 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      1           2485          NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      2           2486        2485 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      2           2490        2485 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML

Issues

  1. First, scans with the requested acquisition number are extracted from all files, which is probably not what the user wanted. If I wanted to extract the set from a single file, this could be handled explicitly with a call to filterDataOrigin().
  2. The result is wrong. We get scan 2490 from both files and their precursor scans 2486 (file 1) and 2485 (file 2), but these precursor numbers are then matched against both files, so the recursive precursor lookup jumps between files and also pulls in scans 2482, 2480 and 2479. The correct result should contain 4 scans, 2 from each file:
> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[1]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
      1           2486          NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
      2           2490        2486 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML

and

> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[2]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
      1           2485          NA 13707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
      2           2490        2485 13707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
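The cross-file leakage in item 2 can be reproduced without any mzML files. Below is a minimal, hypothetical base-R sketch: walk() and scan_family() are not Spectra functions, and the "keep the scan, its precursor chain and its descendants, but not its siblings" semantics is an assumption inferred from the expected 2-scans-per-file output above. It runs on a toy data.frame holding the 12 rows printed earlier, with the dataOrigin values shortened to file_01/file_02.

```r
# Hypothetical helper (not part of Spectra): follow from -> to links,
# starting at the scan numbers in `sel`, until no new numbers appear.
walk <- function(from, to, sel) {
  repeat {
    new <- setdiff(to[from %in% sel & !is.na(to)], sel)
    if (length(new) == 0) return(sel)
    sel <- c(sel, new)
  }
}

# Hypothetical stand-in for filterPrecursorScan(): keep the selected scan,
# its precursor chain (walking up) and its descendants (walking down).
# With by_origin = TRUE every lookup stays within one data origin.
scan_family <- function(df, acqNum, by_origin = TRUE) {
  groups <- if (by_origin) df$dataOrigin else rep("all", nrow(df))
  keep <- logical(nrow(df))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    acq <- df$acquisitionNum[idx]
    prec <- df$precScanNum[idx]
    start <- intersect(acqNum, acq)
    up <- walk(acq, prec, start)    # acquisitionNum -> precScanNum
    down <- walk(prec, acq, start)  # precScanNum -> acquisitionNum
    keep[idx] <- acq %in% c(up, down)
  }
  df[keep, , drop = FALSE]
}

# Toy copy of the 12 rows returned by filterPrecursorScan(sp, 2490) above.
toy <- data.frame(
  msLevel        = c(1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 2),
  acquisitionNum = c(2479, 2480, 2482, 2485, 2486, 2490,
                     2479, 2480, 2482, 2485, 2486, 2490),
  precScanNum    = c(NA, 2479, NA, 2482, NA, 2486,
                     NA, NA, 2480, NA, 2485, 2485),
  dataOrigin     = rep(c("file_01", "file_02"), each = 6),
  stringsAsFactors = FALSE
)

pooled   <- scan_family(toy, 2490, by_origin = FALSE)
per_file <- scan_family(toy, 2490, by_origin = TRUE)
nrow(pooled)    # 12
nrow(per_file)  # 4
```

When the files are pooled, 2485 and 2486 match scans in both files, so the upward walk keeps going through 2482, 2480 and 2479 and all 12 rows come back, mirroring the buggy output. Restricting each lookup to one data origin yields the 4 expected scans.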

Suggested workarounds

  • We make filterPrecursorScan() only work with data from a single data origin (👎🏻)
  • We run it separately on each file, which avoids the wrong results (recursive scan selection across files) but still selects the acquisition number in every file (issue 1), even though an acquisition number should be file-specific (not ideal)
  • We add a dataOrigin argument (which can be omitted when there is only a single data origin) that must be the same length as the acquisition number argument (👍🏻)
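The third workaround could look roughly like the sketch below, again on a toy data.frame rather than a Spectra object. filter_scan_family() and its argument names are hypothetical, not the Spectra API. The idea being illustrated is that acqNum and dataOrigin are paired element by element, so each acquisition number is only looked up in its own file, and dataOrigin may be omitted only when there is a single data origin.

```r
# Hypothetical helper: follow from -> to links starting at `sel`.
walk <- function(from, to, sel) {
  repeat {
    new <- setdiff(to[from %in% sel & !is.na(to)], sel)
    if (length(new) == 0) return(sel)
    sel <- c(sel, new)
  }
}

# Hypothetical sketch of workaround 3 (illustrative names, not Spectra's).
filter_scan_family <- function(df, acqNum, dataOrigin = NULL) {
  origins <- unique(df$dataOrigin)
  if (is.null(dataOrigin)) {
    # Omitting dataOrigin is only unambiguous with a single data origin.
    if (length(origins) != 1)
      stop("'dataOrigin' is required when there are several data origins")
    dataOrigin <- rep(origins, length(acqNum))
  }
  if (length(dataOrigin) != length(acqNum))
    stop("'dataOrigin' must have the same length as 'acqNum'")
  keep <- logical(nrow(df))
  for (o in unique(dataOrigin)) {
    idx <- which(df$dataOrigin == o)
    acq <- df$acquisitionNum[idx]
    prec <- df$precScanNum[idx]
    # Only the acquisition numbers paired with this origin are searched here.
    start <- intersect(acqNum[dataOrigin == o], acq)
    keep[idx] <- acq %in% c(walk(acq, prec, start), walk(prec, acq, start))
  }
  df[keep, , drop = FALSE]
}

# Toy data: scans 2485, 2486 and 2490 from each of two files.
toy <- data.frame(
  acquisitionNum = c(2485, 2486, 2490, 2485, 2486, 2490),
  precScanNum    = c(2482, NA, 2486, NA, 2485, 2485),
  dataOrigin     = rep(c("file_01", "file_02"), each = 3),
  stringsAsFactors = FALSE
)

r1 <- filter_scan_family(toy, 2490, dataOrigin = "file_01")
r1$acquisitionNum  # 2486 2490
r2 <- filter_scan_family(toy, c(2490, 2486), dataOrigin = c("file_01", "file_02"))
r2$acquisitionNum  # 2486 2490 2485 2486
```

Validating the lengths up front turns the ambiguous case (no dataOrigin with several files) into an error instead of a silent multi-file selection.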

@sgibb @jorainer - any comments/suggestions?

lgatto, Apr 07 '21 12:04