MSnbase
MSnbase copied to clipboard
MRM Analysis Help
Hello,
This references an issue I posted a while ago:
https://github.com/lgatto/MSnbase/issues/551
So, I have a set of Agilent .d MRM files that I was able to convert to mzML and read in with readSRMdata()
as described in the link, above.
I have a list of transitions for 50 compounds; rt, precursorMz, productMz, etc. and I want to find these in the mzML files and report the intensities.
So:
# load in a single file
raw11 <- readSRMData(files = "20211109_pooleds_08.mzML")
Now, if I do the following:
write.table(raw11, sep = "\t", file = "out_file.txt")
I get a file starting with:
new("Chromatogram", rtime = c(0.00886666666666667, 0.0190333333333333 [...]
This contains the rtimes, intensities, mz (precursor, product), etc.
Is this every mass found in the file? So, to get the intensities of each compound, would it be a matter of "simply" cross-referencing the RTs/MZs and intensities and adding them up? And if a given transition pair is not reported, does that mean it simply wasn't found?
I've also tried findChromPeaks()
but the results were similar and since this will process many types of compounds, I can't have custom peak-picking parameters for each run.
From findChromPeak, I can export it:
new("XChromatogram", chromPeaks = c(9.86523333333333, 0.00886666666666667, 26.9917, 3137.98745796791, 136236.712714377, 6163.96044921875, 6658.80879462996, 57.2791738790809), chromPeakData = new("DFrame", rownames = NULL, nrows = 1, listData = list(ms_level = 1, is_filled = FALSE), elementType = "ANY", elementMetadata = NULL, metadata = list()), rtime = c(0.00886666666666667, 0.0190333333333333 [...]
It seems the output here is very similar (rtime, etc.)
Another issue is the comparison of intensities from this process compared to the intensities output by Skyline. Do you have any experience in comparing the output of Skyline to that of R? The numbers can be different.
Thank you for any insight you can provide, james
I quite don't understand what exactly you want to do, but exporting the data with write.table
is not what you should aim for - that function simply calls as.character
on the object containing the data and writes the output.
If you called findChromPeaks
you can get a matrix
with the integrated peak areas using the chromPeaks
function. Ideally, have a look at the specific help pages for the objects being able to represent chromatographic data (i.e. ?Chromatogram
?MChromatograms
in MSnbase
and ?XChromatogram
from the xcms
package), there you will find some more info on how to actually access and extract data from these objects.
I have a list of amino acids (but it could be anything) for which I have expected retention time and precursor/product mz information.
For example: Valine: RT: 7.9min; Mzprecursor: 118.08; Mzproduct: 72.1 AminoPentanoic acid: RT: 10.7min; Mzprecursor: 118.08; Mzproduct: 101
If I write out a table (I know it's as.character
, but I can process it outside of R) of raw11
from this:
raw11 <- readSRMData(files = "20211109_pooleds_08.mzML")
Each line of the output has a precursor/product pair identifier (excerpt: mz = c(NA, NA), filterMz = c(NA, NA), precursorMz = c(104.07, 104.07), productMz = c(58.196, 58.196), fromFile = 1,[...]
(this is the identifier for alpha aminobutyric acid, fyi)) along with paired lists of retention times and intensities. So, I wrote a simple parser in perl to assemble the data; adding the intensities if within certain parameters (RT range, for example) to get the intensity of a given precursor/product pair. I don't know if this is correct.
When I run findChromPeaks
, I can access the data via chromPeaks
cp <- findChromPeaks(raw11, param = mfp)
cp_out <- chromPeaks(cp)
Then cp_out looks like you would imagine it:
rt rtmin rtmax into intf maxo maxf sn row column
[1,] 7.950067 0.007966667 21.53313 19986.593 817832.62 164986.953 55225.430 72.79093 6 1
[2,] 7.837617 0.007516667 27.00053 18008.686 781024.00 152890.172 48975.998 73.43726 8 1
[...]
But I don't know the precursor/product MZs. I, of course, can use filterRt
(I noted the default unit is minutes here...is that based on the input data?) and filterMz
, but I question that approach.
I hope I'm clearer this time!
So, the output would be a list of the precursor/product pairs and their associated intensities.
Thanks for the clarification. Ideally, you should be able to everything you want to do in R - without the need for import/export. To explain: the MChromatograms
(or then the XChromatograms
once you call findChromPeaks
) is organized as a matrix
, each row should represent a single combination of precursor and product m/z, each column one mzML file (sample). With the functions precursorMz
and productMz
on your imported data you can get the precursor and product m/z values for each row. In the output of the chromPeaks
call above you can also see columns "row"
and "column"
which indicates the row and sample in which the chromatographic peak was identified (in your case for rows 6 and 8 a peak was found in sample 1). This should give you all needed info to work with your data (even maybe without filterRt
and filterMz
?).
Regarding the unit of the retention times - that depends on the input files. So far I've seen only RT in seconds, but maybe you have them in minutes?
Thank you very much!
Just so that I am perfectly clear here because the data set I am working with has been analyzed by someone using Skyline and I have all the intensities for each compound in the files from Skyline, but the numbers I'm getting here are quite different.
So, the output of chromPeaks
has 50 rows. This is the same number of compounds in my transition list:
> chromOut <- chromPeaks(cp)
> chromOut
rt rtmin rtmax into intf maxo maxf sn row
[1,] 9.865233 8.866667e-03 26.99170 3137.98746 136236.713 6163.96045 6658.809 57.27917 1
[2,] 11.199100 8.866667e-03 24.37488 18186.35083 798307.718 126085.32812 50817.838 74.89145 2
[3,] 26.991700 2.542365e+01 26.99170 74.91288 5662.638 66.68001 7322.346 10.79111 2
[4,] 10.567583 8.633333e-03 26.99148 1819.18958 79064.907 10531.56055 3820.905 56.69437 3
[5,] 10.608083 8.416667e-03 26.99125 1595.24131 69285.112 8672.84082 3229.372 54.64408 4
[6,] 13.346867 8.183333e-03 26.99102 1100.59197 47627.832 150.12001 1799.950 44.14541 5
[7,] 7.950067 7.966667e-03 21.53313 19986.59339 817832.618 164986.95312 55225.430 72.79093 6
[8,] 27.000983 2.165532e+01 27.00098 468.56044 70797.294 354.34003 27563.392 36.33046 6
[...]
(I truncated the last column, "column.")
When I run precursorMz
(or the product version), I get 45 rows.
> precursorMz(cp)
mzmin mzmax
[1,] 104.07 104.07
[2,] 104.07 104.07
[3,] 106.04 106.04
[4,] 110.05 110.05
[5,] 114.06 114.06
[6,] 116.07 116.07
[7,] 118.08 118.08
[...]
Does the "rows" column in the output of chromPeaks
refer to the rows in the precursorMz
output? I assume this is the case as the "rows" only go up to 45). So, it would just be a matter of cross-referencing those in order to "link" the MZs with the RTs and intensities?
And when the row numbers in the chromPeaks
output are duplicated, it simply found that precursor/product pair at another RT?
So, in this example, if I look at row 2 and 3 from chromPeaks
:
[2,] 11.199100 8.866667e-03 24.37488 18186.35083 798307.718 126085.32812 50817.838 74.89145 **2**
[3,] 26.991700 2.542365e+01 26.99170 74.91288 5662.638 66.68001 7322.346 10.79111 **2**
These rows have 2 in the "row" column, which in the precursor/productMz
is: 104.07 and 87.096.
This would make it gamma aminobutyric acid with an expected RT of ~11.1 minutes. with the intensity (into: 18186.35083 intf: 798307.718). The intensity from Skyline for this compound in this file is 990324.875. (I don't understand the differences from Skyline...and these differences keep coming up, especially when folks are staunch Skyline users.)
Because row 3 also references row 2 in the precursor output, then it also found the 104.07/87.096 pair at 26.99 minutes? This is the end of the run. And I know the intensities are extremely low to the point of rejecting it, but I want to make sure I'm at least on the right track.
thanks again! james
Does the "rows" column in the output of chromPeaks refer to the rows in the precursorMz output?
yes. The "row" column should be equivalent to the number of chromatograms/transitions you have. The "column" column refers to the sample (or mzML file). If you read 5 mzML files, you'll have 5 columns.
So, it would just be a matter of cross-referencing those in order to "link" the MZs with the RTs and intensities?
Exactly. You could basically do that with precursorMz(cp)[chromOut[, "row"], ]
and combine that with the chromOut
.
And when the row numbers in the
chromPeaks
output are duplicated, it simply found that precursor/product pair at another RT?
That depends on the value in the "column" column. It could be a peak in another sample, or be additional peaks that were detected by the algorithm in the same chromatogram. Maybe best to plot the data (plot(cp[2, ]
) to see what it is and how the peak look like.
I don't know how Skyline integrates the peak signal, but if the integration is performed differently you can get different results. I would definitely first check if the detected peaks look OK (by plotting them) and eventually adapt the peak detection settings.
I have yet another question.
I've followed this procedure for a single file:
mfp <- MatchedFilterParam(binSize = 0.1, impute="lin", snthresh = 2)
raw11 <- readSRMData(files = "20211109_pooleds_08.mzML")
cp <- findChromPeaks(raw11, param = mfp)
chromOut <- chromPeaks(cp)
preTEST <- precursorMz(cp)[chromOut[, "row"], ]
preTEST <- as.vector(preTEST)
prodTEST <- productMz(cp)[chromOut[, "row"], ]
prodTEST <- as.vector(prodTEST)
chromOutDF <- as.data.frame(chromOut)
into <- chromOutDF$into
into <- as.vector(into)
intf <- chromOutDF$intf
intf <- as.vector(intf)
rt <- chromOutDF$rt
rt <- as.vector(rt)
fnoutDF <- data.frame(preTEST, prodTEST, into, intf, rt)
When I write out the table (and sort on precursorMz, then productMz, then RT to get all compounds together), I get doubled lines, like this:
preTEST prodTEST into intf rt
76.06 29.996 1167.168707 50545.62772 11.70933333
76.06 29.996 1167.168707 50545.62772 11.70933333
79.07 31.996 1140.473621 49386.54004 11.7702
79.07 31.996 1140.473621 49386.54004 11.7702
90.06 43.996 3178.60995 138230.4354 9.743716667
90.06 43.996 3178.60995 138230.4354 9.743716667
94.07 46.996 2310.862353 100428.8582 9.641666667
94.07 46.996 2310.862353 100428.8582 9.641666667
[...]
I'm not sure if that's coming from what I'm doing or whether it is in the mzML file (this is from a single file).
james
Please check what data type precursorMz(cp)
returns. Could be that the lower and upper m/z are reported for precursor and product m/z (note that even if you provide a single value m/z value, internally it is converted to a lower-upper m/z to support also specifying m/z ranges). If a list
or matrix
(with two columns) is returned I suggest to calculate the mean m/z for each.
Thank you. Once I recover from a major computer crash, I will look into it...but I am pretty sure that the upper/lower MZs were identical.
Meaning, they were identical in the output (removing duplicates worked), not sure if the numbers are truncated in the output (though, I've never seen that before).
Yes, exactly, they will be identical if a single m/z value was available (i.e. a target precursor m/z and not a m/z range). To solve your problem I assume you would need to reduce this range down again to a single value before you create the various data.frame
s.