xcms
xcms copied to clipboard
findChromePeaks Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time!
Hi! I would appreciate your help with this error, thanks:
Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time!
I have followed the code from the vignettes, adjusting the parameters to fit my data
rm(list=ls(all=TRUE))
library(pander) library(RColorBrewer) library(mzR) library(msdata) library(xcms) library(faahKO) library(pheatmap) library(magrittr) library(ChemoSpec) library(speaq)
1. Data import----
#Get the full path to the .mzData files
setwd("C:/Users/ASUS/Documents/GCMS_chemometrics/Pyrolysis_project/Raw_data/PCA-Bio-gases/BG-Normal-Temperatura/")
pd <-read.csv("Metatemp.csv", header= TRUE, sep= ";")
filepath <- "C:/Users/ASUS/Documents/GCMS_chemometrics/Pyrolysis_project/Raw_data/PCA-Bio-gases/BG-Normal-Temperatura/GC/"
cdfs <- list.files(filepath, pattern = "mzData", full.names=TRUE, recursive = TRUE)
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd), mode = "onDisk") ##resulting OnDiskMSnExp object contains general information about the number of spectra, retention times, the measured total ion current etc, but does not contain the full raw data (i.e. the m/z and intensity values from each measured spectrum).
2. Initial data inspection----
head(rtime(raw_data)) mzs <- mz(raw_data) #mz data from raw files
mzs_by_file <- split(mzs, f = fromFile(raw_data))
length(mzs_by_file) ##Split the list by file
Get the base peak chromatograms. This reads data from the files.
bpis <- chromatogram(raw_data, aggregationFun = "max")
Define colors for the groups
group_colors <- paste0(brewer.pal(3, "Set1")[1:3], "60")
names(group_colors) <- c("A", "B","D")
Plot all chromatograms.
plot(bpis, col= group_colors[raw_data$Class], xlim= c(0,200)) legend("topright", legend= c("A", "B", "D"), fill= group_colors)
plot(bpis, col= group_colors[raw_data$Class], xlim= c(84,90)) legend("topright", legend= c("A", "B", "D"), fill= group_colors)
Define the rt and m/z range of the peak area
rtr <- c(85, 88.5) mzr <- c(7.85, 183.80)
extract the chromatogram
chr_raw <- chromatogram(raw_data, mz = mzr, rt = rtr) plot(chr_raw, col = group_colors[chr_raw$Class])
raw_data %>% filterRt(rt = rtr) %>% filterMz(mz = mzr) %>% plot(type = "XIC")
cwp <- CentWaveParam(peakwidth = c(3.5, 5), noise = 500000) xdata <- findChromPeaks(raw_data, param = cwp)
xdata <- findChromPeaks(raw_data, param = cwp) Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time! In addition: Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading
This the traceback
traceback() 6: stop(.error_bplist(res)) 5: bplapply(do.call("lapply", args), FUN = findChromPeaks_OnDiskMSnExp, method = "centWave", param = param, BPPARAM = BPPARAM) 4: bplapply(do.call("lapply", args), FUN = findChromPeaks_OnDiskMSnExp, method = "centWave", param = param, BPPARAM = BPPARAM) 3: .local(object, param, ...) 2: findChromPeaks(raw_data, param = cwp) 1: findChromPeaks(raw_data, param = cwp)
And the session info:
sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale: [1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252 LC_MONETARY=Spanish_Colombia.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Colombia.1252
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] speaq_2.5.0 ChemoSpec_5.0.229 ChemoSpecUtils_0.2.211 magrittr_1.5
[5] pheatmap_1.0.12 faahKO_1.22.0 msdata_0.22.0 RColorBrewer_1.1-2
[9] pander_0.6.3 xcms_3.4.4 BiocParallel_1.16.6 MSnbase_2.8.3
[13] ProtGenerics_1.14.0 S4Vectors_0.20.1 mzR_2.16.2 Rcpp_1.0.1
[17] Biobase_2.42.0 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] doParallel_1.0.14 httr_1.4.0 tools_3.5.0 R6_2.4.0
[5] affyio_1.52.0 lazyeval_0.2.2 colorspace_1.4-1 tidyselect_0.2.5
[9] gridExtra_2.3 mQTL_1.0 compiler_3.5.0 MassSpecWavelet_1.48.1
[13] preprocessCore_1.44.0 rvest_0.3.3 xml2_1.2.0 plotly_4.9.0
[17] scales_1.0.0 DEoptimR_1.0-8 robustbase_0.93-4 randomForest_4.6-14
[21] affy_1.60.0 stringr_1.4.0 digest_0.6.18 colorRamps_2.3
[25] pkgconfig_2.0.2 htmltools_0.3.6 itertools_0.1-3 limma_3.38.3
[29] htmlwidgets_1.3 rlang_0.3.4 rstudioapi_0.10 impute_1.56.0
[33] jsonlite_1.6 mzID_1.20.1 dplyr_0.8.0.1 qtl_1.44-9
[37] MetaboMate_0.0.0.9002 MALDIquant_1.19.3 Matrix_1.2-14 munsell_0.5.0
[41] RcppZiggurat_0.1.5 vsn_3.50.0 stringi_1.4.3 pROC_1.14.0
[45] yaml_2.2.0 MASS_7.3-49 zlibbioc_1.28.0 plyr_1.8.4
[49] grid_3.5.0 ggrepel_0.8.0 crayon_1.3.4 doSNOW_1.0.16
[53] lattice_0.20-38 splines_3.5.0 multtest_2.38.0 pillar_1.3.1
[57] reshape2_1.4.3 codetools_0.2-15 XML_3.98-1.19 glue_1.3.1
[61] outliers_0.14 pcaMethods_1.74.0 data.table_1.12.2 BiocManager_1.30.4
[65] nloptr_1.2.1 missForest_1.4 foreach_1.4.4 tidyr_0.8.3
[69] gtable_0.3.0 RANN_2.6.1 purrr_0.3.2 assertthat_0.2.1
[73] ggplot2_3.1.1 Rfast_1.9.3 viridisLite_0.3.0 ncdf4_1.16.1
[77] survival_2.41-3 tibble_2.1.1 snow_0.4-3 iterators_1.0.10
[81] ptw_1.9-13 IRanges_2.16.0 ellipse_0.4.1 cluster_2.0.7-1
Hi @davidgarciam ! Now that's an error I have not seen before. According to the error message the spectra in (one or some of) your data files are not ordered increasingly by retention time.
First let's find out which of your files are affected - then we can dig into that to see how the data looks like. So, can you please do the following and post the output?
tmp <- split(raw_data, f = fromFile(raw_data))
vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1))
those with a TRUE
are the trouble makers as their retention times are not ordered.
Hi, @davidgarciam, in addition, you can look at the scan times. If you have a very fast scanning GC/MS, maybe some retention times are identical ? That still counts as non-monotonously increasing. If you convert to the mzData format with proteowizard msconvert, you can instruct the tool to re-order according to retention time. Yours, Steffen
Thanks for your comments professor @jorainer and @sneumann .
OK it seems that all the files but number 3 have problems with the order of rt:
tmp <- split(raw_data, f = fromFile(raw_data)) vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1)) 1 2 3 4 5 6 7 8 TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
So, I need to order my data according to the rt, but it refers to the total length of the rt or the order of the rt itself inside each file?...
The length of the rt in the chromatograms are different:
a length_rt t1 t2 t3 t4 t5 t6 PMP-250-1 1444 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-250-2 1788 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-325-1 1239 6 6.49998 7.00002 7.5 7.99998 8.50002 PMP-325-2 1553 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-325-3 1286 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-1 517 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-2 526 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-3 1548 0 6.00000 7.00002 7.5 7.99998 8.50002
or the problem is here?:
head(rtime(bpis[1,1]))
F1.S0002 F1.S0001 F1.S0003 F1.S0004 F1.S0005 F1.S0006 0.00000 6.00000 7.00002 7.50000 7.99998 8.50002
head(rtime(bpis[1,2])) F2.S0002 F2.S0001 F2.S0003 F2.S0004 F2.S0005 F2.S0006 0.00000 6.00000 7.00002 7.50000 7.99998 8.50002 head(rtime(bpis[1,3])) F3.S0001 F3.S0002 F3.S0003 F3.S0004 F3.S0005 F3.S0006 6.00000 6.49998 7.00002 7.50000 7.99998 8.50002
Hi, I guess the RTs are somewhere not increasing. The first look fine: 6.00000 7.00002 7.5 7.99998 8.50002
,
but can you plot these 1444 numbers to see if there is something strange ? Can you check if the issue is that some scans have the same RT: which(duplicated())
? Can you paste the last six RTs, where this problem could loom ? Can you tell us why some files have three times (1788) as many scans as others ( 517) ? Yours, Steffen
Thank you @sneumann. So, I've tried what you suggested to:
plot these 1444 numbers to see if there is something strange ? They seem to be OK
Can you check if the issue is that some scans have the same RT: which(duplicated())?
which(duplicated(rtime(bpis[1,1])) integer(0) The same result for the others. There are no duplicated rt's
Can you paste the last six RTs, where this problem could loom? These are the tails, they seem to be OK.
a length_rt t1 t2 t3 t4 t5 t6 PMP-250-1 1444 725.0 725.5 726.0 726.5 727.0 727.5 PMP-250-2 1788 897.0 897.5 898.0 898.5 899.0 899.5 PMP-325-1 1239 622.5 623.0 623.5 624.0 624.5 625.0 PMP-325-2 1553 779.5 780.0 780.5 781.0 781.5 782.0 PMP-325-3 1286 646.0 646.5 647.0 647.5 648.0 648.5 PMP-370-1 517 261.5 262.0 262.5 263.0 263.5 264.0 PMP-370-2 526 266.0 266.5 267.0 267.5 268.0 268.5 PMP-370-3 1548 777.0 777.5 778.0 778.5 779.0 779.5
Can you tell us why some files have three times (1788) as many scans as others ( 517)? I didn't obtain the data by myself, they were provided by a colleague. I guess he used more scans in some runs than in others. Since I'm exploring his data, I was surprised too by this fact.
Thanks!
Ok, then my last idea would be to check which(rtime(bpis[1,1])) != sorted(rtime(bpis[1,1])))
to see where the sorting breaks. You can also make one affected file available and I could have a quick look.
Yours, Steffen
Just a note: in principle it should also be possible to sort the spectra in the OnDiskMSnExp
object (i.e. your raw_data
).
Well, First I detected the affected files as suggested by prof. @jorainer
tmp <- split(raw_data, f = fromFile(raw_data)) vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1)) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
Previously, I didn't find duplicated rt's. So I tried to check the rt order in one of the affected files using prof. @sneumann idea:
which(rtime(bpis[1,1]) != sort(rtime(bpis[1,1])) named integer(0)
So they seem to be ordered...
Prof. @jorainer I'm not sure of how to sort the spectra data in the object... I have to extract the rtime, sort it, and return them to the object?.
These are some of the files, 1 affected and the other 2 normals.
Thank you so much for your constant replies and help...
Hi, The second scan in the files has an RT of 0.0:
grep TimeInMinutes PMP-100-CUESCO-1.mzData
...
<cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="0.100000"></cvParam>
...
<cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="0.000000"></cvParam>
not sure how this gets there. Yours, Steffen
The other thing where we might need input from @jorainer is
Browse[2]> head(sort(rt))
F1.S0001 F2.S0001 F1.S0002 F2.S0002 F1.S0003 F2.S0003
6.00000 6.00000 6.49998 6.49998 7.00002 7.00002
in particular, what are F1.X
and F2.X
?
Yours, Steffen
F1.X
denotes spectra from file one and F2.X
spectra from file two, e.g. F1.S0001
means first spectrum from file one and F1.S0002
second spectrum and so on...
Regarding how to sort, I can help there, but we might first want to ensure that the input data is correct. As @sneumann points out, how can you have retention times of 0 in your files? That might be an issue.
Now, if only the 0
retention times are a problem you could filter your object by retention time keeping spectra with a retention time > 0:
raw_data <- filterRt(raw_data, c(0.1, max(rtime(raw_data))))
and then try to run the analysis.