MSnbase
Error in reading Agilent .d files converted to .mzML via msconvert
Hello,
I have some Agilent .d files that I converted to .mzML on a linux server with msconvert installed via docker and wine.
All other file types convert to .mzML with no issues.
When I try to read the Agilent .mzML files with readMSData, I get this error:
Error: Can not open file 0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.
I've seen references to this error, but no solutions.
This error occurs both on the linux server as well as on my Mac.
I tried removing the <binaryData* tags, but that gave me an istream error.
packageVersion("MSnbase") [1] ‘2.18.0’
I tried to update MSnbase, but it keeps installing this version.
`> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base
other attached packages:
[1] RColorBrewer_1.1-2 magrittr_2.0.1 MSnbase_2.18.0 ProtGenerics_1.24.0
[5] S4Vectors_0.30.0 Biobase_2.52.0 BiocGenerics_0.38.0 mzR_2.26.1
[9] Rcpp_1.0.7 MASS_7.3-54
loaded via a namespace (and not attached):
[1] plyr_1.8.6 compiler_4.1.1 pillar_1.6.2
[4] BiocManager_1.30.16 iterators_1.0.13 zlibbioc_1.38.0
[7] tools_4.1.1 digest_0.6.27 ncdf4_1.17
[10] MALDIquant_1.20 preprocessCore_1.54.0 lifecycle_1.0.0
[13] tibble_3.1.4 gtable_0.3.0 lattice_0.20-44
[16] clue_0.3-59 pkgconfig_2.0.3 rlang_0.4.11
[19] foreach_1.5.1 cluster_2.1.2 IRanges_2.26.0
[22] vctrs_0.3.8 MsCoreUtils_1.4.0 grid_4.1.1
[25] glue_1.4.2 impute_1.66.0 R6_2.5.1
[28] fansi_0.5.0 XML_3.99-0.7 BiocParallel_1.26.2
[31] limma_3.48.3 ggplot2_3.3.5 scales_1.1.1
[34] pcaMethods_1.84.0 codetools_0.2-18 ellipsis_0.3.2
[37] mzID_1.30.0 colorspace_2.0-2 utf8_1.2.2
[40] affy_1.70.0 doParallel_1.0.16 munsell_0.5.0
[43] vsn_3.60.0 crayon_1.4.1 affyio_1.62.0 `
When you say all other files, do you mean that one or some files of the same series (same acquisitions/conversions) fail while others work?
Sorry I was unclear.
“All other files” means files from other instruments (Waters, specifically).
Ok, thank you.
I am not sure that there's anything that can be done on the MSnbase side here, and I do not have any experience with Agilent data. Maybe @jorainer has?
- And is this a standard run? DDA or DIA? Anything special that could confuse `mzR`'s parsing?
- A possible explanation is that your `pwiz`'s `msconvert` version is too recent compared to the `pwiz` code base in `mzR`. Have you read other Agilent data converted with that same `msconvert` version?
I guess it's most likely the second point above. I've already seen a case where an additional binary array was added to the TIC in the mzML - could you maybe have a look into one of your mzML and see if you have a "non-standard data array" in there?
If that's the case you could try to avoid exporting the TIC to the mzML (e.g. with `--chromatogramFilter "index[1-]"` in the `msconvert` call).
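One way to look for such an array without opening the file in an editor is a plain text search. The sketch below is only an illustration: the function name `count_nonstandard_arrays` is hypothetical, and it assumes the array is labelled with the literal text "non-standard data array" in the mzML, as described above.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in an mzML file that mention a
# "non-standard data array". Assumes that label appears literally in the file.
count_nonstandard_arrays() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps the function's exit status at 0 when nothing matches.
    grep -c 'non-standard data array' "$1" || true
}

# Example call (file name taken from the report above):
#   count_nonstandard_arrays 0714_48mix_50uM_02.mzML
```

A result greater than 0 would suggest that the file contains at least one such array; `grep -n` with the same pattern shows the line numbers.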
Ultimately, it would be good if someone could have a go at updating `mzR` - we definitely need the new proteowizard libraries in there - but my C++ knowledge is too limited to do that - maybe you @lgatto ?
I have little C++ skill and even less time, but hopefully, with a bit of collective knowledge, we will get there.
Thank you both for your replies.
These were the first Agilent files I tried to convert (there were no warnings or anything from `msconvert`). Also, the converted mzML files were readable in Skyline. Not sure if that offers you any clues.
I’m not sure if it was a standard run or not, but I’ll ask. The mzML files definitely had the “binaryData” that was mentioned in previous issues.
I will try the export without TIC and see if that results in a readable file…though, I don’t know what that would do to the rest of my pipeline.
Thanks again, I appreciate your help!
james
The Agilent runs were QQQ MRM.
For MRM data `msconvert --chromatogramFilter "index[1-]"` should fix the problem as it will export all chromatograms except the TIC. Alternatively, you could simply delete the one entry: open the converted mzML file with an editor and delete everything from (and including)
`<binaryDataArray arrayLength="...`
up to and including the next
`</binaryDataArray>`
You should then also change the number of arrays for the TIC from 3 to 2, i.e. search for "total ion current" (that should be well before the lines you deleted above) and change `<binaryDataArrayList count="3">` to `<binaryDataArrayList count="2">`.
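The manual edit described above could also be scripted. The following is a rough sketch, not a tested fix for this particular file: the function name `strip_nonstandard_arrays` is hypothetical, and it assumes that every mzML tag sits on its own line and that the unwanted array is labelled "non-standard data array". It drops each such `<binaryDataArray>` block and lowers the `count` of the enclosing `<binaryDataArrayList>` by the number of blocks removed.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of an mzML file without the
# <binaryDataArray> blocks flagged as "non-standard data array".
# Assumption: one XML tag per line.
strip_nonstandard_arrays() {
    awk '
    function flush_list(   i, j, start, bad, dropped, num) {
        dropped = 0; start = 0; bad = 0
        for (i = 1; i <= n; i++) {
            if (buf[i] ~ /<binaryDataArray[ >]/) { start = i; bad = 0 }
            if (start && buf[i] ~ /non-standard data array/) bad = 1
            if (start && buf[i] ~ /<\/binaryDataArray>/) {
                # Mark the whole block for removal if it was flagged.
                if (bad) { for (j = start; j <= i; j++) skip[j] = 1; dropped++ }
                start = 0
            }
        }
        # Lower count="N" on the list opening tag by the number of dropped blocks.
        if (dropped && match(buf[1], /count="[0-9]+"/)) {
            num = substr(buf[1], RSTART + 7, RLENGTH - 8)
            buf[1] = substr(buf[1], 1, RSTART - 1) "count=\"" (num - dropped) "\"" substr(buf[1], RSTART + RLENGTH)
        }
        for (i = 1; i <= n; i++) if (!(i in skip)) print buf[i]
        split("", skip)   # clear the skip markers for the next list
    }
    # Buffer each <binaryDataArrayList> ... </binaryDataArrayList> section.
    /<binaryDataArrayList/ { inlist = 1; n = 0 }
    inlist {
        buf[++n] = $0
        if ($0 ~ /<\/binaryDataArrayList>/) { flush_list(); inlist = 0 }
        next
    }
    { print }
    ' "$1"
}

# Example call (output file name is illustrative):
#   strip_nonstandard_arrays 0714_48mix_50uM_02.mzML > edited.mzML
```

The script only rewrites the text of the arrays; it does not touch any other part of the file, so whether the result is readable by `mzR` would still need to be checked.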
Thank you for the suggestions!
I'm waiting for my IT folks to get docker up and running again, so I can test the conversion filter.
I did remove the <binaryDataArray> tags, which yielded a different error when reading the file in, but I will try your method as well.
Removing the `<binaryDataArray arrayLength="...` block (there was only one section in the mzML file that had this) and changing the `<binaryDataArrayList count="3">` to `<binaryDataArrayList count="2">` (there was also only one) yielded the stream error:
Error: Can not open file [...] 0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [SpectrumList_mzML::create()] Bad istream.
But this is still on the mzML file that was converted WITH the TIC.
So, I'll wait until I can get the files converted without the TIC and try again.
Thanks so much for your help! james
If I got you correctly, the data is from an MRM experiment, so the mzML file should only have chromatograms, but no spectra in it. If that's the case, you should read the files with `readSRMData` and not with `readMSData`.
noTIC is the full path to the file.
noticdata <- readSRMData(noTIC, pdata = NULL)
Error: Can not open file /Users/graham/Documents/LCMS/KATIE/noTIC/0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.
I converted the .d file to mzML via:
docker run -it --rm -e WINEDEBUG=-all -v /path/to/data:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert 0714_48mix_50uM_02.d --mzML --chromatogramFilter "index[1-]" -o output3
`format: mzML
m/z: Compression-None, 64-bit
intensity: Compression-None, 32-bit
rt: Compression-None, 64-bit
ByteOrder_LittleEndian
indexed="true"
outputPath: output3
extension: .mzML
contactFilename:
runIndexSet:
spectrum list filters:
chromatogram list filters: index[1-]
filenames: 0714_48mix_50uM_02.d
processing file: 0714_48mix_50uM_02.d
calculating source file checksums
[ChromatogramListFactory] Ignoring wrapper: index[1-]
writing output file: output3\0714_48mix_50uM_02.mzML`
This yields the same binary data error.
Attached is the mzML file converted with the `--chromatogramFilter "index[1-]"` filter.
Hm, what puzzles me is that the file above still contains the TIC with the "non-standard data array" binary data type. Could you maybe use `--chromatogramFilter "index[2-]"`? Just to see if we get rid of the TIC that way...
docker run -it --rm -e WINEDEBUG=-all -v /mnt/m176906/KATIE:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert 0714_48mix_50uM_02.d --mzML --chromatogramFilter "index[2-]" -o output4
m/z: Compression-None, 64-bit
intensity: Compression-None, 32-bit
rt: Compression-None, 64-bit
ByteOrder_LittleEndian
indexed="true"
outputPath: output4
extension: .mzML
contactFilename:
runIndexSet:
spectrum list filters:
chromatogram list filters:
index[2-]
filenames:
0714_48mix_50uM_02.d
processing file: 0714_48mix_50uM_02.d
calculating source file checksums
[ChromatogramListFactory] Ignoring wrapper: index[2-]
writing output file: output4\0714_48mix_50uM_02.mzML```
[0714_48mix_50uM_02.mzML.zip](https://github.com/lgatto/MSnbase/files/7136916/0714_48mix_50uM_02.mzML.zip)
Not sure what happened to the mzML file I attached...
Hm, seems the link to the file is within ``` (i.e. formatted as code) - can you please add it again?
There you go.
The TIC and the problematic data array are still in this file. Are you sure this is the correct file? It seems that msconvert is not applying the filter (although it lists it).
Yeah, it was the “converted” file.
Do you know what the “ignoring wrapper” part in the output means?
Ah, I overlooked that before. Seems that `msconvert` is ignoring this filter? Maybe try with `--chromatogramFilter "1-"` instead? The problem is that the chromatogram filters are not documented (or at least I did not find any documentation for them).
I tried with 1- and 2- and both yielded:
[ChromatogramListFactory] Ignoring wrapper: 1-
[ChromatogramListFactory] Ignoring wrapper: 2-
I also tried it with the following syntax:
--chromatogramFilter "index[2-]"
Which also yielded the "ignoring wrapper" warning.
Attached is a tarball of the two files, but they are the same size, so the filters are likely not working.
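Since the "Ignoring wrapper" line appeared in every run where the filter had no visible effect, a saved conversion log could be checked for it before comparing file sizes. This is a minimal sketch under that assumption; the function name `filter_was_ignored` and the log file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a saved msconvert log contains
# an "Ignoring wrapper" line, which in this thread coincided with the filter
# not being applied.
filter_was_ignored() {
    grep -q 'Ignoring wrapper' "$1"
}

# Example use, assuming the msconvert output was saved to convert.log:
#   if filter_was_ignored convert.log; then
#       echo "msconvert ignored a filter; check the filter syntax" >&2
#   fi
```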
Hm, but then it seems that there is a problem with `msconvert`.
After trying myself I think the problem is a missing whitespace in the filter definition. It should be `--chromatogramFilter "index [1,]"`. Sorry for that.
That worked! Thank you.
Two mzML files below: index1 and index2.
I'll try reading them in myself in a bit...
I know this isn't an msconvert forum, but:
Is the `--chromatogramFilter "index [1,]"` flag only an option in the msconvert command line version?
I tried with the GUI version and did not see any of the chromatogram filters (just wanted to control for something wrong with the command line version installed).
Honestly, I've no idea. I'm only using the command line version from the docker image. The problem also is that the documentation on the chromatogram filters is pretty scarce.
Yeah, I've found the same.
I do very much appreciate your efforts, thank you.
The development version of `mzR` with updated proteowizard code is available. With this version it should be possible to read the mzML files. It might take some time until this version becomes "stable" because we had to remove the `ramp` backend and hence mzData support. To install:
BiocManager::install("sneumann/mzR", ref = "feature/updatePwiz_3_0_21263")