MSnbase

Error in reading Agilent .d files converted to .mzML via msconvert

Open jamesrgraham opened this issue 3 years ago • 42 comments

Hello,

I have some Agilent .d files that I converted to .mzML on a Linux server with msconvert installed via docker and wine.

All other file types convert to .mzML with no issues.

When I try to read in the Agilent .mzML files via readMSData, I get this error:

```
Error: Can not open file 0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.
```

I've seen references to this error, but no solutions.

This error occurs both on the Linux server and on my Mac.

I tried removing the <binaryData* tags, but then that gave me an istream error.

```
> packageVersion("MSnbase")
[1] ‘2.18.0’
```

I tried to update MSnbase, but it keeps installing this version.

```
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] RColorBrewer_1.1-2  magrittr_2.0.1      MSnbase_2.18.0      ProtGenerics_1.24.0
 [5] S4Vectors_0.30.0    Biobase_2.52.0      BiocGenerics_0.38.0 mzR_2.26.1
 [9] Rcpp_1.0.7          MASS_7.3-54

loaded via a namespace (and not attached):
 [1] plyr_1.8.6            compiler_4.1.1        pillar_1.6.2
 [4] BiocManager_1.30.16   iterators_1.0.13      zlibbioc_1.38.0
 [7] tools_4.1.1           digest_0.6.27         ncdf4_1.17
[10] MALDIquant_1.20       preprocessCore_1.54.0 lifecycle_1.0.0
[13] tibble_3.1.4          gtable_0.3.0          lattice_0.20-44
[16] clue_0.3-59           pkgconfig_2.0.3       rlang_0.4.11
[19] foreach_1.5.1         cluster_2.1.2         IRanges_2.26.0
[22] vctrs_0.3.8           MsCoreUtils_1.4.0     grid_4.1.1
[25] glue_1.4.2            impute_1.66.0         R6_2.5.1
[28] fansi_0.5.0           XML_3.99-0.7          BiocParallel_1.26.2
[31] limma_3.48.3          ggplot2_3.3.5         scales_1.1.1
[34] pcaMethods_1.84.0     codetools_0.2-18      ellipsis_0.3.2
[37] mzID_1.30.0           colorspace_2.0-2      utf8_1.2.2
[40] affy_1.70.0           doParallel_1.0.16     munsell_0.5.0
[43] vsn_3.60.0            crayon_1.4.1          affyio_1.62.0
```

jamesrgraham, Aug 31 '21 18:08

When you say all other files, do you mean that one or some files of the same series (same acquisitions/conversions) fail while others work?

lgatto, Aug 31 '21 18:08

Sorry I was unclear.

“All other files” means files from other instruments (Waters, specifically).

jamesrgraham, Aug 31 '21 21:08

Ok, thank you.

I am not sure that there's anything that can be done on the MSnbase side here, and I do not have any experience with Agilent data. Maybe @jorainer has?

  • And is this a standard run? DDA or DIA? Anything special that could confuse mzR's parsing?

  • A possible explanation is that your pwiz msconvert version is too recent compared to the pwiz code base in mzR. Have you converted and read other Agilent data with that same msconvert version?

lgatto, Sep 01 '21 05:09

I guess it's most likely the second point above. I've already seen a case where an additional binary array was added to the TIC in the mzML - could you maybe have a look into one of your mzML files and see if there is a "non-standard data array" in there?

If that's the case, you could try to avoid exporting the TIC to the mzML (e.g. with --chromatogramFilter "index[1-]" in the msconvert call).

Ultimately, it would be good if someone could have a go at updating mzR - we definitely need the new proteowizard libraries in there - but my C++ knowledge is too limited to do that. Maybe you, @lgatto?

jorainer, Sep 01 '21 06:09
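The check suggested above can be done from a terminal with plain grep. The helpers below are a minimal sketch, not part of msconvert, mzR or MSnbase; the function names are hypothetical, and they assume an uncompressed mzML file in which the TIC is tagged with the PSI-MS term name "total ion current chromatogram".

```shell
#!/bin/sh
# Hypothetical helpers (not part of msconvert, mzR or MSnbase) for
# inspecting an uncompressed mzML file with plain grep.

# Exit status 0 if the file declares a "non-standard data array" anywhere.
has_nonstandard_array() {
  grep -q 'non-standard data array' "$1"
}

# Exit status 0 if the file still contains a TIC chromatogram
# (assumes the PSI-MS term name "total ion current chromatogram").
has_tic() {
  grep -q 'total ion current chromatogram' "$1"
}

# Print a one-line summary for each file given as an argument.
report_mzml() {
  for f in "$@"; do
    nsa=no
    tic=no
    has_nonstandard_array "$f" && nsa=yes
    has_tic "$f" && tic=yes
    printf '%s: non-standard data array=%s TIC=%s\n' "$f" "$nsa" "$tic"
  done
}
```

For example, `report_mzml 0714_48mix_50uM_02.mzML` prints one summary line for that file. Matching the full term name rather than just "total ion current" avoids false hits on spectrum-level total ion current values in files that do contain spectra.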

I have little C++ skill and even less time, but hopefully, with a bit of collective knowledge, we will get there.

lgatto, Sep 01 '21 07:09

Thank you both for your replies.

These were the first Agilent files I tried to convert (there were no warnings or anything from msconvert). Also, the converted mzML files were readable in Skyline. Not sure if that offers you any clues.

I’m not sure if it was a standard run or not, but I’ll ask. The mzML files definitely had the “binaryData” that was mentioned in previous issues.

I will try the export without TIC and see if that results in a readable file…though, I don’t know what that would do to the rest of my pipeline.

Thanks again, I appreciate your help!

james

jamesrgraham, Sep 01 '21 14:09

The Agilent runs were QQQ MRM.

jamesrgraham, Sep 01 '21 15:09

For MRM data, msconvert --chromatogramFilter "index[1-]" should fix the problem, as it will export all chromatograms except the TIC. Alternatively, you could simply delete that one entry:

open the converted mzML file in an editor and delete the entry holding the non-standard data array, i.e. delete everything from (and including)

`<binaryDataArray arrayLength="...`

up to and including the next `</binaryDataArray>`.

Then you should also change the number of arrays for the TIC from 3 to 2, i.e. search for "total ion current" (that should be well before the lines that you deleted above) and change

`<binaryDataArrayList count="3">` to `<binaryDataArrayList count="2">`

jorainer, Sep 06 '21 10:09

Thank you for the suggestions!

I'm waiting for my IT folks to get docker up and running again, so I can test the conversion filter.

I did remove the <binaryDataArray> tags, but that yielded a different error when reading the file in. I will try your method as well.

jamesrgraham, Sep 07 '21 13:09

Removing the <binaryDataArray arrayLength="... (there was only one section in the mzML file that had this) and changing the <binaryDataArrayList count="3"> to <binaryDataArrayList count="2"> (there was also only one) yielded the stream error:

Error: Can not open file [...] 0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [SpectrumList_mzML::create()] Bad istream.

But this is still on the mzML file that was converted WITH the TIC.

So, I'll wait until I can get the files converted without the TIC and try again.

Thanks so much for your help! james

jamesrgraham, Sep 07 '21 13:09

If I understood you correctly, the data is from an MRM experiment, so the mzML file should contain only chromatograms and no spectra. If that's the case, you should read the files with readSRMData and not with readMSData.

jorainer, Sep 08 '21 10:09
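Whether a file is chromatogram-only can be checked before choosing the reader. The sketch below uses hypothetical helper names and plain grep, and assumes an uncompressed mzML file; it reads the count attribute of the first spectrumList and chromatogramList elements and prints the name of the MSnbase function that fits.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mzR or MSnbase): read the declared
# number of spectra and chromatograms from an uncompressed mzML file.

# Print the count="N" value of the first <spectrumList ...> or
# <chromatogramList ...> element; print nothing if the element is absent.
list_count() {  # $1 = element name, $2 = mzML file
  grep -o "<$1 [^>]*" "$2" | head -n 1 |
    grep -o 'count="[0-9]*"' | grep -o '[0-9][0-9]*'
}

# Print readSRMData for chromatogram-only files (e.g. MRM/SRM runs),
# readMSData otherwise. A missing list is treated as count 0.
suggest_reader() {
  n_spec=$(list_count spectrumList "$1")
  n_chrom=$(list_count chromatogramList "$1")
  if [ "${n_spec:-0}" -eq 0 ] && [ "${n_chrom:-0}" -gt 0 ]; then
    echo readSRMData
  else
    echo readMSData
  fi
}
```

In R, a file reported as readSRMData would then be read with readSRMData(files), where files is a character vector of mzML paths. This only picks the reader; it does not work around the "Unknown binary data type" error itself.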

noTIC is the full path to the file.

```
> noticdata <- readSRMData(noTIC, pdata = NULL)
Error: Can not open file /Users/graham/Documents/LCMS/KATIE/noTIC/0714_48mix_50uM_02.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.
```

I converted the .d file to mzML via:

```
docker run -it --rm -e WINEDEBUG=-all -v /path/to/data:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert 0714_48mix_50uM_02.d --mzML --chromatogramFilter "index[1-]" -o output3
```

```
format: mzML
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: output3
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:

chromatogram list filters:
  index[1-]

filenames:
  0714_48mix_50uM_02.d

processing file: 0714_48mix_50uM_02.d
calculating source file checksums
[ChromatogramListFactory] Ignoring wrapper: index[1-]
writing output file: output3\0714_48mix_50uM_02.mzML
```

This yields the same binary data error.

jamesrgraham, Sep 08 '21 18:09

0714_48mix_50uM_02.mzML.zip

Attached is the converted mzML file using the --chromatogramFilter "index[1-]" filter.

jamesrgraham, Sep 08 '21 19:09

Hm, what puzzles me is that the file above still contains the TIC with the "non-standard data array" binary data type. Could you maybe use --chromatogramFilter "index[2-]"? Just to see if we get rid of the TIC that way...

jorainer, Sep 09 '21 06:09

```
docker run -it --rm -e WINEDEBUG=-all -v /mnt/m176906/KATIE:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert 0714_48mix_50uM_02.d --mzML --chromatogramFilter "index[2-]" -o output4
```

```
    m/z: Compression-None, 64-bit
    intensity: Compression-None, 32-bit
    rt: Compression-None, 64-bit
ByteOrder_LittleEndian
 indexed="true"
outputPath: output4
extension: .mzML
contactFilename:
runIndexSet:

spectrum list filters:

chromatogram list filters:
  index[2-]

filenames:
  0714_48mix_50uM_02.d

processing file: 0714_48mix_50uM_02.d
calculating source file checksums
[ChromatogramListFactory] Ignoring wrapper: index[2-]
writing output file: output4\0714_48mix_50uM_02.mzML
```


[0714_48mix_50uM_02.mzML.zip](https://github.com/lgatto/MSnbase/files/7136916/0714_48mix_50uM_02.mzML.zip)

jamesrgraham, Sep 09 '21 13:09

Not sure what happened to the mzML file I attached...

jamesrgraham, Sep 09 '21 13:09

Hm, seems the link to the file is within ``` (i.e. formatted as code) - can you please add it again?

jorainer, Sep 09 '21 13:09

There you go.

jamesrgraham, Sep 09 '21 13:09

The TIC and the problematic data array are still in this file. Are you sure this is the correct file you sent me? It seems that msconvert is not applying the filter (although it lists it).

jorainer, Sep 10 '21 08:09

Yeah, it was the “converted” file.

Do you know what the “ignoring wrapper” part in the output means?

jamesrgraham, Sep 10 '21 16:09

Ah, I overlooked that before. It seems that msconvert is ignoring this filter. Maybe try --chromatogramFilter "1-" instead? The problem is that the chromatogram filters are not documented (or at least I did not find any documentation for them).

jorainer, Sep 13 '21 12:09

I tried with 1- and 2- and both yielded:

```
[ChromatogramListFactory] Ignoring wrapper: 1-
[ChromatogramListFactory] Ignoring wrapper: 2-
```

I also tried it with the following syntax:

--chromatogramFilter "index[2-]"

Which also yielded the "ignoring wrapper" warning.

Attached is a tarball of the two files, but they are the same size, so the filters are likely not working.

filter1_2.tar.gz

jamesrgraham, Sep 13 '21 18:09

Hm, but then it seems that there is a problem with msconvert.

jorainer, Sep 14 '21 09:09

After trying it myself, I think the problem is a missing space in the filter definition. It should be --chromatogramFilter "index [1,]". Sorry for that.

jorainer, Sep 14 '21 10:09
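Because msconvert reported "Ignoring wrapper" without failing, a batch conversion can check its own console output. This is a minimal sketch, assuming the docker image and volume layout used in this thread (relative paths resolved in /data inside the container, as the commands above imply), and assuming the message is absent when a filter is accepted. The function names convert_without_tic and filter_ignored and the output directory converted are hypothetical; -it is dropped because the output is captured rather than read interactively.

```shell
#!/bin/sh
# Exit status 0 if msconvert console output (read from stdin) reports an
# ignored filter, e.g. "[ChromatogramListFactory] Ignoring wrapper: index[1-]".
filter_ignored() {
  grep -q 'Ignoring wrapper'
}

# Hypothetical wrapper: convert one Agilent .d directory (relative to the
# current directory) to mzML with the docker image used in this thread,
# keeping chromatograms from index 1 onwards (i.e. dropping the TIC).
# Returns 1 if msconvert reports that the filter was ignored, otherwise
# the exit status of docker.
convert_without_tic() {
  log=$(docker run --rm -e WINEDEBUG=-all -v "$PWD":/data \
    chambm/pwiz-skyline-i-agree-to-the-vendor-licenses \
    wine msconvert "$1" --mzML --chromatogramFilter "index [1,]" \
    -o converted 2>&1)
  status=$?
  printf '%s\n' "$log"
  if printf '%s\n' "$log" | filter_ignored; then
    echo "chromatogram filter was ignored for $1" >&2
    return 1
  fi
  return "$status"
}
```

For example, running `for d in *.d; do convert_without_tic "$d"; done` from the data directory converts each run and flags any file whose chromatogram filter was not applied.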

That worked! Thank you.

Two mzML files below: index1 and index2.

I'll try reading them in myself in a bit...

index12.tar.gz

jamesrgraham, Sep 14 '21 13:09

I know this isn't an msconvert forum, but:

Is the --chromatogramFilter "index [1,]" flag only an option in the msconvert command line version?

I tried with the GUI version and did not see any of the chromatogram filters (just wanted to control for something wrong with the command line version installed).

jamesrgraham, Sep 17 '21 14:09

Honestly, I have no idea. I only use the command line version from the docker image. The problem is also that the documentation on the chromatogram filters is pretty scarce.

jorainer, Sep 17 '21 14:09

Yeah, I've found the same.

I do very much appreciate your efforts, thank you.

jamesrgraham, Sep 17 '21 14:09

A development version of mzR with updated ProteoWizard code is now available. With this version it should be possible to read the mzML files. It might take some time until this version becomes "stable" because we had to remove the ramp backend and hence mzData support. To install:

```r
BiocManager::install("sneumann/mzR", ref = "feature/updatePwiz_3_0_21263")
```

jorainer, Sep 27 '21 08:09