MassQueryLanguage
MALDI and MSQL
It's turning out to be quite difficult to get MALDI data into a format that works with MSQL.
Note: I don't currently have access to any vendor software to see what it can export.
My first go-tos for working with MALDI data (and what I recommend to others) are mmass (http://www.mmass.org/) and MALDIquant (https://github.com/sgibb/MALDIquant).
MALDIquant
I tried peak picking and exporting with MALDIquant, but it can't export peaks to mzML/mzXML. It can export CSV/TSV, but MSQL doesn't have a parser for those. The MALDIquant error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘exportMzMl’ for signature ‘"MassPeaks"’
I forced an export by converting to a "MassSpectrum" object, but I get the following error from MSQL when querying that file:
Namespace(cache='NO', extract_json=None, extract_mzML=None, filename='/home/chase/delet/trial.mzML', original_path=None, output_file=None, parallel_query='NO', query='QUERY scaninfo(MS1DATA)')
{
"querytype": {
"function": "functionscaninfo",
"datatype": "datams1data"
},
"conditions": [],
"query": "QUERY scaninfo(MS1DATA)"
}
[Warning] Not index found and build_index_from_scratch is False
0it [00:00, ?it/s]
Traceback (most recent call last):
File "workflow/bin/msql_cmd.py", line 115, in <module>
main()
File "workflow/bin/msql_cmd.py", line 44, in main
results_df = msql_engine.process_query(query,
File "/home/chase/Documents/github/MassQueryLanguage/msql_engine.py", line 176, in process_query
return _evalute_variable_query(parsed_dict, input_filename, cache=cache, parallel=parallel)
File "/home/chase/Documents/github/MassQueryLanguage/msql_engine.py", line 252, in _evalute_variable_query
ms1_df, ms2_df = msql_fileloading.load_data(input_filename, cache=cache)
File "/home/chase/Documents/github/MassQueryLanguage/msql_fileloading.py", line 41, in load_data
ms1_df, ms2_df = _load_data_mzML2(input_filename)
File "/home/chase/Documents/github/MassQueryLanguage/msql_fileloading.py", line 290, in _load_data_mzML2
rt = spec.scan_time_in_minutes()
File "/home/chase/miniconda3/envs/msql/lib/python3.8/site-packages/pymzml/spec.py", line 885, in scan_time_in_minutes
self._scan_time, time_unit = self.scan_time
File "/home/chase/miniconda3/envs/msql/lib/python3.8/site-packages/pymzml/spec.py", line 869, in scan_time
self._scan_time = float(scan_time_ele.attrib.get("value"))
AttributeError: 'NoneType' object has no attribute 'attrib'
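The traceback above ends inside `spec.scan_time_in_minutes()`, which suggests the exported mzML simply carries no scan time (plausible for a single MALDI spectrum). As a minimal sketch, and not the fix that MSQL itself uses, a loader could fall back to a default retention time. The helper name `scan_time_or_default` and the stand-in class are hypothetical; only the `scan_time_in_minutes()` call and the `AttributeError` come from the traceback.

```python
def scan_time_or_default(spec, default=0.0):
    """Return the scan time in minutes, or `default` when the spectrum has none.

    `spec` is assumed to behave like the spectrum object in the traceback:
    it exposes scan_time_in_minutes() and raises AttributeError when the
    scan time element is missing from the file.
    """
    try:
        return spec.scan_time_in_minutes()
    except AttributeError:
        # Same failure mode as in the traceback: no scan time element found.
        return default


class SpectrumWithoutTime:
    """Hypothetical stand-in that mimics the failure seen above."""

    def scan_time_in_minutes(self):
        raise AttributeError("'NoneType' object has no attribute 'attrib'")


print(scan_time_or_default(SpectrumWithoutTime()))
```

Whether a retention time of 0 is a sensible placeholder depends on the query; RT-based conditions would not be meaningful on such data.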
mmass
I imported an mzML spectrum, did peak picking in mmass, and then attempted an MGF export (the only export options are CSV or MGF). Trying to query that file with MSQL gives:
Namespace(cache='NO', extract_json=None, extract_mzML=None, filename='/home/chase/delet/massive.ucsd.edu/MSV000084291/MSV000081619/bs3610_a_2.mgf', original_path=None, output_file=None, parallel_query='NO', query='QUERY scaninfo(MS1DATA)')
{
"querytype": {
"function": "functionscaninfo",
"datatype": "datams1data"
},
"conditions": [],
"query": "QUERY scaninfo(MS1DATA)"
}
Traceback (most recent call last):
File "workflow/bin/msql_cmd.py", line 115, in <module>
main()
File "workflow/bin/msql_cmd.py", line 44, in main
results_df = msql_engine.process_query(query,
File "/home/chase/Documents/github/MassQueryLanguage/msql_engine.py", line 176, in process_query
return _evalute_variable_query(parsed_dict, input_filename, cache=cache, parallel=parallel)
File "/home/chase/Documents/github/MassQueryLanguage/msql_engine.py", line 252, in _evalute_variable_query
ms1_df, ms2_df = msql_fileloading.load_data(input_filename, cache=cache)
File "/home/chase/Documents/github/MassQueryLanguage/msql_fileloading.py", line 50, in load_data
ms1_df, ms2_df = _load_data_mgf(input_filename)
File "/home/chase/Documents/github/MassQueryLanguage/msql_fileloading.py", line 89, in _load_data_mgf
peak_dict["scan"] = spectrum.metadata["scans"]
KeyError: 'scans'
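The `KeyError` above indicates that the MGF written by mmass has no scan number in its spectrum headers. One possible workaround on the user side, sketched here under the assumption that the loader maps a `SCANS=` header line to the `scans` metadata key, is to patch the file before querying it. The function name `add_missing_scans` is hypothetical, and this is not the change that was made in MSQL.

```python
def add_missing_scans(mgf_text):
    """Return MGF text in which every BEGIN IONS block has a SCANS= line.

    Blocks that already carry SCANS= are left untouched; the others get a
    running number based on their position in the file.
    """
    out = []
    block = []
    scan_number = 0
    in_block = False
    for line in mgf_text.splitlines():
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            in_block = True
            scan_number += 1
            block = [line]
        elif stripped == "END IONS" and in_block:
            has_scans = any(
                entry.strip().upper().startswith("SCANS=") for entry in block
            )
            if not has_scans:
                # Insert right after BEGIN IONS, before the other headers.
                block.insert(1, f"SCANS={scan_number}")
            out.extend(block)
            out.append(line)
            in_block = False
        elif in_block:
            block.append(line)
        else:
            out.append(line)
    if in_block:
        # Unterminated block at end of file: keep its lines unchanged.
        out.extend(block)
    return "\n".join(out) + "\n"
```

Note that the running number can collide with a `SCANS=` value already present in another block, so this is only safe for files where no block, or every block, has one.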
Why not use imzML? It is very similar to mzML and adds imaging-specific information to it.
I was trying to work within the confines of formats that already had parsers in MassQueryLanguage. Yesterday @mwang87 got MGF working from mmass exports. But I agree with you, @robinschmid: imzML support would also be useful to have for imaging.
Naive question: I assume imzML supports centroided data?
It's based on mzML and supports centroid data: https://ms-imaging.org/wp/imzml/
There are the jimzML parser and the pyimzML parser (https://github.com/alexandrovteam/pyimzML).
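For reference, here is a rough sketch of how pyimzML output could be flattened into per-peak rows, one scan per pixel. It relies on the parser's `coordinates` list and `getspectrum(index)` method; the row layout (`scan`, `x`, `y`, `mz`, `i`) and the function name `imzml_to_ms1_rows` are assumptions for illustration, not MSQL's actual table schema.

```python
def imzml_to_ms1_rows(parser):
    """Flatten an imzML parser into a list of per-peak dicts.

    `parser` is expected to look like pyimzml's ImzMLParser: it has a
    `coordinates` list of (x, y, z) tuples and a getspectrum(index) method
    returning (mz_array, intensity_array).
    """
    rows = []
    for idx, (x, y, _z) in enumerate(parser.coordinates):
        mzs, intensities = parser.getspectrum(idx)
        for mz, intensity in zip(mzs, intensities):
            rows.append(
                {
                    "scan": idx + 1,  # 1-based scan number per pixel (assumed)
                    "x": x,
                    "y": y,
                    "mz": float(mz),
                    "i": float(intensity),
                }
            )
    return rows


# Usage with a real file (requires pyimzml; the file name is hypothetical):
# from pyimzml.ImzMLParser import ImzMLParser
# rows = imzml_to_ms1_rows(ImzMLParser("example.imzML"))
```

Keeping the pixel coordinates next to each peak would let query results be mapped back onto the image.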
Do you all have any example queries and data you'd want to use? If so, it should be reasonable to add support. I just don't have any data in that format.
Corinna might have some interesting imaging data with
- MS1
- tims - MS1
- tims all ion fragmentation