Joshua Charkow comments

Results: 73 comments of Joshua Charkow

Patch/extend osw output

@singjc Can you please test that this works as expected with UIS (e.g. that no data is missing from the .osw file)?

Patch/extend osw output

@timosachsenberg I think this is ready for merging? I have not had a chance to review with @hroest, but this does not change any functionality, it just outputs more information so...

Round 2 for ZSTD compression

Thanks for looping me into the discussion. Results look promising, especially because the file is smaller and the runtime is reduced as well! Just a few requests: It would be...

> [@jcharkow](https://github.com/jcharkow) I have read the `mspack` paper (10.1093/bioinformatics/btab636). This appears to combine out-of-band data that is compressed and standard `gzip` compression of the XML metadata. This is pretty much...

Round 2 for ZSTD compression

I've done a benchmark with MS-Numpress on my end for the same file that you benchmarked above. The m/z array is 64-bit float with MS-Numpress linear prediction compression followed by...

Round 2 for ZSTD compression

Thanks for doing these benchmarks. The dictionary encoding for ion mobility sounds like a good idea.

Round 2 for ZSTD compression

Interestingly, it seems that the timing improvement of Numpress+Zstd over Numpress+zlib is much more pronounced for Thermo data than for Bruker data. @mobiusklein when summarizing everything it might...

Round 2 for ZSTD compression

Just clarifying: the `ion mobility` graph for the Thermo data is actually `m/z`? Interesting how we see different trends based on different datatypes however...

New Workflow: LDA then XGBoost

Plotting eFDR / FDR identification curves for the different workflows, we can see that LDA_XGBoost is quite similar to XGBoost in terms of overfitting and actually overfits slightly less than XGBoost...

Feature/speedup parquet export

Currently, this method only supports a combined output file (no split based on runs) and does not support IPF.