Cornelius Roemer

Results: 325 comments by Cornelius Roemer

Vim should definitely be disabled in Jupyter so that one can at least use Emacs keybindings for things like `ctrl+e` etc.

As discussed here, this sounds like it requires an Augur solution; see nextstrain/augur#1068 for a recent duplicate.

Oh nice, I didn't know about the Euro mirror, that might be a better option for me anyways! https://genome-euro.ucsc.edu/cgi-bin/hgPhyloPlace This seems to work!

Hmm ok it fails soon after submitting sequences:

Very cool! I always use dev.usher.bio anyways so don't mind if it's released or not ;)

Workaround using seqkit for the GISAID case (keeping second field as id):

```
seqkit seq \
  --id-regexp "^.*\|([^\|]+)\|" \
  -i \
  {input.fasta}
```
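To see what that pattern captures, here is a minimal Python sketch that applies the same regular expression to a header string. Python's `re` module stands in for seqkit's regex engine here, and both the helper `extract_id` and the example header are made up for illustration. Returning `None` on a non-match is a choice of this sketch, not a statement about how seqkit behaves.

```python
import re

# Same pattern as passed to --id-regexp in the seqkit command.
ID_REGEXP = re.compile(r"^.*\|([^\|]+)\|")


def extract_id(header: str):
    """Return the captured field, or None if the pattern does not match.

    Hypothetical helper for illustration only.
    """
    match = ID_REGEXP.search(header)
    return match.group(1) if match else None


# Hypothetical three-field header in the "name|accession|date" shape.
example = "hCoV-19/Example/1/2021|EPI_ISL_000000|2021-01-01"
print(extract_id(example))
```

Because the leading `.*` is greedy, the capture is really "the last field that is followed by another `|`". For a three-field header that is the second field, but with four fields (`a|b|c|d`) it would be `c`, so the workaround relies on headers having exactly three fields.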

Here we go! https://github.com/apache/arrow-rs/issues/1059 OK, duckdb just lost because they also require insane amounts of memory. Your implementation is super lean, on track to be the winner if we can...

Oh duckdb just seems to read everything into memory and then write it out, rather than stream. It's kind of a hack how I'm using it for ETL - so not surprising...

Ok that seekable thing doesn't work. So back to inferring schema when reading from stdin. One could multipeek on the stdin reader for a few lines, collect into a vec,...
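A rough sketch of the peek idea, in Python rather than Rust, in case it helps the discussion. The comment does not spell out what happens after the lines are collected, so the replay step (chaining the buffered lines back in front of the rest of the stream) is an assumption. `peek_lines` and `infer_column_types` are hypothetical helpers, and the type inference is deliberately a toy: tab-separated input, header in the first line, `int` or `str` only.

```python
import io
import itertools
from typing import Iterable, Iterator, List, Tuple


def peek_lines(stream: Iterable[str], n: int) -> Tuple[List[str], Iterator[str]]:
    """Buffer the first n lines of a non-seekable stream.

    Returns the buffered lines plus an iterator that replays them
    before continuing with the rest of the stream.
    """
    it = iter(stream)
    head = list(itertools.islice(it, n))
    return head, itertools.chain(head, it)


def infer_column_types(head: List[str], delimiter: str = "\t") -> List[str]:
    """Toy inference: a column is 'int' if every sampled value parses as int."""
    rows = [line.rstrip("\n").split(delimiter) for line in head[1:]]  # skip header
    types = []
    for column in zip(*rows):
        try:
            for value in column:
                int(value)
            types.append("int")
        except ValueError:
            types.append("str")
    return types


# io.StringIO stands in for sys.stdin so the sketch runs on its own.
fake_stdin = io.StringIO("id\tcount\na\t1\nb\t2\nc\t3\n")
head, lines = peek_lines(fake_stdin, 3)
schema = infer_column_types(head)
all_lines = list(lines)  # the reader still sees every line, including the peeked ones
```

The trade-off is that the schema only reflects the first few rows, so a later row can still contradict it.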

As a workaround (for now) I can just do the parquet conversion on a cluster where I have terabytes of disk space. Should be possible to feed the 100GB file...