Shaun Jackman
I'd suggest picking the lexicographically smallest possible nucleotide for that ambiguity code rather than a random one, to make the result deterministic.
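A minimal Python sketch of this suggestion, assuming the standard IUPAC nucleotide ambiguity codes. The table and the function name `resolve_ambiguity` are hypothetical illustrations, not part of EMA or any existing tool. Taking `min` over the bases each code allows is a fixed rule, so the same input always yields the same output.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_ambiguity(seq: str) -> str:
    """Replace each IUPAC code with the lexicographically smallest base it allows.

    Input is upper-cased first; characters outside the table raise KeyError.
    """
    return "".join(min(IUPAC[base]) for base in seq.upper())


print(resolve_ambiguity("ACNTRYK"))  # prints ACATACG
```

One consequence worth noting: under this rule every `N` resolves to `A`.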
What's your use case, Cole? Is it that you have reads with `Ns` in them, or do you have reads with other IUPAC codes in them, or are you working...
Jared (@jts) is in a better position to answer that question than I am.
Sounds like `sed 's/ BX:Z:/:/'` would convert from Longranger basic FASTQ format to the format that EMA expects. Would you consider adding support for `BX:Z` format?
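For readers who would rather not use `sed`, here is a rough Python equivalent of that substitution. It assumes Longranger basic headers look like `@name BX:Z:barcode`, which is what the `sed` expression implies; the function names are hypothetical, and whether EMA accepts the result is exactly the question being asked above.

```python
from typing import Iterable, Iterator


def bx_to_ema_header(line: str) -> str:
    """Mirror sed 's/ BX:Z:/:/': replace the first ' BX:Z:' on a line with ':'."""
    return line.replace(" BX:Z:", ":", 1)


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Apply the substitution to every line, as sed does.

    Header lines are the only ones expected to match, because FASTQ
    sequence and quality lines contain no spaces.
    """
    for line in lines:
        yield bx_to_ema_header(line)


print(bx_to_ema_header("@read1 BX:Z:ACGTACGTACGTACGT\n"), end="")
# prints @read1:ACGTACGTACGTACGT
```

To use it as a stream filter, `import sys` and call `sys.stdout.writelines(convert_fastq(sys.stdin))`.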
```
preproc: preprocess barcoded FASTQ files (takes interleaved FASTQ via stdin)
-h: apply Hamming-2 correction [off]
```
Cool! That's useful to me. I'm curious why `-h` is disabled by default....
What is the output format of `ema preproc`? Would you consider adding an option to output `BX:Z` format?
Ah, I think I misunderstood. The default then is Hamming-1 correction? I had incorrectly assumed that the default is no correction. Perhaps you could update the README.md to clarify which correction is the default.
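To make the Hamming-1 versus Hamming-2 distinction concrete, here is a brute-force sketch of whitelist barcode correction. This is not EMA's implementation: the names are hypothetical, and the rule of rejecting a barcode when two whitelist entries tie at the smallest distance is an assumption. A real tool would likely use hash lookups of mismatch neighbours rather than scanning the whole whitelist.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(barcode: str, whitelist: Iterable[str],
                    max_dist: int = 1) -> Optional[str]:
    """Return the closest whitelisted barcode within max_dist mismatches.

    Returns None if nothing is within max_dist, or if two or more
    whitelist entries tie at the smallest distance (ambiguous).
    """
    best, best_dist, tied = None, max_dist + 1, False
    for candidate in whitelist:
        d = hamming(barcode, candidate)
        if d < best_dist:
            best, best_dist, tied = candidate, d, False
        elif d == best_dist:
            tied = True
    return None if tied or best is None else best


whitelist = {"AAAA", "CCCC", "AACC"}
print(correct_barcode("GAAT", whitelist, max_dist=1))  # prints None
print(correct_barcode("GAAT", whitelist, max_dist=2))  # prints AAAA
```

`GAAT` is two mismatches from `AAAA`, so only the Hamming-2 setting rescues it. One plausible trade-off, hedged: a larger radius also makes ties and wrong corrections against a large whitelist more likely.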
> ema preproc produces a special output format that isn't quite FASTQ; it puts everything for a read pair on a single line, which is convenient. That format is sometimes...
Yes, the `outs/barcoded.fastq.gz` of `longranger basic` does appear to be sorted by `BX:Z`! I hadn't noticed that before! I've been running `samtools sort -tBX` to sort by barcode! Hah. Thanks...
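A quick way to check whether a FASTQ file is already sorted by barcode, sketched in Python. It assumes uncompressed-equivalent 4-line FASTQ records, a `BX:Z:` tag somewhere on each header line, and plain string (byte) order as the sort order; that last point is an assumption about how Longranger orders barcodes. Reads without a `BX` tag are skipped, which is also a choice made here for illustration. The function names are hypothetical.

```python
import gzip
import re
from typing import Iterable, Iterator, Optional

BX_TAG = re.compile(r"\bBX:Z:(\S+)")


def barcodes_from_headers(headers: Iterable[str]) -> Iterator[Optional[str]]:
    """Yield the BX:Z value of each header line, or None if it has no BX tag."""
    for header in headers:
        m = BX_TAG.search(header)
        yield m.group(1) if m else None


def is_sorted_by_bx(headers: Iterable[str]) -> bool:
    """True if the BX:Z values appear in non-decreasing string order."""
    previous = None
    for bx in barcodes_from_headers(headers):
        if bx is None:
            continue  # untagged reads are ignored
        if previous is not None and bx < previous:
            return False
        previous = bx
    return True


def fastq_headers(path: str) -> Iterator[str]:
    """Yield every 4th line (the @ header) of a gzipped, 4-line-per-record FASTQ."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:
                yield line
```

Usage would be something like `is_sorted_by_bx(fastq_headers("outs/barcoded.fastq.gz"))`. Repeated barcodes (for example, both mates of a pair) count as sorted.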
Does `ema preproc` sort by barcode by default? Assuming not, can the output of `ema preproc` be piped into `samtools sort -tBX`?