Michael Macias comments

Results: 68 comments by Michael Macias

[feature] Pileup Engine

Nice, this is a great initiative! Let's include something like this. Even though, as you mentioned, it's more algorithmic than I/O, pileup is a fairly common operation and is likely...

vcf: is an individual INFO field with missing value valid

Thanks for the clarification. I found that this case was previously omitted in https://github.com/samtools/hts-specs/pull/496 due to the same confusion, so I agree it should be defined in the spec.

Region query on the bgzipped indexed fasta?

noodles-fasta can now seek/query bgzipped FASTA files. See [`fasta::IndexedReader`](https://docs.rs/noodles/0.29.0/noodles/fasta/struct.IndexedReader.html) and the [`fasta_query`](https://github.com/zaeleus/noodles/blob/e8c5fe9b1c9d4cd4f8801936ea6692397c8c589c/noodles-fasta/examples/fasta_query.rs) example.

Header parsing too strict?

In this case, noodles' SAM header parser is not overly strict. It is, however, spec-compliant. From [Sequence Alignment/Map Format Specification (2022-08-22) § 1.3 "The header section"](https://samtools.github.io/hts-specs/SAMv1.pdf#subsection.1.3): > Platform/technology used to...

Header parsing too strict?

If the platform field value is the only blocker when reading, I would suggest preprocessing the raw SAM header before parsing, e.g., [1) just `illumina` or 2) generalized](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=116ec74955534e77c11c89242701c414). > do...
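As a rough illustration of the preprocessing idea above, the sketch below rewrites a raw SAM header so that the `PL` value on each `@RG` line is uppercased before the header is handed to a parser. It is not the code from the linked playground and it does not use any noodles API. The function name `normalize_platforms` is hypothetical, and treating "uppercase the value" as the generalized fix is an assumption that holds only when the platform name differs from the spec's uppercase vocabulary by case alone.

```rust
/// Uppercases the value of every `PL` field on `@RG` lines of a raw SAM header.
///
/// Hypothetical helper for illustration only; all other lines and fields are
/// copied through unchanged. Each output line is terminated with `\n`.
fn normalize_platforms(raw_header: &str) -> String {
    let mut out = String::with_capacity(raw_header.len());

    for line in raw_header.lines() {
        if line.starts_with("@RG\t") {
            // Header fields are tab-separated `TAG:VALUE` pairs.
            let fields: Vec<String> = line
                .split('\t')
                .map(|field| match field.strip_prefix("PL:") {
                    Some(value) => format!("PL:{}", value.to_ascii_uppercase()),
                    None => field.to_string(),
                })
                .collect();
            out.push_str(&fields.join("\t"));
        } else {
            out.push_str(line);
        }
        out.push('\n');
    }

    out
}

fn main() {
    let raw = "@HD\tVN:1.6\n@RG\tID:rg0\tPL:illumina\n";
    // The normalized text can then be passed to a SAM header parser.
    print!("{}", normalize_platforms(raw));
}
```

Keeping the fix outside the parser leaves the parser itself spec-compliant; the leniency is confined to the one field the caller chooses to repair.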

Implement region query for the reader of bgzipped indexed fasta

Sorry for the long response time, and thanks for your patience. I don't think this is the best approach to this problem. Readers in noodles are largely agnostic to indices,...

Implement region query for the reader of bgzipped indexed fasta

Thanks for your interest and possible solution. A different approach was implemented by delegating to a seekable raw reader. See `{bgzf,fasta}::IndexedReader`.

Transparently read/write SAM vs. BAM vs. CRAM

Thanks for testing, @jkbonfield. 1) There's been no work to select more appropriate/optimal codecs for data series, so the current implementation will simply use gzip for all block data. There...

"invalid genotypes: invalid genotype: empty input"

I'm closing this as stale. The genotypes parser has been improved since this issue, so do tell if you're still receiving the error.

bam/writer, sam/writer: Validate cigar/sequence/basequalities

Thanks for looking at this in the past. I'm closing this since the alignment parsers/writers have now diverged greatly and make the same checks.