khmer
Test and document broken_paired_reader
I am considering integrating the `broken_paired_reader` into kevlar (see https://github.com/dib-lab/kevlar/issues/207). I understand at a high level what it's intended to do, but I don't understand the specifics of how it actually works or the circumstances under which it will or will not fail.
We have a couple of relevant tests in `test_read_handling.py`, but these test things at the functional/script level. I suggest we need:
- [ ] documentation for the broken paired reader
  - its intended purpose
  - specifics of what will and will not work
- [ ] some clear unit tests that invoke the `broken_paired_reader` function directly to enforce the advertised behavior
The current docstring actually documents the function pretty well.
"""Read pairs from a stream.
A generator that yields singletons and pairs from a stream of FASTA/FASTQ
records (yielded by 'screed_iter'). Yields (n, is_pair, r1, r2) where
'r2' is None if is_pair is False.
The input stream can be fully single-ended reads, interleaved paired-end
reads, or paired-end reads with orphans, a.k.a. "broken paired".
Usage::
    for n, is_pair, read1, read2 in broken_paired_reader(...):
        ...
Note that 'n' behaves like enumerate() and starts at 0, but tracks
the number of records read from the input stream, so is
incremented by 2 for a pair of reads.
If 'min_length' is set, all reads under this length are ignored (even
if they are pairs).
If 'force_single' is True, all reads are returned as singletons.
"""
I guess we can still consider integrating this into the developer docs.
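To make the docstring's contract concrete, here is a minimal, self-contained sketch of the kind of direct unit test suggested above. It does not call khmer. `sketch_paired_reader`, `looks_like_pair`, and `Record` are hypothetical stand-ins written only for illustration, and the `/1` and `/2` name-suffix pairing rule is an assumption, not necessarily how khmer decides that two reads are mates. `min_length` is left out on purpose, because the docstring does not say what happens when only one read of a pair is too short.

```python
from collections import namedtuple

# Hypothetical stand-in for a screed record; only 'name' and 'sequence' are used.
Record = namedtuple("Record", ["name", "sequence"])


def looks_like_pair(r1, r2):
    # Assumed pairing rule: same base name, with /1 and /2 suffixes.
    n1, n2 = r1.name, r2.name
    return n1.endswith("/1") and n2.endswith("/2") and n1[:-2] == n2[:-2]


def sketch_paired_reader(records, force_single=False):
    """Toy model of the documented contract; NOT khmer's implementation."""
    n = 0          # index of the first record in the next item yielded
    prev = None    # record waiting to see whether its mate follows
    for rec in records:
        if prev is None:
            prev = rec
            continue
        if not force_single and looks_like_pair(prev, rec):
            yield n, True, prev, rec
            n += 2          # a pair consumes two records
            prev = None
        else:
            yield n, False, prev, None
            n += 1          # a singleton consumes one record
            prev = rec
    if prev is not None:    # trailing record with no mate
        yield n, False, prev, None


# A "broken paired" stream: pair, orphan, pair.
stream = [
    Record("a/1", "ACGT"), Record("a/2", "TTGA"),
    Record("b/1", "GGCC"),
    Record("c/1", "ATAT"), Record("c/2", "CGCG"),
]

results = list(sketch_paired_reader(stream))
summary = [(n, is_pair) for n, is_pair, _, _ in results]

singles = list(sketch_paired_reader(stream, force_single=True))
```

Tests against the real function could take the same shape: build a small in-memory stream, iterate over the reader, and assert on the `(n, is_pair, r1, r2)` tuples for the fully single-ended, interleaved, and broken-paired cases.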