Test and document broken_paired_reader

Open • standage opened this issue 7 years ago • 1 comment

I am considering integrating the broken_paired_reader into kevlar (see https://github.com/dib-lab/kevlar/issues/207). I understand at a high level what it's intended to do, but I don't understand the specifics of how it actually works or the circumstances under which it will or will not fail.

We have a couple of relevant tests in test_read_handling.py, but these exercise the reader only at the functional/script level. I suggest we need:

  • [ ] documentation for the broken paired reader
    • its intended purpose
    • specifics of what will and will not work
  • [ ] some clear unit tests that invoke the broken_paired_reader function directly to enforce the advertised behavior

standage, Feb 12 '18 20:02

The current docstring actually documents the function pretty well.

    """Read pairs from a stream.

    A generator that yields singletons and pairs from a stream of FASTA/FASTQ
    records (yielded by 'screed_iter').  Yields (n, is_pair, r1, r2) where
    'r2' is None if is_pair is False.

    The input stream can be fully single-ended reads, interleaved paired-end
    reads, or paired-end reads with orphans, a.k.a. "broken paired".

    Usage::

       for n, is_pair, read1, read2 in broken_paired_reader(...):
          ...

    Note that 'n' behaves like enumerate() and starts at 0, but tracks
    the number of records read from the input stream, so is
    incremented by 2 for a pair of reads.

    If 'min_length' is set, all reads under this length are ignored (even
    if they are pairs).

    If 'force_single' is True, all reads are returned as singletons.
    """

I guess we can still consider integrating this into the developer docs.
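
To make the documented behavior concrete, here is a minimal, self-contained sketch of the pairing logic as the docstring describes it. It is not khmer's implementation: `Read`, `looks_like_pair`, and `paired_reader_sketch` are hypothetical names, the `/1` and `/2` naming check is a simplifying assumption, and `min_length` is left out because the docstring does not say how filtered reads affect `n`.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record; not a khmer or screed type.
Read = namedtuple("Read", ["name", "sequence"])


def looks_like_pair(r1, r2):
    """Assumption: mates are named '<id>/1' and '<id>/2'."""
    return (
        r1.name.endswith("/1")
        and r2.name.endswith("/2")
        and r1.name[:-2] == r2.name[:-2]
    )


def paired_reader_sketch(records, force_single=False):
    """Yield (n, is_pair, r1, r2); r2 is None when is_pair is False."""
    n = 0        # index of the first record in the next item yielded
    prev = None  # record held back until we know whether its mate follows
    for rec in records:
        if prev is None:
            prev = rec
            continue
        if not force_single and looks_like_pair(prev, rec):
            yield n, True, prev, rec
            n += 2  # a pair consumes two records
            prev = None
        else:
            yield n, False, prev, None
            n += 1
            prev = rec
    if prev is not None:
        # Trailing record with no mate.
        yield n, False, prev, None


# A "broken paired" stream: a pair, an orphan, then another pair.
stream = [
    Read("a/1", "ACGT"), Read("a/2", "TTGA"),
    Read("b/1", "GGCC"),
    Read("c/1", "ACCA"), Read("c/2", "TGGT"),
]

result = [
    (n, paired, r1.name, r2.name if r2 else None)
    for n, paired, r1, r2 in paired_reader_sketch(stream)
]
singles = [n for n, _, _, _ in paired_reader_sketch(stream, force_single=True)]
```

With this input, `result` holds three items whose `n` values are 0, 2, and 3 (the orphan `b/1` advances the counter by one, each pair by two), and `singles` counts 0 through 4 because `force_single=True` returns every record on its own. Direct unit tests of the real function could assert the same kind of tuples.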

standage, Feb 13 '18 06:02