fgpyo
Consider more advanced views into the alignments in Template
After fixing an issue in Template where records were added twice to different internal fields (https://github.com/fulcrumgenomics/fgpyo/pull/203), we might want to consider more specific "views" into the different categories of alignments for multi-mapping chimeric use cases.
Some ideas for useful ways to view the alignments for either the R1 or R2 ordinal:
- Only primary (not secondary, not supplementary)
- Only not primary (all secondary and supplementary)
- Only secondary including "secondary supplementals"
- Only secondary not including "secondary supplementals"
- Only supplementary including "secondary supplementals"
- Only supplementary not including "secondary supplementals"
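The categories above come down to two SAM flag bits: 0x100 (secondary) and 0x800 (supplementary). Here is a minimal sketch of that classification using plain integer flags rather than pysam records. The `AlignmentKind` enum and `classify` function are hypothetical names for illustration only and are not part of fgpyo:

```python
from enum import Enum

# SAM flag bits, as defined in the SAM specification.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


class AlignmentKind(Enum):
    """Hypothetical labels for the four combinations of the two flag bits."""

    PRIMARY = "primary"
    SECONDARY = "secondary"
    SUPPLEMENTARY = "supplementary"
    SECONDARY_SUPPLEMENTARY = "secondary supplementary"


def classify(flag: int) -> AlignmentKind:
    """Classify a SAM flag by its secondary and supplementary bits."""
    is_secondary = bool(flag & SECONDARY)
    is_supplementary = bool(flag & SUPPLEMENTARY)
    if is_secondary and is_supplementary:
        return AlignmentKind.SECONDARY_SUPPLEMENTARY
    if is_secondary:
        return AlignmentKind.SECONDARY
    if is_supplementary:
        return AlignmentKind.SUPPLEMENTARY
    return AlignmentKind.PRIMARY


# 65 = 0x1 | 0x40: a paired, first-of-pair read with neither bit set.
base_flag = 65
kinds = [
    classify(base_flag),
    classify(base_flag | SECONDARY),
    classify(base_flag | SUPPLEMENTARY),
    classify(base_flag | SECONDARY | SUPPLEMENTARY),
]
```

Each of the views listed above is then a filter on one or more of these four kinds, applied separately to the R1 and R2 records.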
One mockup for a new class:
from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field

from pysam import AlignedSegment


@dataclass(frozen=True)
class Template(Iterable[AlignedSegment]):
    r1: AlignedSegment | None = None
    r2: AlignedSegment | None = None
    r1_auxiliaries: list[AlignedSegment] = field(default_factory=list)
    r2_auxiliaries: list[AlignedSegment] = field(default_factory=list)

    # Supplementary but not secondary.
    def r1_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_supplementary and not rec.is_secondary)

    def r2_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_supplementary and not rec.is_secondary)

    # Secondary but not supplementary.
    def r1_secondaries(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_secondary and not rec.is_supplementary)

    def r2_secondaries(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_secondary and not rec.is_supplementary)

    # Both secondary and supplementary.
    def r1_secondary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_secondary and rec.is_supplementary)

    def r2_secondary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_secondary and rec.is_supplementary)

    # The primary alignment (if any) followed by its supplementals.
    def r1_and_supplementals(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r1 is None else [self.r1])
        yield from self.r1_supplementals()

    def r2_and_supplementals(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r2 is None else [self.r2])
        yield from self.r2_supplementals()

    # The secondaries followed by the secondary supplementals.
    def r1_secondary_and_supplementals(self) -> Iterator[AlignedSegment]:
        yield from self.r1_secondaries()
        yield from self.r1_secondary_supplementals()

    def r2_secondary_and_supplementals(self) -> Iterator[AlignedSegment]:
        yield from self.r2_secondaries()
        yield from self.r2_secondary_supplementals()

    # Every alignment for the given read ordinal.
    def all_r1(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r1 is None else [self.r1])
        yield from self.r1_auxiliaries

    def all_r2(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r2 is None else [self.r2])
        yield from self.r2_auxiliaries

    def __iter__(self) -> Iterator[AlignedSegment]:
        yield from self.all_r1()
        yield from self.all_r2()
If the namespace is too cluttered, we could shim a TemplateView class under a cached Template.view property that holds a reference back to the template and provides the different iterators over the dataclass data. For example:
template.r1
template.r2
template.r1_auxiliaries
template.r2_auxiliaries
template.view.all_r1()
template.view.all_r2()
template.view.r1_secondaries()
template.view.r2_secondaries()
...
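A rough sketch of how that shim could be wired up, assuming `functools.cached_property` is used for `view`. The `Rec` class is a hypothetical stand-in for pysam's AlignedSegment so the example runs without pysam, and only the R1 side is shown:

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Rec:
    """Hypothetical stand-in for AlignedSegment with only the flags used here."""

    name: str
    is_secondary: bool = False
    is_supplementary: bool = False


@dataclass(frozen=True)
class TemplateView:
    """Hypothetical view that holds a reference back to its template."""

    template: "Template"

    def all_r1(self) -> Iterator[Rec]:
        if self.template.r1 is not None:
            yield self.template.r1
        yield from self.template.r1_auxiliaries

    def r1_secondaries(self) -> Iterator[Rec]:
        return (
            rec for rec in self.all_r1() if rec.is_secondary and not rec.is_supplementary
        )


@dataclass(frozen=True)
class Template:
    r1: Optional[Rec] = None
    r1_auxiliaries: List[Rec] = field(default_factory=list)

    @cached_property
    def view(self) -> TemplateView:
        # cached_property writes to the instance __dict__ directly, so it
        # works on a frozen dataclass as long as the class has no __slots__.
        return TemplateView(self)


template = Template(
    r1=Rec("primary"),
    r1_auxiliaries=[Rec("sec", is_secondary=True), Rec("supp", is_supplementary=True)],
)
secondary_names = [rec.name for rec in template.view.r1_secondaries()]
```

The dataclass fields stay the only public data on `Template`, and every derived iterator lives behind `template.view`. The cost is one extra attribute hop on every call.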
Additionally, I think we should make Template an iterable so you can do list(template) and for rec in template: ... and get back all of the alignments. Previously you had to do all_recs().
+1 for making Template iterable over the constituent alignments.
I would suggest defining each iterator as a property instead of a method. (I find it more pythonic, though that may be subjective.) Similarly, I would vote against an intermediate view attribute, and in favor of the outlined API.
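For comparison, a minimal sketch of the property-based spelling, again with a hypothetical `Rec` stand-in for AlignedSegment and only the R1 side shown. One thing to keep in mind with this style: every property access builds a fresh generator, but any single generator is exhausted after one pass.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Rec:
    """Hypothetical stand-in for AlignedSegment with only the flags used here."""

    name: str
    is_secondary: bool = False
    is_supplementary: bool = False


@dataclass(frozen=True)
class Template:
    r1: Optional[Rec] = None
    r1_auxiliaries: List[Rec] = field(default_factory=list)

    @property
    def all_r1(self) -> Iterator[Rec]:
        # Accessing the property calls this generator function and
        # returns a new generator each time.
        if self.r1 is not None:
            yield self.r1
        yield from self.r1_auxiliaries

    @property
    def r1_secondaries(self) -> Iterator[Rec]:
        return (rec for rec in self.all_r1 if rec.is_secondary and not rec.is_supplementary)


template = Template(r1=Rec("primary"), r1_auxiliaries=[Rec("sec", is_secondary=True)])

first_pass = template.all_r1
names = [rec.name for rec in first_pass]
leftover = list(first_pass)  # the same generator object is now exhausted
names_again = [rec.name for rec in template.all_r1]  # a fresh generator
```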
I'm cautious that the proposed behavior could lead to some unexpected footguns.
Specifically, having r(1|2)_supplementals and r(1|2)_secondaries not return all alignments with the corresponding flag might be surprising to users.
I understand we're motivated by allowing users to collect all records associated with, say, a primary chimeric alignment. Perhaps we could consider more precise naming or additional iterator views?
e.g.
- r1_supplementals -> All alignments marked supplementary
- r1_primary_supplementals -> All alignments marked supplementary and not secondary
- r1_secondary_supplementals -> All alignments marked supplementary and secondary
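One way to sanity-check this naming is that the broad view should be exactly the union of the two narrower views, with no overlap between them. Here is a sketch using free functions over a hypothetical `Rec` stand-in for AlignedSegment (none of these functions exist in fgpyo):

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Rec:
    """Hypothetical stand-in for AlignedSegment with only the flags used here."""

    name: str
    is_secondary: bool = False
    is_supplementary: bool = False


def supplementals(recs: Iterable[Rec]) -> Iterator[Rec]:
    """All alignments marked supplementary."""
    return (rec for rec in recs if rec.is_supplementary)


def primary_supplementals(recs: Iterable[Rec]) -> Iterator[Rec]:
    """All alignments marked supplementary and not secondary."""
    return (rec for rec in recs if rec.is_supplementary and not rec.is_secondary)


def secondary_supplementals(recs: Iterable[Rec]) -> Iterator[Rec]:
    """All alignments marked supplementary and secondary."""
    return (rec for rec in recs if rec.is_supplementary and rec.is_secondary)


r1_recs = [
    Rec("primary"),
    Rec("sec", is_secondary=True),
    Rec("supp", is_supplementary=True),
    Rec("sec_supp", is_secondary=True, is_supplementary=True),
]

broad = {rec.name for rec in supplementals(r1_recs)}
narrow_primary = {rec.name for rec in primary_supplementals(r1_recs)}
narrow_secondary = {rec.name for rec in secondary_supplementals(r1_recs)}
```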
@msto here's a second draft with some of your feedback incorporated.
Some of the method names are a lot longer but are also now much less implicit:
from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field

from pysam import AlignedSegment


@dataclass(frozen=True)
class Template(Iterable[AlignedSegment]):
    r1: AlignedSegment | None = None
    r2: AlignedSegment | None = None
    r1_auxiliaries: list[AlignedSegment] = field(default_factory=list)
    r2_auxiliaries: list[AlignedSegment] = field(default_factory=list)

    # Every record flagged supplementary, secondary or not.
    def r1_all_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_supplementary)

    def r2_all_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_supplementary)

    # Supplementary and not secondary.
    def r1_primary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_supplementary and not rec.is_secondary)

    def r2_primary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_supplementary and not rec.is_secondary)

    # Supplementary and secondary.
    def r1_secondary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_supplementary and rec.is_secondary)

    def r2_secondary_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_supplementary and rec.is_secondary)

    # The primary alignment (if any) followed by its supplementals.
    def r1_primary_with_supplementals(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r1 is None else [self.r1])
        yield from self.r1_primary_supplementals()

    def r2_primary_with_supplementals(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r2 is None else [self.r2])
        yield from self.r2_primary_supplementals()

    # The secondaries followed by the secondary supplementals.
    def r1_secondary_with_supplementals(self) -> Iterator[AlignedSegment]:
        yield from self.r1_secondaries_without_supplementals()
        yield from self.r1_secondary_supplementals()

    def r2_secondary_with_supplementals(self) -> Iterator[AlignedSegment]:
        yield from self.r2_secondaries_without_supplementals()
        yield from self.r2_secondary_supplementals()

    # Secondary and not supplementary.
    def r1_secondaries_without_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r1() if rec.is_secondary and not rec.is_supplementary)

    def r2_secondaries_without_supplementals(self) -> Iterator[AlignedSegment]:
        yield from (rec for rec in self.all_r2() if rec.is_secondary and not rec.is_supplementary)

    # Every alignment for the given read ordinal.
    def all_r1(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r1 is None else [self.r1])
        yield from self.r1_auxiliaries

    def all_r2(self) -> Iterator[AlignedSegment]:
        yield from ([] if self.r2 is None else [self.r2])
        yield from self.r2_auxiliaries

    def __iter__(self) -> Iterator[AlignedSegment]:
        yield from self.all_r1()
        yield from self.all_r2()