feat: Add methods for UMI extraction from read name

Open emmcauley opened this issue 9 months ago • 0 comments

Addresses https://github.com/fulcrumgenomics/fgpyo/issues/92

  • Added extract_umis_from_read_name() (based on the gist provided by Matt), modified to accept a bool strict:
      -- If strict=True, the number of colons in the read name determines whether to return None or the last element of the read name, which is assumed to be the UMI (this should be consistent with fgbio functionality)

  • Added copy_umi_from_read_name() with bool remove_umi:
      -- Copies the UMI to the record's RX tag
      -- If remove_umi=True, rec.qname is updated in place to include everything but the UMI

  • Added testing coverage in test_umi_methods.py:
      -- test_strict_extract_umi_from_read_name(): when strict=True, the colon count of read_name becomes relevant
      -- test_strict_extract_umi_from_read_name_raises(): tests that, when strict=True, a ValueError is raised if the number of colons is not 7 or 8 as expected
      -- test_copy_umi_from_read_name(): run with and without remove_umi=True
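
For readers skimming the description, the behavior above can be sketched roughly as follows. This is not the code in this PR: `extract_umi_sketch`, `copy_umi_sketch`, and `RecordSketch` are hypothetical names, `RecordSketch` is a stand-in for an alignment record, and the mapping of 7 colons to "no UMI" and 8 colons to "UMI is the last field" is an assumption read from the bullets above. The non-strict behavior (return the last field whenever a colon is present) is also assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def extract_umi_sketch(read_name: str, strict: bool = False) -> Optional[str]:
    """Return the last colon-delimited field of a read name, or None.

    Assumption (not confirmed by the PR): in strict mode, 7 colons means
    no UMI is present and 8 colons means the last field is the UMI.
    """
    fields = read_name.split(":")
    if strict:
        colons = len(fields) - 1
        if colons == 7:
            return None  # assumed: name carries no UMI
        if colons != 8:
            raise ValueError(
                f"Expected 7 or 8 colons in read name, found {colons}: {read_name}"
            )
    elif len(fields) < 2:
        return None  # nothing to split off in non-strict mode
    # An empty trailing field is treated as "no UMI".
    return fields[-1] or None


@dataclass
class RecordSketch:
    """Hypothetical stand-in for an alignment record (name plus tags)."""

    query_name: str
    tags: Dict[str, str] = field(default_factory=dict)


def copy_umi_sketch(
    rec: RecordSketch, strict: bool = False, remove_umi: bool = False
) -> bool:
    """Copy the UMI from the read name into the RX tag; return True if copied."""
    umi = extract_umi_sketch(rec.query_name, strict=strict)
    if umi is None:
        return False
    rec.tags["RX"] = umi
    if remove_umi:
        # Keep everything before the final colon, dropping the UMI field.
        rec.query_name = rec.query_name.rsplit(":", 1)[0]
    return True
```

The stand-in record keeps the sketch free of external dependencies; the real method operates on the record's qname and RX tag as described in the bullets.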

emmcauley, May 20 '24 15:05