fgpyo
feat: Add methods for UMI extraction from read name
Addresses https://github.com/fulcrumgenomics/fgpyo/issues/92
- Added `extract_umis_from_read_name()` (based on the gist provided by Matt), modified to accept a bool `strict`:
  - If `strict=True`, we count the number of colons in the read name to determine whether to return `None` or the last element of the read name, which is assumed to be the UMI (should be consistent with `fgbio` functionality).
- Added `copy_umi_from_read_name()` with bool `remove_umi`:
  - Copies the UMI to the record's `RX` tag.
  - If `remove_umi=True`, we update `rec.qname` in place to include everything but the UMI.
- Added testing coverage in `test_umi_methods.py`:
  - `test_strict_extract_umi_from_read_name()`: if `strict=True`, the colon count of the read name becomes relevant.
  - `test_strict_extract_umi_from_read_name_raises()`: tests that when `strict=True`, a `ValueError` is raised if the number of colons is not 7 or 8 as expected.
  - `test_copy_umi_from_read_name()`: with and without `remove_umi=True`.
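
For reviewers who want the behavior at a glance, here is a rough, self-contained sketch of the logic described above. It is not the code in this PR. `Record`, `extract_umi_sketch`, and `copy_umi_sketch` are hypothetical names, and `Record` is a plain dataclass standing in for an alignment record. The colon counts (7 meaning "no UMI", 8 meaning "UMI is the last field") follow the wording of this description and are assumptions about the exact mapping, as is the non-strict fallback of returning `None` when the name contains no colon.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed thresholds, taken from the wording of the PR description.
COLONS_WITHOUT_UMI = 7
COLONS_WITH_UMI = 8


@dataclass
class Record:
    """Hypothetical stand-in for an alignment record (not a pysam type)."""

    qname: str
    tags: Dict[str, str] = field(default_factory=dict)


def extract_umi_sketch(read_name: str, strict: bool = False) -> Optional[str]:
    """Return the last colon-delimited field of the read name as the UMI."""
    if strict:
        colons = read_name.count(":")
        if colons == COLONS_WITHOUT_UMI:
            return None  # name has no trailing UMI field
        if colons == COLONS_WITH_UMI:
            return read_name.rsplit(":", 1)[1]
        raise ValueError(
            f"Expected {COLONS_WITHOUT_UMI} or {COLONS_WITH_UMI} colons, "
            f"found {colons}: {read_name}"
        )
    # Non-strict (assumption): take whatever follows the last colon, if any.
    if ":" not in read_name:
        return None
    return read_name.rsplit(":", 1)[1]


def copy_umi_sketch(rec: Record, strict: bool = False, remove_umi: bool = False) -> bool:
    """Copy the UMI into the RX tag; optionally strip it from the read name."""
    umi = extract_umi_sketch(rec.qname, strict=strict)
    if umi is None:
        return False
    rec.tags["RX"] = umi
    if remove_umi:
        # Keep everything before the last colon, i.e. the name minus the UMI.
        rec.qname = rec.qname.rsplit(":", 1)[0]
    return True


rec = Record(qname="a:b:c:d:e:f:g:h:ACGT")
copy_umi_sketch(rec, strict=True, remove_umi=True)
```

After the last two lines, `rec.tags["RX"]` holds the extracted UMI and `rec.qname` no longer contains it. Mutating the record in place mirrors the `remove_umi` behavior described above; a real implementation would operate on the library's own record type rather than this stand-in.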