cfDNAPro

Excluding chromosomes from analysis

Open • hwm08 opened this issue 1 year ago • 1 comment

I am trying to run cfDNAPro on data aligned to a "synthetic" genome that also contains pUC and lambda sequences, which I use to check methylation conversion. I only want to look at reads aligned to chromosomes 1 to 22, X and Y, but I cannot get cfDNAPro to read the BAM file:

read_bam_insert_metrics(bamfile = "file.bam", genome_label = "hg38-NCBI", chromosome_to_keep = append(1:22, c("X", "Y")))

bamfile was supplied.
Reading bam into galp...
Curating seqnames and strand information...
Removing outward facing fragments ...
Correcting start and end coordinates of fragments ...
Error in .normarg_seqlengths(value, seqnames(x)) :
  the length of the supplied 'seqlengths' vector must be equal to the number of sequences
Calls: read_bam_insert_metrics ... seqlengths<- -> seqlengths<- -> .normarg_seqlengths
In addition: Warning message:
In .merge_two_Seqinfo_objects(x, y) :
  Each of the 2 combined objects has sequence levels not in the other:

  • in 'x': KI270728.1, KI270727.1, KI270442.1, KI270729.1, GL000225.1, KI270743.1, GL000008.2, GL000009.2, KI270747.1, KI270722.1, GL000194.1, KI270742.1, GL000205.2, GL000195.1, KI270736.1, KI270733.1, GL000224.1, GL000219.1, KI270719.1, GL000216.2, KI270712.1, KI270706.1, KI270725.1, KI270744.1, KI270734.1, GL000213.1, GL000220.1, KI270715.1, GL000218.1, KI270749.1, KI270741.1, GL000221.1, KI270716.1, KI270731.1, KI270751.1, KI270750.1, KI270519.1, GL000214.1, KI270708.1, KI270730.1, KI270438.1, KI270737.1, KI270721.1, KI270738.1, KI270748.1, KI270435.1, GL000208.1, KI270538.1, KI270756.1, KI270739.1, KI270757.1, KI270709.1, KI270746.1, KI270753.1, KI270589.1, KI270726.1, KI270735.1, KI270711.1, KI270745.1, KI270714.1, KI270732.1, KI270713.1, KI270754.1, KI270710.1, KI270717.1, KI270724.1, KI270720.1, KI270723.1, KI270718.1, KI270317.1, KI270740.1, KI270755.1, KI270707.1, KI270579.1, KI270752.1, KI270512.1, KI27032 [... truncated] Execution halted
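The warning lists sequence levels that appear in one of the two combined objects but not in the other, here the unplaced/unlocalised contigs (KI…, GL…) carried in the BAM. To see which header entries fall outside the chromosomes being kept, a small diagnostic can be run over the header text printed by `samtools view -H`. This is a minimal sketch: `list_extra_contigs` is a hypothetical helper (not part of cfDNAPro or samtools), and it compares against a keep list you supply rather than against the `hg38-NCBI` annotation cfDNAPro uses internally.

```shell
# list_extra_contigs (hypothetical helper): read SAM header text on stdin and
# print every @SQ sequence name (SN: tag) that is NOT in the space-separated
# keep list given as the first argument.
list_extra_contigs() {
  awk -F '\t' -v keep="$1" '
    BEGIN {
      # Build a lookup table of contig names to keep.
      n = split(keep, names, " ")
      for (i = 1; i <= n; i++) want[names[i]] = 1
    }
    $1 == "@SQ" {
      # Find the SN: tag on this @SQ line and report it if not kept.
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "SN:") {
          name = substr($i, 4)
          if (!(name in want)) print name
          break
        }
      }
    }
  '
}
```

For example: `samtools view -H file.bam | list_extra_contigs "$(seq 1 22 | tr '\n' ' ')X Y"`. Any name it prints (including the lambda and pUC entries) is still declared in the header even if no reads map to it.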

I get the same error when I subset the BAM file to the relevant chromosomes with samtools view and then re-index it. I believe this is because the header still lists all of the original sequence names:

…
9           138394717  1374698  0
MT          16569      0        0
X           156040895  1739164  0
Y           57227415   7892     0
KI270728.1  1872759    0        0
KI270727.1  448248     0        0
KI270442.1  392061     0        0
…

Is there a way around this? Many thanks!
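One possible workaround, assuming the error comes from the extra @SQ entries that survive `samtools view` region subsetting, is to rewrite the BAM through SAM text so that the header declares only the kept contigs. The sketch below is a minimal awk filter; `keep_contigs` is a hypothetical helper, not part of cfDNAPro or samtools. It also drops reads whose mate (RNEXT) lies on a removed contig, which seems reasonable for fragment-level analysis but is an assumption, not something stated in the issue.

```shell
# keep_contigs (hypothetical helper): filter SAM text on stdin so that only the
# contigs in the space-separated keep list ($1) appear, both in the header
# (@SQ lines) and in alignment records. Other header lines (@HD, @RG, @PG)
# pass through unchanged.
keep_contigs() {
  awk -F '\t' -v keep="$1" '
    BEGIN {
      n = split(keep, names, " ")
      for (i = 1; i <= n; i++) want[names[i]] = 1
    }
    # Header lines start with "@"; SAM read names never do.
    /^@/ {
      if ($1 == "@SQ") {
        # Keep the @SQ line only if its SN: name is in the keep list.
        for (i = 2; i <= NF; i++) {
          if (substr($i, 1, 3) == "SN:") {
            name = substr($i, 4)
            if (name in want) print
            break
          }
        }
        next
      }
      print
      next
    }
    # Alignment records: column 3 is RNAME, column 7 is RNEXT
    # ("=" means same contig as RNAME, "*" means unavailable).
    !($3 in want) { next }
    $7 != "=" && $7 != "*" && !($7 in want) { next }
    { print }
  '
}
```

For example: `samtools view -h file.bam | keep_contigs "$(seq 1 22 | tr '\n' ' ')X Y" | samtools view -b -o filtered.bam -`, then `samtools index filtered.bam`, and pass `filtered.bam` as `bamfile`. Filtering preserves record order, so a coordinate-sorted input stays sorted. Going through SAM text rather than editing the header in place matters because BAM records refer to contigs by their index in the @SQ list, so deleting @SQ lines without rewriting the records would shift those indices. Whether cfDNAPro then accepts the file with `genome_label = "hg38-NCBI"` has not been verified here.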

hwm08 • Jun 14 '23 10:06