AstrobioMike.github.io
decontam bug in subsetting fasta file when there are no contaminants
Hey @AstrobioMike, hope you are well. I ran across a situation in your full amplicon example that breaks the tutorial. Not sure if it's worth mentioning, but if decontam doesn't identify any contaminant sequences from the negative controls, subsetting the fasta file breaks. Here is an example of what I'm talking about. Sorry, I don't have an elegant solution.
> vector_for_decontam <- c(rep(FALSE, 33), rep(TRUE, 3))
> tail(rownames(t(asv_tab)))
[1] "NORMAL-11b" "GNOTO-12b" "GNOTO-13b" "KITNEG-KN1"
[5] "KITNEG-KN2" "KITNEG-KN3"
> contam_df <- isContaminant(t(asv_tab), neg=vector_for_decontam)
> table(contam_df$contaminant) # identified no contaminants
FALSE
585
> unique(contam_df$contaminant)
[1] FALSE
> # getting vector holding the identified contaminant IDs
> contam_asvs <- row.names(contam_df[contam_df$contaminant == TRUE, ])
> contam_asvs
character(0)
> asv_tax[row.names(asv_tax) %in% contam_asvs, ]
domain phylum class order family genus species
> # making new fasta file
> contam_indices <- which(asv_fasta %in% paste0(">", contam_asvs))
> contam_indices
integer(0)
> dont_want <- sort(c(contam_indices, contam_indices + 1))
> print(dont_want)
numeric(0)
> asv_fasta_no_contam <- asv_fasta[-dont_want]
> asv_fasta_no_contam
character(0) #UH OHHHH
Ah of course! I’ll add in a step to check and notes about skipping the sub-setting stuff to be more clear. Thanks for the note, Michael! Hope all is well in your world too :)
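For reference, the failure comes from how R handles an empty negative index: `x[-integer(0)]` is the same as `x[integer(0)]`, which selects nothing rather than everything. A minimal sketch of the kind of check mentioned above might look like the following. The helper name `drop_contam_from_fasta` and the toy `asv_fasta` vector are hypothetical, made up for illustration; they are not part of decontam or the tutorial. The sketch also assumes each fasta record spans exactly two lines (header, then sequence), as the `contam_indices + 1` step in the transcript implies.

```r
# Toy stand-ins for the tutorial objects (hypothetical data, illustration only)
asv_fasta <- c(">ASV_1", "ACGT", ">ASV_2", "GGCC", ">ASV_3", "TTAA")
contam_asvs <- character(0)  # what the transcript above ended up with

# Hypothetical helper: removes contaminant records from a two-line-per-record fasta vector
drop_contam_from_fasta <- function(fasta, contam_ids) {
  contam_indices <- which(fasta %in% paste0(">", contam_ids))

  # fasta[-integer(0)] would return character(0), so return the input
  # unchanged when no contaminant headers were found
  if (length(contam_indices) == 0) {
    return(fasta)
  }

  # drop each contaminant header line and the sequence line that follows it
  dont_want <- sort(c(contam_indices, contam_indices + 1))
  fasta[-dont_want]
}

asv_fasta_no_contam <- drop_contam_from_fasta(asv_fasta, contam_asvs)
```

With no contaminants, `asv_fasta_no_contam` is simply the original vector; with one or more IDs in `contam_asvs`, the matching header and sequence lines are removed as before. An alternative that needs no explicit guard is logical indexing, for example `asv_fasta[!(seq_along(asv_fasta) %in% dont_want)]`, since an all-`TRUE` mask keeps every element.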