annotatr
annotatr copied to clipboard
A large number of regions are missing without annotations.
##Question A large number of regions are missing without annotations. I have a bed file with 38685 lines, but only less than 9549 lines have been annotated.
##Step step1:read genome region library(annotatr) file = "38685.bed" test_regions = read_regions(con = file, genome = 'hg38', format = 'bed') test_regions = test_regions[1:2000] print(test_regions) step2:annotate genome region annots = c('hg38_cpgs', 'hg38_basicgenes') annotations = build_annotations(genome = 'hg38', annotations = annots) test_annotated = annotate_regions(regions = test_regions, annotations = annotations, ignore.strand = TRUE, quiet = FALSE) print(test_annotated) df_test_annotated = data.frame(test_annotated) print(head(df_test_annotated)) write.table(df_test_annotated,"df_test_annotated.txt") step3:summary test_annsum = summarize_annotations( annotated_regions = test_annotated, quiet = TRUE) print(test_annsum) step4:visualization plot_annotation(annotated_regions = test_annotated)
Is that the code you're actually using? If it is, when you load the file:
file = "38685.bed"
test_regions = read_regions(con = file, genome = 'hg38', format = 'bed')
test_regions = test_regions[1:2000]
You are keeping only the first 2000 lines (test_regions = test_regions[1:2000]
), so it will only annotate those.
(The number of annotations you are getting is greater than 2000 because a single region can have multiple annotations, since it can overlap several genes, introns and exons in the same gene, genes and cpg islands, etc). I'm not sure where you are getting the ~9000 number, because you are annotating 2000 lines from the bed and seem to be getting ~3000-4000 annotations.
Agree with previous commenter.