Not an issue, but I am confused ...

Open: DRL opened this issue 6 years ago • 1 comment

Hi AndreasHeger,

Problem:

  • I want to calculate whether certain annotation features (genes, repeats, etc) are enriched/depleted in a particular subset of contigs in an assembly

--workspace: BED file of all regions in genome (excluding regions composed of N's)

--segments: BED file of annotations in subset of contigs

contig_1001    21      792     RepeatMasker
contig_1001    27      34      dust
contig_1001    93      159     dust
contig_1001    246     255     dust
contig_1001    266     339     dust
contig_1001    415     422     dust

--annotations: BED file of annotations across the whole genome (same format as the --segments file, but covering all contigs)

The output I get when running:

gat-run.py --ignore-segment-tracks --segments=segments.bed --annotations=annotations.bed --workspace=workspace.bed --num-samples=100 --log=gat.log --num-threads=8 > gat.out

is

track   annotation        observed  expected      CI95low       CI95high      stddev     fold    l2fold  pvalue      qvalue      track_nsegments  track_size  track_density  annotation_nsegments  annotation_size  annotation_density  overlap_nsegments  overlap_size  overlap_density  percent_overlap_nsegments_track  percent_overlap_size_track  percent_overlap_nsegments_annotation  percent_overlap_size_annotation
merged  ncrnas_predicted  2913      1709.1200     1300.0000     1994.0000     209.0009   1.7040  0.7689  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     1025                  163283           1.5754e-01          30                 2913          2.8105e-03       0.0476                           0.0420                      2.9268                                1.7840
merged  gene              389744    170648.2000   163172.0000   177856.0000   5359.9760  2.2839  1.1915  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     18574                 37934616         3.6599e+01          278                389744        3.7603e-01       0.4414                           5.6198                      1.4967                                1.0274
merged  tandem            368130    158513.4400   154952.0000   162625.0000   2399.6840  2.3224  1.2156  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     47134                 4562430          4.4018e+00          4994               368130        3.5517e-01       7.9291                           5.3082                      10.5953                               8.0687
merged  RepeatMasker      1492404   610641.4800   602042.0000   620429.0000   6353.3404  2.4440  1.2892  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     117147                21502336         2.0745e+01          8705               1492404       1.4399e+00       13.8212                          21.5193                     7.4308                                6.9407
merged  dust              3200967   1182955.4000  1172992.0000  1190872.0000  4343.2429  2.7059  1.4361  1.0000e-02  1.0000e-02  62983            6935174     6.6911e+00     382880                14706492         1.4189e+01          63463              3200967       3.0883e+00       100.7621                         46.1555                     16.5752                               21.7657
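As a sanity check on the table above, the four percent_overlap_* columns can be reproduced from the other columns as plain ratios. This is only a sketch inferred from the numbers in this output, not taken from the GAT source; the names `ROWS`, `percent` and `percent_columns` are hypothetical helpers introduced here.

```python
# Values copied from the gat.out rows above.
# Assumption: each percent_overlap_* column is 100 * overlap / denominator,
# inferred from the reported numbers and not from the GAT source code.
TRACK_NSEGMENTS = 62983
TRACK_SIZE = 6935174

# annotation -> (overlap_nsegments, overlap_size,
#                annotation_nsegments, annotation_size)
ROWS = {
    "ncrnas_predicted": (30, 2913, 1025, 163283),
    "gene": (278, 389744, 18574, 37934616),
    "tandem": (4994, 368130, 47134, 4562430),
    "RepeatMasker": (8705, 1492404, 117147, 21502336),
    "dust": (63463, 3200967, 382880, 14706492),
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def percent_columns(annotation):
    """Recompute the four percent_overlap_* columns for one annotation."""
    ov_nseg, ov_size, ann_nseg, ann_size = ROWS[annotation]
    return {
        "percent_overlap_nsegments_track": percent(ov_nseg, TRACK_NSEGMENTS),
        "percent_overlap_size_track": percent(ov_size, TRACK_SIZE),
        "percent_overlap_nsegments_annotation": percent(ov_nseg, ann_nseg),
        "percent_overlap_size_annotation": percent(ov_size, ann_size),
    }


for name in ROWS:
    values = percent_columns(name)
    print(name, {key: round(value, 4) for key, value in values.items()})
```

If this reading is right, percent_overlap_size_track is the share of segment bases covered by one annotation type, so it would reach 100% only if that single annotation covered every base in the segments.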

I am confused:

  • shouldn't percent_overlap_size_track and the other percent_overlap_* columns be 100% for all annotations?

Thank you in advance.

cheers,

dom

DRL, Aug 01 '17 16:08