
[BUG] anvi-script-visualize-split-coverages gives error

Open efogarty11 opened this issue 2 years ago • 6 comments

Short description of the problem

anvi-script-visualize-split-coverages gives this error:

Error in contig_offsets[as.character(split_name), as.character(sample_name)] :
  subscript out of bounds
Calls: parse_input_data ... get_offset_snv_positions -> unlist -> Map -> mapply -> <Anonymous>
Execution halted

anvi'o version

anvio main/master

Detailed description of the issue

Both of these commands (one uses the sample_data_colors.txt file and one does not) give the same error.

anvi-script-visualize-split-coverages -i split_cov_USA0006.txt -o USA0006_INSPECT.pdf --sample-data sample_data_colors.txt --snv-data USA0006_SNVs.txt --free-y-scale TRUE --max-coverage 80000 --snv-marker-transparency 0.9 --snv-marker-width 0.1
I, [2022-06-09 13:46:14.152963 #12996] INFO -- Checking options
I, [2022-06-09 13:46:14.181274 #12996] INFO -- Parsing input data
I, [2022-06-09 13:46:14.297376 #12996] INFO -- Largest sample has 2751 points
I, [2022-06-09 13:46:14.297517 #12996] INFO -- Compressing samples.  Window size is 9
I, [2022-06-09 13:46:14.297627 #12996] INFO -- Total data points before compression: 2751
I, [2022-06-09 13:46:14.315867 #12996] INFO -- Total data points after compression: 307
I, [2022-06-09 13:46:14.316031 #12996] INFO -- Coverage will be capped at 80000
W, [2022-06-09 13:46:14.321031 #12996] WARN -- A single group was provided for all samples in sample_data file.  This option is for splitting samples across PDFs.  Did you mean to specify more than one group?
I, [2022-06-09 13:46:14.332362 #12996] INFO -- Sample_data given.  Adjusting split_coverages.
Error in contig_offsets[as.character(split_name), as.character(sample_name)] :
  subscript out of bounds
Calls: parse_input_data ... get_offset_snv_positions -> unlist -> Map -> mapply -> <Anonymous>
Execution halted
anvi-script-visualize-split-coverages -i split_cov_USA0006.txt -o USA0006_INSPECT.pdf --snv-data USA0006_SNVs.txt -v 3 --free-y-scale TRUE -m 100000 --coverage-plot-color="#2d5195" --snv-wobble-pos-color='#327018' --snv-non-wobble-pos-color='#b82318' --snv-intergenic-color='#000000' --snv-marker-width=0.1
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts infile: split_cov_USA0006.txt
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_data: USA0006_SNVs.txt
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts outfile_basename: USA0006_INSPECT.pdf
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts chart_type: area
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts whole_chart_width: 10
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts individual_chart_height: 2
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts free_y_scale: TRUE
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts coverage_plot_color: #2d5195
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_wobble_pos_color: #327018
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_non_wobble_pos_color: #b82318
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_intergenic_color: #000000
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_marker_width: 0.1
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts snv_marker_transparency: 1
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts max_coverage: 1e+05
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts compress_threshold: 1000
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts window_size: 0
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts no_compression: FALSE
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts verbose: 3
D, [2022-06-09 13:47:04.434685 #33061] DEBUG -- Opts help: FALSE
I, [2022-06-09 13:47:04.435977 #33061] INFO -- Checking options
I, [2022-06-09 13:47:04.477953 #33061] INFO -- Parsing input data
D, [2022-06-09 13:47:04.526025 #33061] DEBUG -- Parsing split coverages
D, [2022-06-09 13:47:04.597400 #33061] DEBUG -- Reading split_coverages
D, [2022-06-09 13:47:04.607948 #33061] DEBUG -- Checking split_coverages
D, [2022-06-09 13:47:04.608123 #33061] DEBUG -- Ordering split_coverages
D, [2022-06-09 13:47:04.609300 #33061] DEBUG -- Checking for multiple split_names
D, [2022-06-09 13:47:04.609541 #33061] DEBUG -- Unique split names in split coverages file: 1
D, [2022-06-09 13:47:04.609664 #33061] DEBUG -- Single split name found
D, [2022-06-09 13:47:04.609807 #33061] DEBUG -- Getting contig offsets
D, [2022-06-09 13:47:04.635038 #33061] DEBUG -- Getting lengths per sample
D, [2022-06-09 13:47:04.636660 #33061] DEBUG -- Getting number of samples
D, [2022-06-09 13:47:04.636811 #33061] DEBUG -- Getting contig offsets
D, [2022-06-09 13:47:04.637111 #33061] DEBUG -- Ensuring data is a matrix
D, [2022-06-09 13:47:04.637290 #33061] DEBUG -- Compression is on
D, [2022-06-09 13:47:04.645469 #33061] DEBUG -- Automatically setting the window size.
I, [2022-06-09 13:47:04.645645 #33061] INFO -- Largest sample has 2751 points
I, [2022-06-09 13:47:04.645797 #33061] INFO -- Compressing samples.  Window size is 9
I, [2022-06-09 13:47:04.645912 #33061] INFO -- Total data points before compression: 2751
I, [2022-06-09 13:47:04.676044 #33061] INFO -- Total data points after compression: 307
I, [2022-06-09 13:47:04.676264 #33061] INFO -- Coverage will be capped at 100000
D, [2022-06-09 13:47:04.676481 #33061] DEBUG -- No. unique samples in split names file: 1
D, [2022-06-09 13:47:04.676663 #33061] DEBUG -- Parsing SNV data
Error in contig_offsets[as.character(split_name), as.character(sample_name)] :
  subscript out of bounds
Calls: parse_input_data ... get_offset_snv_positions -> unlist -> Map -> mapply -> <Anonymous>
Execution halted

Files to reproduce

split_cov_USA0006.txt USA0006_000000000001_split_00001.txt USA0006_SNVs.txt sample_data_colors.txt

Note: I tried rerunning the anvi-script-visualize-split-coverages command on samples I'd used it on before, and got the same error.

@mooreryan it won't let me add you as an assignee, but wanted you to be aware of this too :)

efogarty11 • Jun 09 '22 18:06

If I remove --snv-data USA0006_SNVs.txt, the script works:

anvi-script-visualize-split-coverages -i split_cov_USA0006.txt -o USA0006_INSPECT.pdf --sample-data sample_data_colors.txt --free-y-scale TRUE --max-coverage 80000 --snv-marker-transparency 0.9 --snv-marker-width 0.1

[image: coverage plot]

@efogarty11 does this plot ^ look correct?

The code breaks here. With Emily's example above, the variable contig_offsets is a double with dimension names. I didn't even know a double could have dimensions! I think the code expects a matrix at this point, so when it receives a double it breaks because it can't subset it.

@efogarty11 can you send me another example where you have multiple split coverages? In the example above I only see data for USA0006_000000000001_split_00001. Thanks!

mschecht • Aug 05 '22 16:08

Sorry for the delayed response!

@efogarty11 By any chance, when things were working were you using R version 3, and when things were breaking were you using R version 4?

@mschecht, contig_offsets will be a matrix (unless there is a horrible bug somewhere, which could be!). This block always forces contig_offsets to be a matrix.

Regarding a double having dimnames, are you using typeof to determine that? typeof(some_matrix) will give double if the types contained in the matrix are doubles. E.g.,

> m <- matrix(0.0, 2, 2)
> dimnames(m) <- list(contig_name = c("a", "b"), sample_name = c("s1", "s2"))
> m
           sample_name
contig_name s1 s2
          a  0  0
          b  0  0
> typeof(m)
[1] "double"
> class(m)
[1] "matrix" "array" 

mooreryan • Aug 05 '22 16:08

Hey @mooreryan, the anvi'o environment now installs R version 4.x by default, since we use conda install -y -c bioconda r-base.
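A quick way to confirm which behaviour a given environment has is to ask R directly. This is only a suggested check, not something anvi'o ships: it prints the running R version and reads a one-column table with read.table defaults to see whether strings come back as factors (the R < 4.0 default) or as characters (the R >= 4.0 default).

```r
# Suggested diagnostic (not part of anvi'o): which R is this, and does
# read.table() turn strings into factors by default?
cat("R version:", as.character(getRversion()), "\n")

# A tiny one-column table read with read.table()'s default arguments.
tbl <- read.table(text = "name\nfoo", header = TRUE)

# FALSE on R >= 4.0 (character column), TRUE on R < 4.0 (factor column).
cat("strings read as factors by default:", is.factor(tbl$name), "\n")
```

Running it with Rscript inside the activated conda environment shows which R the environment actually uses, rather than whichever R happens to be first on the system PATH.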

Regarding a double having dimnames, are you using typeof to determine that? typeof(some_matrix) will give double if the types contained in the matrix are doubles.

Thanks for clarifying this, I was SO confused why typeof(contig_offsets) wasn't printing a matrix!

OK, then the issue is with manipulating the contig_offsets matrix here. I think the column and row names are not being assigned properly here, which might be the root of the problem. I'll keep at it!

mschecht • Aug 05 '22 17:08

I'm sorry I don't have time today to run the test data and check it, but I would guess the problem starts here in the code that reads the table.

Back when I wrote this in 2019, R v4 was not out yet. One big change in R v4 was that the default became stringsAsFactors = FALSE rather than stringsAsFactors = TRUE in read.table and elsewhere. Later code assumes that strings will be treated as factors, not characters... check it out below...

In that code you linked where it is setting the dimnames:

  dimnames(contig_offsets) <- list(
    split_name = levels(split_coverages$split_name),
    sample_name = levels(split_coverages$sample_name)
  )

I'm using levels. But that really only works right with factors. It gives NULL if data are characters. E.g.,

> letters[1:5]
[1] "a" "b" "c" "d" "e"
> levels(letters[1:5])
NULL
> levels(as.factor(letters[1:5]))
[1] "a" "b" "c" "d" "e"

Since in R v4, read.table gives characters by default rather than factors, the levels function is setting the dimnames to NULL. (So what that is doing is setting the rownames and colnames of contig_offsets both to NULL.) Basically something like this:

> m <- matrix(0, 2,2)
> dimnames(m) <- list(a = NULL, b = NULL)
> m
      b
a      [,1] [,2]
  [1,]    0    0
  [2,]    0    0
> m["something", "else"]
Error in m["something", "else"] : subscript out of bounds
> dimnames(m)
$a
NULL

$b
NULL

Then in that other line you linked,

        offset <- contig_offsets[as.character(split_name), as.character(sample_name)]

You will get an error about subscripts out of bounds as above.
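To see the whole failure end to end, here is a small self-contained reproduction. It is a sketch, not the script's actual code: the toy table and the build_offsets helper are made up for illustration, and only the dimnames assignment mirrors the snippet quoted above. The same text is read once with stringsAsFactors = FALSE (the R >= 4.0 default) and once with stringsAsFactors = TRUE (the old default), and the same name-based lookup is then tried on both matrices.

```r
# Toy split coverages table (made-up values, illustration only).
tsv <- "split_name\tsample_name\tcoverage
s1\tA\t10
s1\tB\t12
s2\tA\t7
s2\tB\t9"

# Hypothetical helper: builds an all-zero offsets matrix and names its
# dimensions the same way the quoted script code does, via levels().
build_offsets <- function(split_coverages) {
  m <- matrix(0,
              nrow = length(unique(split_coverages$split_name)),
              ncol = length(unique(split_coverages$sample_name)))
  # levels() only returns names when the columns are factors.
  dimnames(m) <- list(
    split_name = levels(split_coverages$split_name),
    sample_name = levels(split_coverages$sample_name)
  )
  m
}

as_chars <- read.table(text = tsv, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)  # R >= 4.0 default
as_factors <- read.table(text = tsv, header = TRUE, sep = "\t",
                         stringsAsFactors = TRUE) # R < 4.0 default

offsets_chars <- build_offsets(as_chars)
offsets_factors <- build_offsets(as_factors)

print(dimnames(offsets_chars))     # both components are NULL
print(offsets_factors["s2", "B"])  # lookup by name works

# Lookup by name fails when the row and column names are NULL.
res <- tryCatch(offsets_chars["s2", "B"], error = conditionMessage)
print(res)
```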


Of course, I haven't tested it so it may not be the real problem, but that is my guess. You could test it out by adding stringsAsFactors = TRUE to the read.table function here.

If you try it out and it seems to work, there are tests for the script here, but I will admit they are finicky to run...
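Another option, sketched here as an assumption and not what the script currently does, is to make the dimnames independent of the read.table default by wrapping each column in factor() before calling levels(). The name_levels helper is hypothetical. levels(factor(x)) gives the same sorted unique values whether x arrived as a character vector or as a factor, so a later contig_offsets[split_name, sample_name] lookup would behave the same under R 3 and R 4.

```r
# Hypothetical helper (not part of anvi'o): sorted unique values of a
# column, whether read.table() produced a factor or a character vector.
name_levels <- function(x) {
  levels(factor(x))
}

tsv <- "split_name\tsample_name
s2\tB
s1\tA
s2\tA"

chr_tbl <- read.table(text = tsv, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
fct_tbl <- read.table(text = tsv, header = TRUE, sep = "\t",
                      stringsAsFactors = TRUE)

print(levels(chr_tbl$split_name))      # NULL: the R >= 4.0 failure mode
print(name_levels(chr_tbl$split_name)) # same result for character input...
print(name_levels(fct_tbl$split_name)) # ...and for factor input

# Offsets matrix named with the version-independent helper.
offsets <- matrix(0,
                  nrow = length(name_levels(chr_tbl$split_name)),
                  ncol = length(name_levels(chr_tbl$sample_name)))
dimnames(offsets) <- list(
  split_name = name_levels(chr_tbl$split_name),
  sample_name = name_levels(chr_tbl$sample_name)
)
print(offsets["s2", "B"])
```

Compared with passing stringsAsFactors = TRUE to read.table, this keeps the columns as plain character vectors everywhere else in the script. Either way, the row and column order of the matrix still has to match the order in which the offsets are filled in.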

mooreryan • Aug 05 '22 17:08

Thanks @mooreryan for the detailed explanation! I've never debugged R like this so this was a great learning experience.

I went ahead and implemented stringsAsFactors = TRUE in this branch: split_cov_R_v4

Here is the output for the original test data in the issue:

anvi-script-visualize-split-coverages -i split_cov_USA0006.txt \
                                      -o USA0006_INSPECT.pdf \
                                      --sample-data sample_data_colors.txt \
                                      --free-y-scale TRUE \
                                      --max-coverage 80000 \
                                      --snv-marker-transparency 0.9 \
                                      --snv-marker-width 0.1 \
                                      --snv-data USA0006_SNVs.txt

[image: coverage plot]

and here is the output from the new test data @efogarty11 kindly provided yesterday:

anvi-script-visualize-split-coverages -i matt_coverage.txt \
                                      -o USA0006_INSPECT.pdf \
                                      --sample-data sample_data_colors.txt \
                                      --free-y-scale TRUE \
                                      --max-coverage 80000 \
                                      --snv-marker-transparency 0.9 \
                                      --snv-marker-width 0.1 \
                                      --snv-data matt_var.txt

[image: coverage plot]

@efogarty11, please check the two plots above to see if they are the expected output.

@mooreryan I wasn't able to run your test scripts run_tests.R or test_visualize_split_coverages.R. However, your Makefile successfully ran:

$ cd ~/github/anvio/anvio/tests/sandbox/test_visualize_split_coverages
$ make
rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR && echo
rm: TEST_OUTDIR: No such file or directory
I, [2022-08-06 14:48:58.454574 #64223] INFO -- Checking options
I, [2022-08-06 14:48:58.471240 #64223] INFO -- Parsing input data
I, [2022-08-06 14:48:58.533647 #64223] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:48:58.543195 #64223] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:48:58.549557 #64223] INFO -- Making coverage plots
I, [2022-08-06 14:48:58.608108 #64223] INFO -- Writing coverage plots
I, [2022-08-06 14:48:58.898771 #64223] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:48 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:48 ..
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:48 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:48 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --chart-type line && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:48:59.852610 #64240] INFO -- Checking options
I, [2022-08-06 14:48:59.869751 #64240] INFO -- Parsing input data
I, [2022-08-06 14:48:59.932739 #64240] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:48:59.941924 #64240] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:48:59.949291 #64240] INFO -- Making coverage plots
Warning messages:
1: Ignoring unknown parameters: fill
2: Ignoring unknown parameters: fill
I, [2022-08-06 14:49:00.013504 #64240] INFO -- Writing coverage plots
I, [2022-08-06 14:49:00.254442 #64240] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:48 ..
-rw-r--r--  1 mschechter  staff   4.7K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.7K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data_no_color.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR  && echo
I, [2022-08-06 14:49:01.145577 #64263] INFO -- Checking options
I, [2022-08-06 14:49:01.162549 #64263] INFO -- Parsing input data
I, [2022-08-06 14:49:01.223912 #64263] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:01.233871 #64263] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:01.240401 #64263] INFO -- Making coverage plots
I, [2022-08-06 14:49:01.298367 #64263] INFO -- Writing coverage plots
I, [2022-08-06 14:49:01.579375 #64263] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___1.pdf
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___2.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data_no_color.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --chart-type area --xlim 10,20 && ls -lah TEST_OUTDIR  && echo
I, [2022-08-06 14:49:02.517004 #64279] INFO -- Checking options
I, [2022-08-06 14:49:02.532636 #64279] INFO -- Parsing input data
I, [2022-08-06 14:49:02.593744 #64279] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:02.602638 #64279] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:02.609424 #64279] INFO -- Making coverage plots
I, [2022-08-06 14:49:02.666842 #64279] INFO -- Writing coverage plots
I, [2022-08-06 14:49:02.942132 #64279] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___1.pdf
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___2.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data_no_group.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR  && echo
I, [2022-08-06 14:49:03.802238 #64294] INFO -- Checking options
I, [2022-08-06 14:49:03.818145 #64294] INFO -- Parsing input data
I, [2022-08-06 14:49:03.878494 #64294] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:03.888094 #64294] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:03.894823 #64294] INFO -- Making coverage plots
I, [2022-08-06 14:49:03.942473 #64294] INFO -- Writing coverage plots
I, [2022-08-06 14:49:04.167449 #64294] INFO -- Done!
total 16
drwxr-xr-x  3 mschechter  staff    96B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   5.3K Aug  6 14:49 vis_split_cov_output.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:05.032426 #64309] INFO -- Checking options
I, [2022-08-06 14:49:05.047938 #64309] INFO -- Parsing input data
I, [2022-08-06 14:49:05.107300 #64309] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:05.114521 #64309] INFO -- Making coverage plots
I, [2022-08-06 14:49:05.161279 #64309] INFO -- Writing coverage plots
I, [2022-08-06 14:49:05.386936 #64309] INFO -- Done!
total 16
drwxr-xr-x  3 mschechter  staff    96B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   5.3K Aug  6 14:49 vis_split_cov_output.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --chart-type area --xlim 10,20 && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:06.283622 #64325] INFO -- Checking options
I, [2022-08-06 14:49:06.300132 #64325] INFO -- Parsing input data
I, [2022-08-06 14:49:06.360490 #64325] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:06.368870 #64325] INFO -- Making coverage plots
I, [2022-08-06 14:49:06.417193 #64325] INFO -- Writing coverage plots
I, [2022-08-06 14:49:06.647416 #64325] INFO -- Done!
total 16
drwxr-xr-x  3 mschechter  staff    96B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   5.4K Aug  6 14:49 vis_split_cov_output.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:07.517177 #64340] INFO -- Checking options
I, [2022-08-06 14:49:07.533052 #64340] INFO -- Parsing input data
I, [2022-08-06 14:49:07.594352 #64340] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:07.595331 #64340] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:07.597552 #64340] INFO -- Making coverage plots
I, [2022-08-06 14:49:07.646965 #64340] INFO -- Writing coverage plots
I, [2022-08-06 14:49:07.891998 #64340] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:08.752243 #64355] INFO -- Checking options
I, [2022-08-06 14:49:08.768742 #64355] INFO -- Parsing input data
I, [2022-08-06 14:49:08.830842 #64355] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:08.831154 #64355] INFO -- Making coverage plots
I, [2022-08-06 14:49:08.878134 #64355] INFO -- Writing coverage plots
I, [2022-08-06 14:49:09.080516 #64355] INFO -- Done!
total 16
drwxr-xr-x  3 mschechter  staff    96B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   5.2K Aug  6 14:49 vis_split_cov_output.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output.pdf && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:09.967879 #64370] INFO -- Checking options
I, [2022-08-06 14:49:09.984385 #64370] INFO -- Parsing input data
I, [2022-08-06 14:49:10.045829 #64370] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:10.055047 #64370] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:10.061853 #64370] INFO -- Making coverage plots
I, [2022-08-06 14:49:10.119445 #64370] INFO -- Writing coverage plots
I, [2022-08-06 14:49:10.402134 #64370] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data_no_group.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output.pdf && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:11.279244 #64385] INFO -- Checking options
I, [2022-08-06 14:49:11.296263 #64385] INFO -- Parsing input data
I, [2022-08-06 14:49:11.357532 #64385] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:11.366276 #64385] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:11.372991 #64385] INFO -- Making coverage plots
I, [2022-08-06 14:49:11.420550 #64385] INFO -- Writing coverage plots
I, [2022-08-06 14:49:11.644249 #64385] INFO -- Done!
total 16
drwxr-xr-x  3 mschechter  staff    96B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   5.3K Aug  6 14:49 vis_split_cov_output.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --xlim 10,20 && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:12.529985 #64400] INFO -- Checking options
I, [2022-08-06 14:49:12.547000 #64400] INFO -- Parsing input data
I, [2022-08-06 14:49:12.608119 #64400] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:12.616842 #64400] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:12.623074 #64400] INFO -- Making coverage plots
I, [2022-08-06 14:49:12.681130 #64400] INFO -- Writing coverage plots
I, [2022-08-06 14:49:12.950172 #64400] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --xlim 10,20 --chart-type area && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:13.819144 #64415] INFO -- Checking options
I, [2022-08-06 14:49:13.836278 #64415] INFO -- Parsing input data
I, [2022-08-06 14:49:13.897905 #64415] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:13.907347 #64415] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:13.914092 #64415] INFO -- Making coverage plots
I, [2022-08-06 14:49:13.971541 #64415] INFO -- Writing coverage plots
I, [2022-08-06 14:49:14.242892 #64415] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.9K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

rm -r TEST_OUTDIR; mkdir -p TEST_OUTDIR; ../../../../sandbox/anvi-script-visualize-split-coverages --infile test_files/split_cov.txt --sample-data test_files/sample_data.txt --snv-data test_files/snv.txt --outfile-basename TEST_OUTDIR/vis_split_cov_output --xlim 10,20 --chart-type line && ls -lah TEST_OUTDIR && echo
I, [2022-08-06 14:49:15.127487 #64431] INFO -- Checking options
I, [2022-08-06 14:49:15.144175 #64431] INFO -- Parsing input data
I, [2022-08-06 14:49:15.205775 #64431] INFO -- Coverage will be capped at 1000
I, [2022-08-06 14:49:15.214354 #64431] INFO -- Sample_data given.  Adjusting split_coverages.
I, [2022-08-06 14:49:15.220570 #64431] INFO -- Making coverage plots
Warning messages:
1: Ignoring unknown parameters: fill
2: Ignoring unknown parameters: fill
I, [2022-08-06 14:49:15.283823 #64431] INFO -- Writing coverage plots
I, [2022-08-06 14:49:15.513770 #64431] INFO -- Done!
total 32
drwxr-xr-x  4 mschechter  staff   128B Aug  6 14:49 .
drwxr-xr-x  9 mschechter  staff   288B Aug  6 14:49 ..
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___First_group.pdf
-rw-r--r--  1 mschechter  staff   4.8K Aug  6 14:49 vis_split_cov_output___Second_group.pdf

I think this branch will solve this issue, let me know what you all think :)

mschecht • Aug 06 '22 20:08

@mschecht, apologies for the delay in getting back to you! That looks great. For comparison, here are the coverage plots for the same two samples that I generated ~3 years ago.

[image: coverage plots]

They look identical to me (aside from the intergenic SNV color, but it's totally fine the way it is)!

Thanks so much for looking into this!!

efogarty11 • Aug 24 '22 15:08

Great, looks like the different version of R was the culprit! I just merged this fix here: https://github.com/merenlab/anvio/pull/1973

mschecht • Aug 25 '22 20:08

Y'all are amazinggggg!! Works like a charm now 😄

efogarty11 • Aug 25 '22 20:08

@mschecht, if you want to open an issue about the tests, I can take a look at it, though I likely won't be able to get to it for a while. It would be nice if others could run the tests!

mooreryan • Aug 26 '22 16:08