VariantRecalibrator R-script fails if `scales` v1.3.0 is installed

Open MikkelSchubert opened this issue 1 year ago • 11 comments

Bug Report

Affected tool(s) or class(es)

VariantRecalibrator

Affected version(s)

  • [X] Latest public release version [4.5.0.0]
  • [ ] Latest master branch as of [date of test?]

Description

As of v1.3.0 the scales R package turns the use of deprecated values for the space parameter into a hard error, resulting in the VariantRecalibrator R-script terminating with the following message:

The space argument of pal_gradient_n() only supports be "Lab" as of scales 0.3.0.

This parameter is used repeatedly in the generated R-script via

scale_fill_gradient(high="green", low="red", space="rgb")
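Until the generated script is updated, one possible local workaround is to strip the offending argument from the R script that VariantRecalibrator writes and then re-run that script by hand with Rscript. The sketch below is untested against a real GATK run and rests on several assumptions: the script was kept on disk (for example via `--rscript-file`), the argument appears in the form quoted above, and falling back to ggplot2's default colour space is acceptable for the plots. `strip_rgb_space` is a hypothetical helper name, not part of GATK.

```shell
# Hypothetical helper: removes every `, space="rgb"` argument from an R script
# in place, so that scale_fill_gradient() falls back to its default colour space.
strip_rgb_space() {
    script="$1"
    tmp="$(mktemp)" || return 1
    # Write to a temp file first; `sed -i` behaves differently on GNU and BSD.
    sed -E 's/,[[:space:]]*space[[:space:]]*=[[:space:]]*"rgb"//g' "$script" > "$tmp" \
        && mv "$tmp" "$script"
}

# Example (paths are placeholders):
#   strip_rgb_space /path/to/rscript.r
#   Rscript /path/to/rscript.r
```

Patching the generated file leaves the GATK installation itself untouched, but the edit has to be repeated after every VariantRecalibrator run.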

Steps to reproduce

$ R --version
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
$ rm -rf ~/R
$ R
> install.packages("ggplot2", repos="https://cloud.r-project.org/")
> packageVersion("scales")
[1] ‘1.3.0’
> quit()
$ gatk --version
The Genome Analysis Toolkit (GATK) v4.5.0.0
HTSJDK Version: 4.1.0
Picard Version: 3.1.1
$ gatk VariantRecalibrator  [arguments omitted for brevity]
org.broadinstitute.hellbender.utils.R.RScriptExecutorException: 
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.9339186078473502558';source('/path/to/rscript.r');
Stdout: 
Stderr: Error:
! The `space` argument of `pal_gradient_n()` only supports be "Lab" as
  of scales 0.3.0.
Backtrace:
     ▆
  1. ├─base::source("/path/to/rscript.r")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. └─ggplot2::scale_fill_gradient(high = "green", low = "red", space = "rgb")
  6.   ├─ggplot2::continuous_scale(...)
  7.   │ └─ggplot2::ggproto(...)
  8.   │   └─rlang::list2(...)
  9.   └─scales::seq_gradient_pal(low, high, space)
 10.     └─scales::pal_gradient_n(c(low, high), space = space)
 11.       └─lifecycle::deprecate_stop("0.3.0", "pal_gradient_n(space = 'only supports be \"Lab\"')")
 12.         └─lifecycle:::deprecate_stop0(msg)
 13.           └─rlang::cnd_signal(...)
Execution halted
$ R
> install.packages("remotes", repos="https://cloud.r-project.org/")
> library(remotes)
> install_version("scales", version="1.2.1", repos="https://cloud.r-project.org/")
> packageVersion("scales")
[1] ‘1.2.1’
> quit()
$ gatk VariantRecalibrator [arguments omitted for brevity]
$
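The reproduction above suggests a simple pre-flight check: look up the installed scales version and warn when it is 1.3.0 or newer. The following is a minimal sketch, assuming `Rscript` is on the PATH and that `sort` supports `-V`; the function names are hypothetical and not part of GATK.

```shell
# Returns 0 when dotted version $1 is >= $2.
# Assumes a sort with -V (version sort) support, e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the installed scales version (requires Rscript and the scales package).
installed_scales_version() {
    Rscript -e 'cat(as.character(packageVersion("scales")))'
}

# Hypothetical check to call before running VariantRecalibrator.
# Returns 1 when the installed scales is expected to break plot generation,
# 2 when the version could not be determined.
warn_if_scales_too_new() {
    v="$(installed_scales_version)" || return 2
    if version_ge "$v" "1.3.0"; then
        echo "scales $v detected: plot generation is expected to fail" >&2
        return 1
    fi
}
```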

Expected behavior

The output rscript file is used to generate a PDF.

Actual behavior

Generation of the PDF fails because a deprecation in the scales library causes the Rscript command to abort.

MikkelSchubert avatar Jan 23 '24 11:01 MikkelSchubert

Looks like we need to update our Rscripts... thanks for the report!

lbergelson avatar Feb 08 '24 21:02 lbergelson

Hi, I face the same bug.

Could you tell me which version of ggplot2 can be used, or how long it will take to fix this problem?

Thanks!

wanqiangdehuoguo avatar Mar 30 '24 02:03 wanqiangdehuoguo

R version 3.6 and a compatible version of ggplot2 are needed. Compatible versions are listed in gatkcondaenv.yml:

# core R dependencies; these should only be used for plotting and do not take precedence over core python dependencies!
- r-base=3.6.2
- r-data.table=1.12.8
- r-dplyr=0.8.5
- r-getopt=1.20.3
- r-ggplot2=3.3.0
- r-gplots=3.0.3
- r-gsalib=2.1
- r-optparse=1.6.4
- r-backports=1.1.10
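To see which R pins a given copy of gatkcondaenv.yml actually carries, the relevant lines can be pulled out with sed. This is only a sketch: `list_r_pins` is a hypothetical helper, and it assumes the pins are written as `- r-name=version` list items like the ones quoted above.

```shell
# Hypothetical helper: prints the pinned R packages found in a conda
# environment file as name=version pairs, one per line.
list_r_pins() {
    sed -n -E 's/^[[:space:]]*-[[:space:]]*(r-[^[:space:]#]+=[^[:space:]#]+).*$/\1/p' "$1"
}

# Example (path is a placeholder):
#   list_r_pins gatkcondaenv.yml
```

Comparing this output with the versions reported by `packageVersion()` in R shows quickly whether a local installation has drifted from the supported set.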

gokalpcelik avatar Mar 31 '24 07:03 gokalpcelik

Hi, I also face this problem:

Runtime.totalMemory()=8598323200
org.broadinstitute.hellbender.utils.R.RScriptExecutorException: 
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.3561179774649616878';source('/mnt/filename.snps.plots.R');
Stdout: 
Stderr: Error:
! The `space` argument of `pal_gradient_n()` only supports be "Lab" as
  of scales 0.3.0.
Backtrace:
     ▆
  1. ├─base::source("/mnt/filename.snps.plots.R")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. └─ggplot2::scale_fill_gradient(high = "green", low = "red", space = "rgb")
  6.   ├─ggplot2::continuous_scale(...)
  7.   │ └─ggplot2::ggproto(...)
  8.   │   └─rlang::list2(...)
  9.   └─scales::pal_seq_gradient(low, high, space)
 10.     └─scales::pal_gradient_n(c(low, high), space = space)
 11.       └─lifecycle::deprecate_stop("0.3.0", "pal_gradient_n(space = 'only supports be \"Lab\"')")
 12.         └─lifecycle:::deprecate_stop0(msg)
 13.           └─rlang::cnd_signal(...)
Execution halted

My versions of R and packages are R = 4.2.3 ggplot2 = 3.5.0

Did you already find a solution to this problem?

Thanks!

Lotteaveline avatar Apr 10 '24 08:04 Lotteaveline

Hi @Lotteaveline, recent versions of R and its libraries are known to have issues, so our suggestion is to stick with the versions recommended in the list above.

gokalpcelik avatar Apr 10 '24 10:04 gokalpcelik

Okay thank you for the quick response!

Lotteaveline avatar Apr 10 '24 11:04 Lotteaveline

Hi, there:

I am using R v4.3.4, scales v1.3.0, ggplot2 v3.4.4.

Can you please let me know how to resolve the issue mentioned above: The space argument of pal_gradient_n() only supports be "Lab" as of scales 0.3.0.

I hope that I don't need to downgrade my R version, since that would break a lot of my other scripts.

Thanks! JH

jielab avatar Jul 05 '24 05:07 jielab

You need to use the versions suggested above. If it is not possible to downgrade your R environment, then the only solution is to use the Conda environment for GATK, which installs all the necessary components, or the Docker image we provide.

gokalpcelik avatar Jul 05 '24 05:07 gokalpcelik

Thanks!

GATK has been around for more than a decade, I guess. I really hope that it is now easy to run.

Can you please let me know how to install through conda then?

BTW, the current version 4.5.0 does not require users to separate SNP from INDEL when calling variants, correct?

Best regards, Jie

jielab avatar Jul 05 '24 06:07 jielab

Just follow the recommendations from our readme file:


First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed (Miniconda is sufficient).

To "create" the conda environment:
If running from a zip or tar distribution, run the command `conda env create -f gatkcondaenv.yml` to create the gatk environment.

Execute the shell command `source activate gatk` to activate the gatk environment.
See the [Conda](https://conda.io/docs/user-guide/tasks/manage-environments.html) documentation for additional information about using and managing Conda environments.
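The readme steps above can be sketched as a small shell function. It assumes bash, a working conda installation, and that gatkcondaenv.yml sits at the top level of the unpacked distribution (which the readme text implies but this thread does not confirm); `setup_gatk_env` is a hypothetical name.

```shell
# Hypothetical helper: creates and activates the gatk conda environment.
# $1 is the directory of an unpacked GATK zip or tar distribution.
setup_gatk_env() {
    dist_dir="$1"
    if [ ! -f "$dist_dir/gatkcondaenv.yml" ]; then
        echo "gatkcondaenv.yml not found in $dist_dir" >&2
        return 1
    fi
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda is not on PATH" >&2
        return 2
    fi
    # Create the environment from the pinned dependency list.
    conda env create -f "$dist_dir/gatkcondaenv.yml" || return 3
    # Activation command as given in the readme text.
    source activate gatk
}

# Example (path is a placeholder):
#   setup_gatk_env /path/to/unpacked/gatk
```

Checking for the file and for conda first gives a clearer message than the raw conda error when either is missing.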

And yes, you don't have to call SNPs and INDELs separately.

gokalpcelik avatar Jul 08 '24 11:07 gokalpcelik

Dear Gökalp:

Thank you very much!

You suggested running `conda env create -f gatkcondaenv.yml`. Where is the gatkcondaenv.yml file?

If I simply run git clone https://github.com/broadinstitute/gatk.git, the cloned repository has a gatk executable, and I found that I could run it directly.

If I instead go to the https://gatk.broadinstitute.org/hc/en-us homepage and download the latest release, https://github.com/broadinstitute/gatk/releases/download/4.6.0.0/gatk-4.6.0.0.zip, then after unzipping it there is also a gatk executable, which I could also run directly (./gatk) from the shell.

So, now I am a bit puzzled: which is the recommended way to install and run GATK?

Finally, it seems that you guys now recommend WARP https://broadinstitute.github.io/warp/, which seems to be a completely new set of tools and pipeline scripts. Is WDL now the recommended approach to run GATK?

Thank you very much & best regards, Jie

jielab avatar Jul 08 '24 21:07 jielab