
Error in autoGCcounts

smacmil opened this issue on Sep 25, 2018 · 1 comment

Hi! I've been trying to run this on our HPC and I get the following error. I would greatly appreciate your help in resolving it. I've copied the output containing the error messages below. Thanks!

> library(PopSV)
> source("automatedPipeline-batchtools.R")
Functions :
- 'autoGCcounts' to count BC in each sample.
- 'autoNormTest' to normalize and test all the samples.
- 'autoExtra' for some other functions.

> bam.files = read.table("bams.tsv", as.is=TRUE, header=TRUE)
> files.df = init.filenames(bam.files, code="example")
> save(files.df, file="files.RData")
> bin.size = 1e3
> bins.df = fragment.genome.hg19(bin.size)
> save(bins.df, file="bins.RData")
> res.GCcounts = autoGCcounts("files.RData", "bins.RData")

== 1) Get GC content in each bin.

Reading registry in read-write mode
Sourcing configuration file '/scale_wlg_persistent/pan_migration/uoo00082/PopWork/batchtools.conf.R' ...

== 2) Get bin counts in each sample and correct for GC bias.

Reading registry in read-write mode
Sourcing configuration file '/scale_wlg_persistent/pan_migration/uoo00082/PopWork/batchtools.conf.R' ...
Status for 6 jobs at 2018-09-25 10:04:49:
  Submitted    : 6 (100.0%)
  -- Queued    : 0 (  0.0%)
  -- Started   : 0 (  0.0%)
  ---- Running : 0 (  0.0%)
  ---- Done    : 0 (  0.0%)
  ---- Error   : 0 (  0.0%)
  ---- Expired : 6 (100.0%)
Mean run time: NaN hours.
Re-submitting 1 2 3 4 5 6
Submitting 6 jobs in 6 chunks using cluster functions 'Slurm' ...
                                                                              

Waiting (Q:0 R:6 D:0 E:0 ?:0) [--------------------------------]   0% eta:  ?s

Error in autoGCcounts("files.RData", "bins.RData") :                          
  Not done yet or failed, see for yourself
> 
> 

The .err output file looks like this:

### [bt]: Setting seed to 2 ...
Warning: replacing previous import ‘GenomicRanges::shift’ by ‘data.table::shift’ when loading ‘PopSV’
Error in cut.default(gc.df$GCcontent, breaks = seq(0, 1, 0.02), include.lowest = TRUE) : 
  'x' must be numeric

The .out file is:

### [bt]: This is batchtools v0.9.11
### [bt]: Starting calculation of 1 jobs
### [bt]: Setting working directory to '/scale_wlg_persistent/xxxx/xxx/xxxx'
### [bt]: Memory measurement disabled
### [bt]: Starting job [batchtools job.id=1]

### [bt]: Job terminated with an exception [batchtools job.id=1]
### [bt]: Calculation finished!

smacmil · Sep 25, 2018

Hi! I think I may know what's going on. The error message isn't explicit and I didn't emphasize this enough in the doc, but I think it's because 'bins.RData' was updated with the GC content info and then overwritten back to its original version (without GC content) when all the commands were rerun later.

You could check the bins data.frame to see if that's indeed the problem:

load('bins.RData')
head(bins.df)
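
If the GC content step had run and not been overwritten, bins.df should contain the GCcontent column that the cut() error above is complaining about. A quick check for it (just a sketch, assuming that column name):

"GCcontent" %in% colnames(bins.df)  # FALSE means the GC content was lost when bins.RData was overwritten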

What I should have emphasized in the doc: the following commands are just to set up the analysis, so they can/should be run only once (a guarded version is sketched after the commands):

bam.files = read.table("bams.tsv", as.is=TRUE, header=TRUE)
files.df = init.filenames(bam.files, code="example")
save(files.df, file="files.RData")
bin.size = 1e3
bins.df = fragment.genome.hg19(bin.size)
save(bins.df, file="bins.RData")
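
If those setup lines live in the same script as the rest of the pipeline, one way to make them safe to rerun is to guard each save behind a file check. This is just a sketch using base R's file.exists, not part of PopSV:

# only create files.RData/bins.RData if they don't exist yet,
# so a rerun doesn't overwrite the GC content added by autoGCcounts
if(!file.exists("files.RData")){
  bam.files = read.table("bams.tsv", as.is=TRUE, header=TRUE)
  files.df = init.filenames(bam.files, code="example")
  save(files.df, file="files.RData")
}
if(!file.exists("bins.RData")){
  bin.size = 1e3
  bins.df = fragment.genome.hg19(bin.size)
  save(bins.df, file="bins.RData")
}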

Then to launch or continue the pipeline, you can open R and just run:

library(PopSV)
source("automatedPipeline-batchtools.R")
res.GCcounts = autoGCcounts("files.RData", "bins.RData")

In your case, and if that was indeed the problem, you could force the pipeline to redo the first step of autoGCcounts, which recalculates the GC content:

res.GCcounts = autoGCcounts("files.RData", "bins.RData", redo=1)

Once it's redone, don't forget to remove the redo=1, otherwise this step will be relaunched each time.

jmonlong · Sep 27, 2018