Demo evaluation needs correction
In the demo's Evaluation section, the command:
zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf (remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /ihart/BaseSpace/Projects/CanvasSPW/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /CanvasDIR/Tools/EvaluateCNV/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt
would not run because the path to generic.cnaqc.excluded_regions.bed is wrong. Also, for consistency, CanvasSPW should be renamed to canvas, and it's better to comment out the (remove REF calls) part. So in the end it would be something like this:
zcat /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf #(remove REF calls)
/CanvasDIR/Tools/EvaluateCNV/EvaluateCNV.dll /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed /tmp/gHapMixDemo/TempCNV_child1/CNV.vcf /tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed inheritedCNVs.txt
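Since the original command failed on a wrong path, a small pre-flight check can catch that kind of problem before EvaluateCNV starts. The sketch below is plain POSIX shell. `check_inputs` is a hypothetical helper, not part of Canvas, and the paths are simply the ones from the corrected command above, so adjust them to your layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canvas): return non-zero and report
# the first argument that is not a readable file.
check_inputs() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            return 1
        fi
    done
    return 0
}

# Paths taken from the corrected command above; they are assumptions
# about your setup, not fixed locations.
TRUTH=/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed
CALLS=/tmp/gHapMixDemo/TempCNV_child1/CNV.vcf
EXCLUDED=/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed

if check_inputs "$TRUTH" "$CALLS" "$EXCLUDED"; then
    echo "all inputs found"
fi
```

This only verifies that the files exist and are readable. It does not validate their contents or any further options EvaluateCNV may require.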
Even with these fixes, it still crashes, saying that I need to provide reference ploidy:
...
2019-04-23T09:47:57+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
Error: Truth variant chr6:105256020-105271607 with no overlapping Canvas calls. Reference ploidy cannot be determined! Please provide reference ploidy via command line options
...
Yes, the demo documentation is outdated. Sorry about that. I will keep this issue open so others can see the workaround. For the reference ploidy VCF input, see this post: https://github.com/Illumina/canvas/issues/89#issuecomment-400762109
Thank you for your reply. After some research and trial and error, it still fails. This is the command I ran:
zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
output/demo/TempCNV_child1/CNV.vcf \
/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
inheritedCNVs.txt \
--ploidy=1 1 data/Files/par.bed
with par.bed being:
chrX 60001 2699520
chrX 154931044 155260560
chrY 10001 2649520
chrY 59034050 59363566
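BED files are conventionally tab-delimited, and a listing pasted into a web page can lose its tabs. As a sketch, the four intervals above could be written out with explicit tabs like this. `write_par_bed` is a hypothetical helper and the output path is an assumption. Whether EvaluateCNV requires tabs here is not confirmed by this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canvas): write the four PAR intervals
# listed above as a tab-delimited, three-column BED file.
# $1: output path
write_par_bed() {
    # printf reuses the format string for each group of three arguments.
    printf '%s\t%s\t%s\n' \
        chrX 60001 2699520 \
        chrX 154931044 155260560 \
        chrY 10001 2649520 \
        chrY 59034050 59363566 > "$1"
}

# Output location is an assumption; override with PAR_BED=/some/path.
PAR_BED="${PAR_BED:-par.bed}"
write_par_bed "$PAR_BED"

# Sanity check: every line should have exactly three tab-separated fields.
awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' "$PAR_BED" \
    && echo "wrote $PAR_BED"
```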
The error is:
2019-04-24T12:20:45+01:00,ERROR: Exception caught in WorkDoerFactory. Cancelling all jobs. Exception:
Value cannot be null.
Parameter name: fileName
System.ArgumentNullException: Value cannot be null.
Parameter name: fileName
at System.IO.FileInfo..ctor(String originalPath, String fullPath, String fileName, Boolean isNormalized)
at EvaluateCNV.CNVChecker.ComputeCallability(ILogger logger, Dictionary`2 callsByContig, EvaluateCnvOptions options, IDirectoryLocation output) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 543
at EvaluateCNV.CNVChecker.<>c__DisplayClass24_0.<Evaluate>b__4(IWorkDoer workDoer) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 536
at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation loggingDir, Action`1 logCommand, CancellationToken cancellationToken, Action`1 function)
at Isas.Framework.WorkManagement.JobLaunching.JobLauncherFactory.RunWithJobLauncher(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationToken cancellationToken, Action`1 function)
at Isas.Framework.WorkManagement.ResourceManagement.WorkResourceManagerFactory.RunWithResourceManager(ILogger logger, ISettings settings, CancellationToken cancellationToken, Action`1 function)
at Isas.Framework.WorkManagement.WorkDoerFactory.RunWithWorkDoer(ILogger logger, ISettings settings, IDirectoryLocation analysisFolder, CancellationTokenSource cancellationTokenSource, Action`1 function)
at EvaluateCNV.CNVChecker.Evaluate(String truthSetPath, String cnvCallsPath, String excludedBed, String outputPath, EvaluateCnvOptions options) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\CNVChecker.cs:line 538
at EvaluateCNV.Program.MainHelper(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 49
at EvaluateCNV.Program.Main(String[] args) in D:\TeamCity\buildAgent\work\a29a190a11771d97\Tools\EvaluateCNV\Program.cs:line 16
Any idea?
I figured out that I needed to provide kmer.fa. And since the tool infers the (wrong) location of GenomeSize.xml, I also needed to soft link some of the files, such as kmer.fa and filter13.bed.
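The soft-linking step might look like the sketch below. The thread does not say which directory the tool expects the files in, so both directories are placeholders you would need to fill in. `link_reference_files` is a hypothetical helper, not a Canvas command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Canvas): symlink the named files from
# a source directory into a target directory, skipping any that are absent.
# $1: source dir, $2: target dir, remaining args: file names
link_reference_files() {
    # Resolve to an absolute path so the links stay valid from any cwd.
    src_dir=$(cd "$1" && pwd) || return 1
    dst_dir=$2
    shift 2
    mkdir -p "$dst_dir"
    for name in "$@"; do
        if [ -e "$src_dir/$name" ]; then
            ln -sf "$src_dir/$name" "$dst_dir/$name"
        else
            echo "skipping $name: not found in $src_dir" >&2
        fi
    done
}

# Example call. EXPECTED_DIR stands for wherever the tool looks for the
# reference files; that location is an assumption left to the reader.
# link_reference_files data/canvasdata/Files "$EXPECTED_DIR" kmer.fa filter13.bed
```

Using `ln -sf` replaces an existing link of the same name, which makes the step safe to re-run.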
zcat output/demo/TempCNV_child1/CNV.vcf.gz | grep -v ":REF:" > output/demo/TempCNV_child1/CNV.vcf #(remove REF calls)
dotnet /canvasdir/Tools/EvaluateCNV/EvaluateCNV.dll \
/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/child1_truth.bed \
output/demo/TempCNV_child1/CNV.vcf \
/tmp/BaseSpace/Projects/canvas/AppResults/simdata/Files/generic.cnaqc.excluded_regions.bed \
inheritedCNVs.txt \
--ploidy=1 1 data/Files/ploidy.bed \
-k=data/canvasdata/Files/kmer.fa
This command works with no errors, and outputs the following as part of the result:
Ploidy 1.86
Results for PASSing variants
Accuracy 39.7608
DirectionAccuracy 40.1665
F-score 0.8575
Recall 77.7004
DirectionRecall 78.4933
Precision 95.6493
DirectionPrecision 96.6254
GainRecall 70.6110
GainDirectionRecall 71.4076
GainPrecision 91.2464
GainDirectionPrecision 92.2757
LossRecall 80.0021
LossDirectionRecall 80.0021
LossPrecision 96.9904
LossDirectionPrecision 97.9502
MeanEventAccuracy 68.7341
MedianEventAccuracy 94.5666
VariantEventsCalled 2133
VariantBasesCalled 219903552
...
The recall is somewhat lower than the figure in the documentation. There are warnings in stderr that might be related: it failed to locate PARv5.bed, and one of the chrY calls has GT as 1/1:... instead of 1:...
Any ideas?
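As a quick sanity check on the numbers above, the reported F-score is consistent with the harmonic mean of the PASS Precision and Recall. That is an observation from the arithmetic, not a statement about how EvaluateCNV computes it internally. A minimal sketch, with `f_score` as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: harmonic mean of precision and recall, both given
# as percentages, printed as a fraction with four decimals.
f_score() {
    LC_ALL=C awk -v p="$1" -v r="$2" 'BEGIN {
        p /= 100; r /= 100
        if (p + r == 0) { print "0.0000"; exit }
        printf "%.4f\n", 2 * p * r / (p + r)
    }'
}

# Precision and Recall as reported in the output above.
f_score 95.6493 77.7004
```

This prints 0.8575, matching the F-score line in the output, so the low recall (rather than precision) is what pulls the score down.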
There are no truth events on chrX for that sample, so the PAR calls will not affect recall. The lower recall number you are seeing is probably just a limitation of the truth set for that simulated dataset. ~80% recall is typical for a germline sample.
PARv5.bed files attached: PARv5.bed.hg19.txt, PARv5.bed.grch38.txt, PARv5.bed.grch37.txt
Thank you @eroller! I guess the demo run can be deemed a success.