
GCNV: AssertionError: Loaded mean for "log_mean_bias_t" has an unexpected shape; loaded: (11903,), expected: (11901,)

Open stefandiederich opened this issue 11 months ago • 0 comments

Hi all,

I have been using CNV detection with GATK v4.3.0.0 very successfully for quite a while. We have now changed the enrichment kit, so I had to build a new model. Everything worked well in the model phase.

When I now run a single sample against this model, I get the following error at the CNV detection step:

Using GATK jar /usr/BioinfSoftware/GATK/4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/BioinfSoftware/GATK/4.3.0.0/gatk-package-4.3.0.0-local.jar GermlineCNVCaller --run-mode CASE -contig-ploidy-calls /media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_DGCP_noProbe-calls/ --model /media/Data/MasterV3/GCNV_noProbe-model/ --input /media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_noProbe.hdf5 --output /media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/ --output-prefix 0115-24_GCNV_noProbe --tmp-dir /media/Data/tmp/
10:20:01.611 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/BioinfSoftware/GATK/4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:20:01.717 INFO  GermlineCNVCaller - ------------------------------------------------------------
10:20:01.718 INFO  GermlineCNVCaller - The Genome Analysis Toolkit (GATK) v4.3.0.0
10:20:01.718 INFO  GermlineCNVCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:20:01.718 INFO  GermlineCNVCaller - Executing as die9s@k-hg-srv3 on Linux v5.3.18-24.37-default amd64
10:20:01.718 INFO  GermlineCNVCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-suse-3.56.1-x8664
10:20:01.718 INFO  GermlineCNVCaller - Start Date/Time: March 14, 2024 at 10:20:01 AM CET
10:20:01.718 INFO  GermlineCNVCaller - ------------------------------------------------------------
10:20:01.718 INFO  GermlineCNVCaller - ------------------------------------------------------------
10:20:01.719 INFO  GermlineCNVCaller - HTSJDK Version: 3.0.1
10:20:01.719 INFO  GermlineCNVCaller - Picard Version: 2.27.5
10:20:01.719 INFO  GermlineCNVCaller - Built for Spark Version: 2.4.5
10:20:01.719 INFO  GermlineCNVCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:20:01.719 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:20:01.719 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:20:01.719 INFO  GermlineCNVCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:20:01.719 INFO  GermlineCNVCaller - Deflater: IntelDeflater
10:20:01.719 INFO  GermlineCNVCaller - Inflater: IntelInflater
10:20:01.719 INFO  GermlineCNVCaller - GCS max retries/reopens: 20
10:20:01.719 INFO  GermlineCNVCaller - Requester pays: disabled
10:20:01.720 INFO  GermlineCNVCaller - Initializing engine
10:20:07.111 INFO  GermlineCNVCaller - Done initializing engine
10:20:07.207 INFO  GermlineCNVCaller - Running the tool in CASE mode...
10:20:07.207 INFO  GermlineCNVCaller - Validating and aggregating data from input read-count files...
10:20:07.231 INFO  GermlineCNVCaller - Aggregating read-count file /media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_noProbe.hdf5 (1 / 1)
log4j:WARN No appenders could be found for logger (org.broadinstitute.hdf5.HDF5Library).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
10:20:25.874 INFO  GermlineCNVCaller - Shutting down engine
[March 14, 2024 at 10:20:25 AM CET] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 0.40 minutes.
Runtime.totalMemory()=2147483648
org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
python exited with 1
Command Line: python /media/Data/tmp/case_denoising_calling.3564509013495540802.py --ploidy_calls_path=/media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_DGCP_noProbe-calls --output_calls_path=/media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_GCNV_noProbe-calls --output_tracking_path=/media/Ergebnisse/0115-24_Masterpanel_NB501654_0623/0115-24_GCNV_noProbe-tracking --input_model_path=/media/Data/MasterV3/GCNV_noProbe-model --random_seed=1984 --read_count_tsv_files /media/Data/tmp/0115-24.rc16220482177493702615.tsv --psi_s_scale=1.000000e-04 --mapping_error_rate=1.000000e-02 --depth_correction_tau=1.000000e+04 --q_c_expectation_mode=hybrid --num_samples_copy_ratio_approx=200 --p_alt=1.000000e-06 --cnv_coherence_length=1.000000e+04 --max_copy_number=5 --learning_rate=1.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.900000e-01 --log_emission_samples_per_round=50 --log_emission_sampling_rounds=10 --log_emission_sampling_median_rel_error=5.000000e-03 --max_advi_iter_first_epoch=5000 --max_advi_iter_subsequent_epochs=200 --min_training_epochs=10 --max_training_epochs=50 --initial_temperature=1.500000e+00 --num_thermal_advi_iters=2500 --convergence_snr_averaging_window=500 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=10 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=1.000000e+00 --disable_caller=false --disable_sampler=false --disable_annealing=false
Stdout: 10:20:12.111 INFO case_denoising_calling - THEANO_FLAGS environment variable has been set to: device=cpu,floatX=float64,optimizer=fast_run,compute_test_value=ignore,openmp=true,blas.ldflags=-lmkl_rt,openmp_elemwise_minsize=10
10:20:12.273 INFO root - Loading modeling interval list from the provided model...
10:20:12.475 INFO gcnvkernel.io.io_intervals_and_counts - The given interval list provides the following interval annotations: {'GC_CONTENT'}
10:20:12.491 INFO root - The model contains 11901 intervals and 23 contig(s)
10:20:12.491 INFO root - Loading 1 read counts file(s)...
10:20:12.545 INFO gcnvkernel.io.io_metadata - Loading germline contig ploidy and global read depth metadata...
10:20:12.554 INFO root - Loading denoising model configuration from the provided model...
10:20:12.555 INFO root - - bias factors enabled: True
10:20:12.555 INFO root - - explicit GC bias modeling enabled: True
10:20:12.555 INFO root - - bias factors in active classes disabled: False
10:20:12.555 INFO root - - maximum number of bias factors: 5
10:20:12.555 INFO root - - number of GC curve knobs: 20
10:20:12.555 INFO root - - GC curve prior standard deviation: 1.0
10:20:12.954 INFO gcnvkernel.tasks.task_case_denoising_calling - Instantiating the denoising model...
10:20:15.806 INFO gcnvkernel.tasks.task_case_denoising_calling - Instantiating the sampler...
10:20:15.807 INFO gcnvkernel.tasks.task_case_denoising_calling - Instantiating the copy number caller...
10:20:18.549 INFO gcnvkernel.models.fancy_model - Global model variables: {'log_mean_bias_t', 'psi_t_log__', 'W_tu', 'ard_u_log__'}
10:20:18.549 INFO gcnvkernel.models.fancy_model - Sample-specific model variables: {'read_depth_s_log__', 'psi_s_log__', 'z_sg', 'z_su'}
10:20:18.549 INFO gcnvkernel.tasks.inference_task_base - Instantiating the convergence tracker...
10:20:18.549 INFO gcnvkernel.tasks.inference_task_base - Setting up DA-ADVI...
10:20:24.995 INFO gcnvkernel.tasks.task_case_denoising_calling - Loading the model and updating the instantiated model and workspace...
10:20:25.005 INFO gcnvkernel.io.io_commons - Reading model parameter values for "log_mean_bias_t"...

Stderr: Traceback (most recent call last):
  File "/media/Data/tmp/case_denoising_calling.3564509013495540802.py", line 201, in <module>
    shared_workspace, initial_params_supplier, args.input_model_path)
  File "/usr/BioinfSoftware/Anaconda/3-2020.11/envs/gatk4.3.0.0/lib/python3.6/site-packages/gcnvkernel/tasks/task_case_denoising_calling.py", line 128, in __init__
    self.continuous_model_approx, input_model_path)()
  File "/usr/BioinfSoftware/Anaconda/3-2020.11/envs/gatk4.3.0.0/lib/python3.6/site-packages/gcnvkernel/io/io_denoising_calling.py", line 93, in __call__
    self.input_path, self.denoising_model_approx, self.denoising_model)
  File "/usr/BioinfSoftware/Anaconda/3-2020.11/envs/gatk4.3.0.0/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 471, in read_mean_field_global_params
    "expected: {2}".format(var_name, var_mu.shape, vmap.shp)
AssertionError: Loaded mean for "log_mean_bias_t" has an unexpected shape; loaded: (11903,), expected: (11901,)

        at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
        at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:351)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
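
The assertion compares two counts: the model's interval list (11901 intervals according to the log) and the stored `log_mean_bias_t` parameter (11903 values). One way to narrow this down would be to count the rows in the corresponding files of the model directory. The sketch below is not a GATK utility. The file names `interval_list.tsv` and `mu_log_mean_bias_t.tsv`, the `@`/`#` comment prefixes, and the single column-header row in the interval list are all assumptions about the model directory layout and should be checked against the actual files.

```python
from pathlib import Path


def count_data_rows(path, comment_prefixes=("@", "#"), header_rows=0):
    """Count non-empty lines that do not start with a comment prefix,
    then subtract the given number of column-header rows."""
    n = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith(tuple(comment_prefixes)):
                n += 1
    return n - header_rows


def compare_model_files(model_dir,
                        interval_file="interval_list.tsv",       # assumed file name
                        param_file="mu_log_mean_bias_t.tsv",     # assumed file name
                        interval_header_rows=1):                 # assumed one header row
    """Return (number of intervals, number of stored parameter values)."""
    model_dir = Path(model_dir)
    n_intervals = count_data_rows(model_dir / interval_file,
                                  header_rows=interval_header_rows)
    n_params = count_data_rows(model_dir / param_file)
    return n_intervals, n_params
```

Calling `compare_model_files("/media/Data/MasterV3/GCNV_noProbe-model")` would return the two counts. If they differ, the files in the model directory do not describe the same set of intervals.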

Can you give me a hint as to where this error comes from?
Thanks in advance,
Stefan 

stefandiederich, Mar 14 '24 08:03