hap.py
query region doesn't contain variants
Hello,
I hope you can help me! I called variants with DeepVariant v1.8.0 for my region of interest, and then I wanted to benchmark them with hap.py.
<...>
# Define the reference and benchmark files
REFERENCE="hg38_MHC.fa"
BENCHMARK_BED="HG003_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed"
OUTPUT_DIR="happy_trios"
ENGINE="vcfeval"
PASS_ONLY="--pass-only"
REGION="chr1:151857524-153780694"
export APPTAINER_CACHEDIR=<...>
export APPTAINER_TMPDIR=<...>
export HGREF=<...>
mkdir -p "${OUTPUT_DIR}"
for VCF in deepvariant_output/hifi_trio_hg38/*.vcf.gz; do
    SAMPLE_NAME=$(basename "${VCF}" .vcf.gz)
    OUTPUT_FILE="${OUTPUT_DIR}/${SAMPLE_NAME}_comparison.v4.2.first_pass"
    singularity exec docker://jmcdani20/hap.py:v0.3.12 \
        /opt/hap.py/bin/hap.py \
        -f "${BENCHMARK_BED}" \
        -r "${REFERENCE}" \
        -o "${OUTPUT_FILE}" \
        --engine="${ENGINE}" \
        ${PASS_ONLY} \
        -l "${REGION}" \
        HG003_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
        "${VCF}"
done
The error:
INFO: Using cached SIF image
[W] overlapping records at chr6:29747433 for sample 0
[W] Variants that overlap on the reference allele: 4
[I] Total VCF records: 4000097
[I] Non-reference VCF records: 4000097
[I] Total VCF records: 3956
[I] Non-reference VCF records: 3477
2025-03-04 17:57:41,658 WARNING starting at chr1:151857523
2025-03-04 17:57:43,707 WARNING No calls for location chr1:151857524-153780694 in query!
2025-03-04 17:57:43,707 WARNING Creating template for vcfeval. You can speed this up by supplying a SDF template that corresponds to reference_for_assembly/hg38_MHC.fa
The variants inside this region do exist in the query VCFs, and all the contig names are correct. The run works if I pass just chr1, but I cannot understand why it does not work for a specific region.
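For reference, a check along these lines can confirm that a query VCF has records in the region without any extra tools. This is only a minimal sketch: `count_in_region` is a hypothetical helper name, it compares the POS column alone (so a deletion that starts before the region but overlaps it is not counted), and it reads the whole file rather than using an index.

```shell
# Count VCF records whose POS falls inside a 1-based, inclusive region.
# Hypothetical helper for a quick sanity check; it uses only gzip and awk.
count_in_region() {
    vcf="$1"; chrom="$2"; start="$3"; end="$4"
    gzip -dc "$vcf" | awk -F'\t' -v c="$chrom" -v s="$start" -v e="$end" \
        '!/^#/ && $1 == c && $2 >= s && $2 <= e { n++ } END { print n + 0 }'
}

# Example with the region from the script above:
# count_in_region "${VCF}" chr1 151857524 153780694
```

A non-zero count here would mean the records are in the file and the region filtering is losing them somewhere else.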
Thank you!! Alisa