
VCF spec does not support phased haploid calls when exporting chrX and chrY

Status: Open • shengqh opened this issue 4 months ago • 0 comments

What happened?

We have a dataset of joint-called multi-sample VCF files (not from imputation). We converted these multi-sample VCFs to Hail MatrixTables with the following Python code, run inside a WDL task in Terra:

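import hail as hl

# Driver memory is filled in by the WDL task (~{memory_gb})
hl.init(spark_conf={"spark.driver.memory": "~{memory_gb}g"})

# Import the joint-called multi-sample VCF as a MatrixTable
callset = hl.import_vcf("~{source_vcf}",
                        array_elements_required=False,  # allow missing array elements, e.g. "1,.,5"
                        force_bgz=True,  # read .gz input as block-gzipped (BGZ)
                        reference_genome='~{reference_genome}')

callset.write("~{target_prefix}", overwrite=True)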

After sample filtering, we export the MatrixTable back to VCF:

import hail as hl

hl.init(spark_conf={"spark.driver.memory": "~{memory_gb}g",
                    "spark.local.dir": "./tmp"},
        tmp_dir="./tmp",
        local_tmpdir="./tmp",
        idempotent=True)
hl.default_reference("~{reference_genome}")

mt = hl.read_matrix_table("~{input_hail_mt_path}")
hl.export_vcf(mt, "~{hail_vcf}", tabix=False)

This worked for chr1 through chr22 but failed on chrX and chrY with the error: VCF spec does not support phased haploid calls.

What should we do to export chrX and chrY?

Version

0.2.127-py3.11

Relevant log output

Traceback (most recent call last):
  File "<stdin>", line 14, in <module>
  File "<decorator-gen-1448>", line 2, in export_vcf
  File "/usr/local/lib/python3.10/dist-packages/hail/typecheck/check.py", line 584, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.10/dist-packages/hail/methods/impex.py", line 634, in export_vcf
    Env.backend().execute(ir.MatrixWrite(dataset._mir, writer))
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 190, in execute
    raise e.maybe_user_error(ir) from None
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 188, in execute
    result, timings = self._rpc(ActionTag.EXECUTE, payload)
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/py4j_backend.py", line 220, in _rpc
    raise fatal_error_from_java_error_triplet(
hail.utils.java.FatalError: HailException: VCF spec does not support phased haploid calls.

Java stack trace:
is.hail.utils.HailException: VCF spec does not support phased haploid calls.
at __C83collect_distributed_array_matrix_vcf_writer.apply_region154_245(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply_region133_246(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply_region1_250(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$19(BackendUtils.scala:142)
at is.hail.utils.package$.using(package.scala:665)
at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:170)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$18(BackendUtils.scala:141)
at is.hail.backend.spark.SparkBackend$$anon$5.compute(SparkBackend.scala:474)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)



Hail version: 0.2.127-bb535cd096c5
Error summary: HailException: VCF spec does not support phased haploid calls.

shengqh, Feb 21 '24 17:02