"VCF spec does not support phased haploid calls" when exporting chrX and chrY
What happened?
We have a dataset of joint-called multi-sample VCF files (not from imputation). We converted those multi-sample VCFs to Hail MatrixTables with the following code, run from a WDL task in Terra:
import hail as hl
hl.init(spark_conf={"spark.driver.memory": "~{memory_gb}g"})
callset = hl.import_vcf("~{source_vcf}",
                        array_elements_required=False,
                        force_bgz=True,
                        reference_genome='~{reference_genome}')
callset.write("~{target_prefix}", overwrite=True)
After filtering samples, we want to export the MatrixTable back to VCF:
import hail as hl
hl.init(spark_conf={
            "spark.driver.memory": "~{memory_gb}g",
            "spark.local.dir": "./tmp"
        },
        tmp_dir="./tmp",
        local_tmpdir="./tmp",
        idempotent=True)
hl.default_reference("~{reference_genome}")
mt = hl.read_matrix_table("~{input_hail_mt_path}")
hl.export_vcf(mt, "~{hail_vcf}", tabix=False)
The export works for chr1 through chr22 but fails on chrX and chrY with the error "VCF spec does not support phased haploid calls."
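For context, here is a small pure-Python sketch of why a phased haploid call is a problem for a VCF writer. It assumes the usual GT encoding in which phase is carried only by the separator between alleles (`|` for phased, `/` for unphased), so a one-allele GT has nowhere to record phase. The helper `format_gt` is hypothetical and only illustrates the idea; it is not Hail's writer.

```python
def format_gt(alleles, phased):
    """Render a call as a VCF GT string, e.g. [0, 1] phased -> "0|1"."""
    if len(alleles) == 1 and phased:
        # A one-allele GT has no separator, so phase cannot be written.
        raise ValueError("VCF spec does not support phased haploid calls.")
    separator = "|" if phased else "/"
    return separator.join(str(a) for a in alleles)


print(format_gt([0, 1], phased=True))   # diploid, phased
print(format_gt([1], phased=False))     # haploid, unphased
```

Under that assumption, autosomes export cleanly because their calls are diploid, while haploid calls on chrX and chrY that carry a phased flag have no valid GT representation.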
What should we do to export chrX and chrY?
Version
0.2.127-py3.11
Relevant log output
Traceback (most recent call last):
File "<stdin>", line 14, in <module>
File "<decorator-gen-1448>", line 2, in export_vcf
File "/usr/local/lib/python3.10/dist-packages/hail/typecheck/check.py", line 584, in wrapper
return __original_func(*args_, **kwargs_)
File "/usr/local/lib/python3.10/dist-packages/hail/methods/impex.py", line 634, in export_vcf
Env.backend().execute(ir.MatrixWrite(dataset._mir, writer))
File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 190, in execute
raise e.maybe_user_error(ir) from None
File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 188, in execute
result, timings = self._rpc(ActionTag.EXECUTE, payload)
File "/usr/local/lib/python3.10/dist-packages/hail/backend/py4j_backend.py", line 220, in _rpc
raise fatal_error_from_java_error_triplet(
hail.utils.java.FatalError: HailException: VCF spec does not support phased haploid calls.
Java stack trace:
is.hail.utils.HailException: VCF spec does not support phased haploid calls.
at __C83collect_distributed_array_matrix_vcf_writer.apply_region154_245(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply_region133_246(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply_region1_250(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$19(BackendUtils.scala:142)
at is.hail.utils.package$.using(package.scala:665)
at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:170)
at is.hail.backend.BackendUtils.$anonfun$collectDArray$18(BackendUtils.scala:141)
at is.hail.backend.spark.SparkBackend$$anon$5.compute(SparkBackend.scala:474)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Hail version: 0.2.127-bb535cd096c5
Error summary: HailException: VCF spec does not support phased haploid calls.