java.lang.ArrayIndexOutOfBoundsException when writing to vcf

Open sandra-selfdecode opened this issue 3 years ago • 1 comments

I imported VCFs from several projects and combined them into one Delta table. I am now trying to write from the Delta table to a VCF, and the write keeps failing with java.lang.ArrayIndexOutOfBoundsException.

Can you suggest what might cause this problem? It seems to be related to the genotypes column: the write works if I select only genotypes.calls.

Py4JJavaError                             Traceback (most recent call last)
in
----> 1 extract_vcfs(kg_silver, 'collections_chr', regions='20')

in extract_vcfs(delta_path, vcf_prefix, **kwargs)
    299 if out_df.count() > 0:
    300 #out_df.show()
--> 301 out_df.write.format('bigvcf').mode('overwrite').save(f'{vcf_prefix}{contig}.vcf.bgz')
    302
    303

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
   1134 self._jwrite.save()
   1135 else:
-> 1136 self._jwrite.save(path)
   1137
   1138 @since(1.4)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
   1305 answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    108 def deco(*a, **kw):
    109 try:
--> 110 return f(*a, **kw)
    111 except py4j.protocol.Py4JJavaError as e:
    112 converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
    327 "An error occurred while calling {0}{1}{2}.\n".
    328 format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o541.save.

sandra-selfdecode · Oct 09 '21 01:10

Hey Sandra, not sure what the issue is! Could you please print the schema for the dataframe, provide some more info about the dataset (number of variants and samples), and share the code and the full stack trace?
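
The details requested above could be gathered along the following lines. This is only a minimal sketch: the helper names `describe_for_bug_report` and `summarize_sizes` are hypothetical, and it assumes the dataframe has an array column named `genotypes` with one entry per sample, as the column name in the original report suggests. It prints the schema and the variant count, then reports how many samples each row carries. Rows with differing sample counts are just one thing worth ruling out after combining VCFs from several projects, not a confirmed cause of the exception.

```python
def summarize_sizes(size_counts):
    """Summarize a {genotypes_array_length: row_count} mapping."""
    sizes = sorted(size_counts)
    return {
        "distinct_sizes": sizes,             # sample counts seen across rows
        "rows": sum(size_counts.values()),   # total rows inspected
        "consistent": len(sizes) <= 1,       # True if every row has the same count
    }


def describe_for_bug_report(df, genotypes_col="genotypes"):
    """Print the schema and basic size information for a Spark DataFrame.

    `df` is assumed to be a pyspark.sql.DataFrame; `genotypes_col` is assumed
    to name an array column holding one element per sample.
    """
    df.printSchema()
    print("variants:", df.count())

    # Group rows by the length of their genotypes array.
    size_rows = (
        df.selectExpr(f"size({genotypes_col}) AS n_samples")
        .groupBy("n_samples")
        .count()
        .collect()
    )
    summary = summarize_sizes({r["n_samples"]: r["count"] for r in size_rows})
    print("samples per row:", summary)
    return summary
```

With the dataframe from the traceback, this would be called as `describe_for_bug_report(out_df)` just before the failing `save`, and its printed output pasted into the issue together with the full Java stack trace.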

williambrandler · Oct 18 '21 19:10

Closing since we don't have a reproduction.

henrydavidge · Mar 22 '24 03:03