nested NullType in genotypes produced in Hail interoperability function

Open williambrandler opened this issue 4 years ago • 6 comments

Converting a Hail matrix table to a Glow DataFrame works fine, but writing it out or applying downstream Glow functions to the data fails:

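import hail as hl
import glow
from glow.hail import functions
from pyspark.sql.functions import col

# `sc` and `display` are provided by the Databricks notebook environment
hl.init(sc, idempotent=True, quiet=True)
vcf_path = '/mnt/hail/data-001/1kg_sample.vcf.bgz'
vcf_mt = hl.import_vcf(vcf_path)
# Simulated matrix table used for the reproduction
mt = hl.balding_nichols_model(n_populations=3,
                              n_samples=10,
                              n_variants=10)
df = functions.from_matrix_table(mt, include_sample_ids=True)
display(df)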

df.write.format("delta").save("dbfs:/tmp/test.delta")

FatalError: AnalysisException: Found nested NullType in column names which is of ArrayType. Delta doesn't support writing NullType in complex types.

df2 = df.withColumn('values', glow.mean_substitute(glow.genotype_states(col('genotypes'))))

FatalError: AnalysisException: unresolved operator 'Project [contigName#2670, start#2671L, end#2672L, names#2673, referenceAllele#2674, alternateAlleles#2675, genotypes#2676, meansubstitute(genotypestates(genotypes#2676, None), -1) AS values#2702];

Note: I do not see any NullType or missing values in the DataFrame. What is going on?
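
One way to see where the NullType hides is to walk the DataFrame's schema rather than its rows. The sketch below is a hypothetical helper (not part of Glow or Hail) that takes a schema in the JSON-style dict form returned by PySpark's `df.schema.jsonValue()` and lists every nested NullType. It assumes NullType is spelled "null" or "void" in that JSON, which may vary by Spark version. The example schema is hand-written to mimic the `names` column from the error message, not copied from a real run.

```python
# Assumption: Spark spells NullType as "null" or "void" in schema JSON,
# depending on the version; both are treated as NullType here.
NULL_TYPE_NAMES = {"null", "void"}


def find_null_type_paths(dtype, path=""):
    """Return the paths of every NullType in a JSON-style schema dict."""
    if isinstance(dtype, str):
        # Primitive types are plain strings in the JSON form.
        return [path] if dtype in NULL_TYPE_NAMES else []
    kind = dtype.get("type")
    if kind == "struct":
        found = []
        for field in dtype["fields"]:
            child = f"{path}.{field['name']}" if path else field["name"]
            found += find_null_type_paths(field["type"], child)
        return found
    if kind == "array":
        return find_null_type_paths(dtype["elementType"], path + "[]")
    if kind == "map":
        return (find_null_type_paths(dtype["keyType"], path + "{key}")
                + find_null_type_paths(dtype["valueType"], path + "{value}"))
    return []


# Hand-written schema mimicking the `names` column from the error above.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "contigName", "type": "string",
         "nullable": True, "metadata": {}},
        {"name": "names",
         "type": {"type": "array", "elementType": "null",
                  "containsNull": True},
         "nullable": True, "metadata": {}},
    ],
}

print(find_null_type_paths(example_schema))  # ['names[]']
```

On a live cluster, `find_null_type_paths(df.schema.jsonValue())` should point at the same column that the Delta error names. An empty array column can carry a NullType element type even though no null values are visible in the displayed rows.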

williambrandler, Apr 29 '21 19:04

.drop("names") removes the NullType Error, but the "unresolved operator" error when calling mean_substitute remains.

bcajes, Apr 29 '21 20:04

Hey Brian, yeah, dropping "names" also worked for me. After that I can write to Delta, read it back in, and it works.

williambrandler, Apr 29 '21 20:04

It looks like there are some unexpected interactions between Hail and Glow that interfere with the registered functions. As a workaround, it may be best to avoid using Hail and Glow together. As @williambrandler demonstrated, it seems to be safer to immediately checkpoint after extraction.
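
A minimal sketch of the workaround described in this thread: drop `names`, write to Delta, and read the table back before calling any Glow functions. `checkpoint_without_names` is a hypothetical helper name, and `spark` is assumed to be the active SparkSession. The sketch only chains standard DataFrame reader and writer calls.

```python
def checkpoint_without_names(df, spark, path):
    """Drop the `names` column, write to Delta, and read the table back."""
    # `names` is the ArrayType column with a nested NullType that Delta
    # refuses to write.
    cleaned = df.drop("names")
    cleaned.write.format("delta").save(path)
    # Reading back returns a DataFrame backed by the Delta files rather
    # than by the plan produced during extraction from Hail.
    return spark.read.format("delta").load(path)
```

With the path from the original report, this would be called as `df = checkpoint_without_names(df, spark, "dbfs:/tmp/test.delta")`, after which `glow.mean_substitute` can be applied to the reloaded DataFrame.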

karenfeng, Apr 29 '21 21:04

this will be fixed by https://github.com/projectglow/glow/pull/377 in glow v1.1.0

williambrandler, Sep 07 '21 23:09

glow v1.1.0 has been released, so closing this thread

williambrandler, Sep 17 '21 17:09

note: if you still get

AnalysisException: Undefined function: 'nullif'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 0

please run the export from Hail to Glow on its own, separate cluster

williambrandler, Feb 24 '22 21:02