williambrandler

Results 19 issues of williambrandler

Signed-off-by: William Brandler Moving the Databricks documentation for Hail to the open source Hail documentation. This project is not part of the Databricks product, and therefore instructions for using the...

Hey, I was wondering when Hail will be on Spark 3.2?


Signed-off-by: William Brandler ## What changes are proposed in this pull request? Adding in the GATK to the glow docker container was having some issues building the glow docker container...

Converting a Hail matrix table to Glow works fine, but writing out the data or applying downstream Glow functions to it fails: ``` import hail as...

Signed-off-by: William Brandler ## What changes are proposed in this pull request? The VCF reader does not support special characters such as whitespace, but the JSON and CSV datasource readers do. Right...

The new quarantine functionality in the pipe transformer will successfully run if there are corrupted records, however, As an example, I created an input dataframe of 961 rows. The expected...

Spark jobs with lots of partitions can crash if the driver is too small, for example with the error, `Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size...

This issue occurs if the phenotype data isn't indexed by the sample ID. `phenotypes = pd.read_csv(quantitative_phenotypes_path, dtype={'sample_id': str}, index_col='sample_id')` Ideally, an exception should be thrown if no samples are found.
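A minimal sketch of the suggested check, assuming the `read_csv` call above is the intended indexed form. The helper name `load_phenotypes`, the `expected_sample_ids` argument, and the example sample IDs are hypothetical and not part of Glow; the sketch only illustrates failing loudly when no phenotype rows match the genotype samples.

```python
import io

import pandas as pd


def load_phenotypes(source, expected_sample_ids):
    """Read phenotypes indexed by sample_id and fail if no samples match.

    `source` is a path or file-like object; `expected_sample_ids` is the
    list of sample IDs taken from the genotype data (hypothetical input).
    """
    phenotypes = pd.read_csv(source, dtype={'sample_id': str}, index_col='sample_id')

    # Compare as strings so that numeric-looking IDs still match.
    expected = {str(s) for s in expected_sample_ids}
    mask = phenotypes.index.isin(expected)

    if not mask.any():
        raise ValueError(
            "No phenotype rows match the expected sample IDs; "
            "check that the phenotype table is indexed by sample_id."
        )
    return phenotypes[mask]


# Illustrative data only: two phenotype rows, one of which overlaps.
example_csv = io.StringIO("sample_id,height\nHG001,1.7\nHG002,1.6\n")
matched = load_phenotypes(example_csv, ["HG002", "HG003"])
```

Raising an error up front keeps an empty join from silently propagating into downstream steps.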

If you get this error when exporting from Hail to Glow `df = functions.from_matrix_table(mt, include_sample_ids=True)` `AnalysisException: Undefined function: 'nullif'. This function is neither a registered temporary function nor a permanent...

The build process with sonatype is a manual process. The `sbt release` build can take 3+ hours then fail if everything is not set up perfectly. At which point you...