glow
An open-source toolkit for large-scale genomic analysis
From https://github.com/projectglow/glow/blob/master/docs/source/tertiary/pipe-transformer.rst: > Options beginning with env_ are interpreted as environment variables. Like other options, the environment variable name is converted to lower snake case. For example, providing the option...
This issue is similar to the issues reported in SPARK-21996 and SPARK-23148. Would [this](https://github.com/projectglow/glow/blob/v0.6.0/core/src/main/scala/io/projectglow/vcf/VCFFileFormat.scala#L219) line need to be modified to `val hPath = new Path(new URI(path))`? (probably also other places...
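As a rough illustration of why decoding matters here, the sketch below uses only Python's standard library, not Glow or Hadoop code. It assumes the reader receives a percent-encoded path string, and the file path is hypothetical. If the encoded form is treated as a literal path, it names a different file than the one on disk.

```python
from urllib.parse import quote, unquote

# Hypothetical file path containing a space (not taken from the issue).
original_path = "/data/my sample.vcf"

# Percent-encoding turns the space into %20, as in a URI.
encoded = quote(original_path)

# Using the encoded string as a literal path would look for the wrong file;
# decoding it first recovers the real path.
decoded = unquote(encoded)

print(encoded)
print(decoded == original_path)
```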
Signed-off-by: William Brandler ## What changes are proposed in this pull request? VCF reader does not support special characters such as whitespaces, but json and csv datasource readers do. Right...
From the [docs](https://glow.readthedocs.io/en/latest/etl/variant-data.html#vcf): > For the sharded VCF writer, the sample IDs are inferred from the first row of each partition and must be the same for each row. If...
Super useful to clean up malformed VCFs. Example:

```python
default_vcf_schema = t.StructType([
    t.StructField("contigName", t.StringType()),
    t.StructField("start", t.LongType()),
    t.StructField("end", t.LongType()),
    t.StructField("names", t.ArrayType(t.StringType())),
    t.StructField("referenceAllele", t.StringType()),
    t.StructField("alternateAlleles", t.ArrayType(t.StringType())),
    t.StructField("qual", t.DoubleType()),
    t.StructField("filters", t.ArrayType(t.StringType())),
    t.StructField("splitFromMultiAllelic", t.BooleanType()),
    ...
```
The new quarantine functionality in the pipe transformer will successfully run if there are corrupted records, however, As an example, I created an input dataframe of 961 rows. The expected...
Hi, Is there a plink demo? I'm looking at the sample notebook provided on the documentation page, but I'm not seeing anything for loading in and displaying a plink file....
Spark jobs with lots of partitions can crash if the driver is too small, for example with the error, `Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size...
The Glow documentation generally demonstrates Glow functionality using Python. The same holds for the talk given by Mr Amir Kermany and Mr Kiavash Kianfa, which was...
This issue occurs if the phenotype data isn't indexed by the sample ID. `phenotypes = pd.read_csv(quantitative_phenotypes_path, dtype={'sample_id': str}, index_col='sample_id')` Ideally, an exception should be thrown if no samples are found.
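A minimal sketch of the kind of check suggested above, using pandas only. The helper name `check_sample_overlap`, the sample IDs, and the `height` column are hypothetical and are not part of Glow. The helper raises an error when none of the phenotype index values match the genotype sample IDs, which is what happens when the table is left with its default integer index.

```python
import io

import pandas as pd


def check_sample_overlap(phenotypes, genotype_sample_ids):
    """Hypothetical helper: keep phenotype rows whose index is a known sample ID."""
    mask = phenotypes.index.isin(genotype_sample_ids)
    if not mask.any():
        raise ValueError(
            "No phenotype rows match the genotype sample IDs; "
            "is the phenotype DataFrame indexed by sample_id?"
        )
    return phenotypes[mask]


# Made-up phenotype data standing in for the CSV file in the issue.
csv_text = "sample_id,height\nHG001,1.70\nHG002,1.82\n"

phenotypes = pd.read_csv(
    io.StringIO(csv_text), dtype={'sample_id': str}, index_col='sample_id'
)

# Only HG002 appears in both the phenotype table and the genotype samples.
matched = check_sample_overlap(phenotypes, ["HG002", "HG003"])
print(matched)
```

Reading the same data without `index_col='sample_id'` leaves an integer index, so the helper raises `ValueError` instead of silently matching no samples.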