Jimmy Stammers
The CLI for this app is incredibly useful and allows the user to carry out a lot of analysis. Is there any planned work to develop a dashboard to visualise...
## Description The implementation for `SparkHiveDataSet` allows the user to specify additional save arguments. This should enable a delta table to be saved which is done using the following pyspark...
**Describe the bug** I am failing to fit a SarimaxModel because the model reports that the exogenous and endogenous dataframes do not have the same index despite coming from the...
## Description The current implementation of `SparkHiveDataSet` contains a bug that prevents a user from specifying a save format. This issue is discussed in #1528. ## Development notes The...
**Minimal Code To Reproduce** ```python df1_s = spark.createDataFrame([[1,2]], schema=StructType([StructField("a", IntegerType()), StructField("b",IntegerType())])) df1_s = spark.createDataFrame([[1,2]], schema=StructType([StructField("c", IntegerType()), StructField("d",IntegerType())])) dag= FugueWorkflow() df1 = dag.df(df1_s) df2 = dag.df(df1_s) df2 = df1.join(df2, how='cross') dag.run(engine='spark')...
I have a function that aims to implement an SCD2 merge on two dataframes. In my example, I am attempting to merge two dataframes together, using a single column as...
**Minimal Code To Reproduce** **Describe the bug** I have a set of unit tests that check the functionality of code that uses the `fugue_sql` API with a DuckDB backend. When...
### Is your feature request related to a problem? Pyspark now supports [Arrow UDFs](https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html#arrow-python-udfs) that facilitate efficient row-by-row executions using Arrow as a backend e.g. ```python import pandas as pd...
### What happened? I'm trying to create a table with a `geometry` column using given lat/long coordinates. I can successfully create this column, but when calling `.cache()`, I get a...