Han Wang
I also modified your code a little bit to follow good practices: ```python import pandas as pd from fugue import FugueWorkflow _df1 = pd.DataFrame({"a": [1, 2, 3], "b": [1, None,...
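For reference, a runnable version of that snippet might look roughly like this; the value after the truncation and the rest of the workflow are assumptions on my part:

```python
import pandas as pd
from fugue import FugueWorkflow

# plain pandas DataFrame as input; the third value of "b" is assumed
_df1 = pd.DataFrame({"a": [1, 2, 3], "b": [1, None, 3]})

dag = FugueWorkflow()
df1 = dag.df(_df1)  # wrap the pandas DataFrame into the workflow
df1.show()          # printed when the workflow actually runs
dag.run()           # default (pandas-based) engine; pass an engine to run elsewhere
```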
Wonderful! When you test on Spark, don't use `SparkExecutionEngine`; you should just use the Spark session: ```python dag.run(spark_session) ``` Also, `spark.sql.shuffle.partitions` should be set properly (this applies to Spark execution in general)...
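As a sketch of what that can look like (the app name and partition count below are placeholders, and this assumes a Fugue version that accepts a SparkSession directly in `run`):

```python
from pyspark.sql import SparkSession
from fugue import FugueWorkflow

spark_session = (
    SparkSession.builder
    .appName("fugue-demo")  # placeholder name
    .config("spark.sql.shuffle.partitions", "200")  # tune for your data/cluster size
    .getOrCreate()
)

dag = FugueWorkflow()
dag.df([[0, "hello"]], "a:int,b:str").show()  # any workflow body works here
dag.run(spark_session)  # pass the SparkSession itself, no SparkExecutionEngine wrapper
```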
> My two cents for the problem raised from the mutability of Pandas and the immutability of Spark/DuckDB: I had the chance to see only a little bit of Fugue's...
@lukeb88 I think it is very well said, and it also aligns with Fugue's priority: consistency is more important than performance. I will create a PR to make the change,...
It's dependent on https://github.com/fugue-project/triad/pull/90
The problem is that your code has typos: you defined `df1_s` twice, and `df2` is also created from `df1_s`. This works: ```python df1_s = spark.createDataFrame([[1,2]], schema=StructType([StructField("a", IntegerType()), StructField("b",IntegerType())])) df2_s = spark.createDataFrame([[1,2]], schema=StructType([StructField("c",...
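Spelled out in full (the data values and the second column of `df2_s` are my guesses for the truncated part; `spark` is an existing SparkSession):

```python
from pyspark.sql.types import StructType, StructField, IntegerType

# assumes `spark` is an already-created SparkSession
df1_s = spark.createDataFrame(
    [[1, 2]],
    schema=StructType([StructField("a", IntegerType()), StructField("b", IntegerType())]),
)
df2_s = spark.createDataFrame(
    [[1, 2]],  # values are illustrative
    schema=StructType([StructField("c", IntegerType()), StructField("d", IntegerType())]),  # "d" is assumed
)
df1_s.show()
df2_s.show()
```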
Also, with Fugue, you really don't need to create Spark dfs with the tedious schema expressions; you can just use pandas dfs: ```python dag = FugueWorkflow() df1 = dag.df(pd.DataFrame(...)) df2...
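A minimal sketch of that approach (the column names, values, and the join are illustrative only; it also assumes Fugue infers the join key from the overlapping column when none is specified):

```python
import pandas as pd
from fugue import FugueWorkflow

dag = FugueWorkflow()
df1 = dag.df(pd.DataFrame({"a": [1, 2], "b": [10, 20]}))
df2 = dag.df(pd.DataFrame({"a": [1, 2], "c": [100, 200]}))
df1.inner_join(df2).show()  # no StructType boilerplate, schemas come from the pandas dfs
dag.run(spark_session)      # `spark_session` is an existing SparkSession; dag.run() runs locally
```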
@vspinu @jmoralez I just want to reiterate some of the facts: 1. Fugue is a core part of statsforecast that makes the lib run seamlessly in different distributed environments. 2....
@sankarsiva123 @jmoralez In a few weeks, we will release fugue 0.9.0, whose only core dependencies (outside of the fugue-project) are pandas and pyarrow. The Fugue integration is supposed to handle all...
> Thanks @goodwanghan and @kvnkho for your impressive efforts and extensive explanation! Looking forward to 0.9.0. @vspinu We released the new version of fugue-sql-antlr yesterday; if you upgrade it...