featuretools
"WindowExec: No Partition Defined for Window operation!" warnings on Spark EntitySets
Both when I add Spark DataFrames to my EntitySet and when I call .dfs() on the Spark EntitySet, I see a flood of warnings:
22/04/26 16:41:09 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
It's even shown in the official documentation: https://featuretools.alteryx.com/en/stable/guides/using_spark_entitysets.html#Running-DFS
Is this an indication of some fundamental scaling issues when using Spark, or can I safely ignore it? What is the root cause of the warning?
Output of featuretools.show_info()
Featuretools version: 1.8.0
SYSTEM INFO
python: 3.9.11.final.0
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
INSTALLED VERSIONS
numpy: 1.22.3
pandas: 1.3.5
tqdm: 4.64.0
cloudpickle: 2.0.0
dask: 2022.4.1
distributed: 2022.4.1
psutil: 5.9.0
pip: 22.0.3
setuptools: 60.6.0
Hi @nicodv, we will look into this and get back to you.