Evgenii Ignatev
Created a PR to demonstrate my suggestion - https://github.com/snowflakedb/spark-snowflake/pull/252
@sfc-gh-zli Hello. The actual code is quite large and interconnected, but it can be outlined roughly as: `col_vals_set = set() # Set can be computed empty depending on the previous code.`...
Also, in my example an empty set is used directly, not a list.
Also, it is currently not clear how `cluster.max-partitions` and `ids.num-partitions` are related, and this topic is not covered by the docs.
@MrPowers A small proposal: maybe add a UUID5 generator (not as complete as the Python version, obviously, but better than nothing)? - https://github.com/YevIgn/pyspark-uuid5/blob/2055a4aa8429424ef79c248f78aba2a33e462806/src/research_udf_performance.py#L158 - I recently made an attempt to write one,...
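
For reference, here is a minimal pure-Python sketch of what a UUID5 generator has to compute, so the steps are explicit. It is not the code from the linked repository; the function name `uuid5_from_sha1` is hypothetical, and the sketch only assumes the RFC 4122 version-5 layout (SHA-1 over namespace bytes plus name, then the version and variant bits are overwritten). The result is compared against the standard library's `uuid.uuid5`.

```python
import hashlib
import uuid


def uuid5_from_sha1(namespace: uuid.UUID, name: str) -> str:
    """Hypothetical helper: build a version-5 UUID string by hand."""
    # SHA-1 over the 16 namespace bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    # Only the first 16 of the 20 digest bytes are used.
    raw = bytearray(digest[:16])
    # High nibble of byte 6 carries the version (5).
    raw[6] = (raw[6] & 0x0F) | 0x50
    # Top two bits of byte 8 carry the RFC 4122 variant (binary 10).
    raw[8] = (raw[8] & 0x3F) | 0x80
    h = raw.hex()
    # Canonical 8-4-4-4-12 textual form.
    return "-".join((h[:8], h[8:12], h[12:16], h[16:20], h[20:]))


if __name__ == "__main__":
    mine = uuid5_from_sha1(uuid.NAMESPACE_DNS, "python.org")
    reference = str(uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))
    print(mine, mine == reference)
```

A Spark-native version would have to express the same hashing and bit masking with column functions instead of a Python UDF; that part is not shown here.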