LGTM
The exception happened because XShards of DataFrame doesn't expect an empty dataframe in any partition. In the above code, after the Spark DataFrame `join` operation, the joined Spark df (`merged` in the code)...
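A minimal sketch (not the original PR code) of how a Spark `join` can leave some partitions empty, which is what trips up XShards. The toy `left`/`right` DataFrames and the partition-size check via `glom` are my own illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()

left = spark.range(0, 10).withColumnRenamed("id", "key")
right = spark.range(0, 3).withColumnRenamed("id", "key")

# Stand-in for the `merged` df in the PR: only a few keys actually match.
merged = left.join(right, on="key")

# Count rows per partition; depending on the join plan, many of the
# resulting partitions can end up with zero rows.
sizes = merged.rdd.glom().map(len).collect()
print(sizes)
print(sum(1 for s in sizes if s == 0), "empty partitions")
```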
BTW, we need to support a `merge` operation in Shards. Could you add the implementation of `merge` to https://github.com/intel-analytics/BigDL/blob/main/python/orca/src/bigdl/orca/data/shard.py after it's done?
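Just to make the request concrete, here is a hedged sketch of one possible shape for such a `merge` helper. The function name `merge_with_small_shards`, the broadcast-the-small-side strategy, and the error message are my own assumptions, not the implementation being requested; it only relies on the existing `collect()` and `transform_shard()` APIs of SparkXShards.

```python
import pandas as pd

def merge_with_small_shards(big_shards, small_shards, on, how="inner"):
    """Merge every pandas DataFrame in `big_shards` with the (collected)
    contents of `small_shards`, assuming the small side fits in driver memory."""
    # Collect the small side into a single pandas DataFrame on the driver.
    small_pdf = pd.concat(small_shards.collect(), ignore_index=True)

    # pandas.merge is applied independently to each partition's DataFrame.
    return big_shards.transform_shard(
        lambda pdf: pdf.merge(small_pdf, on=on, how=how))
```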
> I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
> > I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
>
> But even with coalesce,...
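A small sketch illustrating the point being discussed: coalescing only merges existing partitions, so it does not by itself guarantee that every resulting partition is non-empty. The toy DataFrames and partition counts are my own example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()

left = spark.range(0, 10).withColumnRenamed("id", "key")
right = spark.range(0, 3).withColumnRenamed("id", "key")
merged = left.join(right, on="key")

# If every partition feeding one coalesce target happens to be empty,
# the coalesced partition is empty too.
sizes = merged.coalesce(2).rdd.glom().map(len).collect()
print(sizes)  # zeros can still show up here
```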
For SparkXShards of pandas dataframes, it is by design that there is only one pandas dataframe per partition.
I think currently users can create SparkXShards of pandas dataframes in 2 ways: 1. use the Orca API `read_csv`, which will internally create an RDD of pandas dataframes and it will create...
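A hedged sketch of the creation paths under discussion. The `read_csv` call follows the Orca data API; the second path, wrapping an RDD of pandas DataFrames in `SparkXShards` directly, is inferred from the follow-up comment below, and the exact import paths and arguments should be treated as approximate.

```python
import pandas as pd
from bigdl.orca import init_orca_context
from bigdl.orca.data import SparkXShards
from bigdl.orca.data.pandas import read_csv

sc = init_orca_context(cluster_mode="local", cores=4)

# 1. read_csv builds an XShards with one pandas DataFrame per partition.
shards = read_csv("path/to/data*.csv")

# 2. Wrap an existing RDD of pandas DataFrames in SparkXShards manually.
pdfs = [pd.DataFrame({"a": [i], "b": [i * 2]}) for i in range(4)]
shards2 = SparkXShards(sc.parallelize(pdfs, numSlices=4))
```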
Yes, users can do that. I think in that case, if users want to create multiple pandas dataframes within one partition, these dataframes may have different schemas; otherwise why multiple...
Sure, will add a check and report an error.
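A hedged sketch of the kind of check mentioned here: validate that no partition holds an empty pandas DataFrame before building the XShards. The function name and error message are illustrative, not the actual patch.

```python
def check_no_empty_partitions(pdf_rdd):
    """Raise on the first empty pandas DataFrame found in any partition."""
    def assert_non_empty(pdf):
        if len(pdf) == 0:
            raise ValueError(
                "SparkXShards does not support an empty pandas DataFrame in a "
                "partition; consider repartitioning the Spark DataFrame first.")
        return pdf
    # The error is raised lazily, when an action is run on the returned RDD.
    return pdf_rdd.map(assert_non_empty)
```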
> Why not create an empty spark dataframe partition?

You mean creating an empty pandas df for an empty Spark DataFrame partition? I think it may cause potential problems in the further...
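For reference, a sketch of the approach being discussed (and advised against above): emit an empty pandas DataFrame that preserves the Spark schema whenever a partition has no rows. The helper name and the simplified column handling are my own; it is not code from this PR.

```python
import pandas as pd

def to_pandas_keep_empty(columns):
    """mapPartitions function: convert each partition's rows to one pandas
    DataFrame, yielding an empty DataFrame (with the same columns) when the
    partition has no rows."""
    def convert(iterator):
        rows = list(iterator)
        if rows:
            yield pd.DataFrame(rows, columns=columns)
        else:
            yield pd.DataFrame(columns=columns)  # empty df, schema preserved
    return convert

# Example usage on a Spark DataFrame `merged`:
# pdf_rdd = merged.rdd.mapPartitions(to_pandas_keep_empty(merged.columns))
```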