Results 14 comments of dding3

The exception happens because XShards of DataFrames does not expect any partition to hold an empty DataFrame. In the code above, however, after the Spark DataFrame `join` operation, the joined Spark df (`merged` in the code)...

BTW, we need to support the `merge` operation in Shards. Could you add the implementation of `merge` to https://github.com/intel-analytics/BigDL/blob/main/python/orca/src/bigdl/orca/data/shard.py after it's done?

> I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
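To illustrate the quoted point, here is a minimal sketch in plain Python, not Spark itself. It assumes a toy hash partitioner (`zlib.crc32` modulo the partition count) as a stand-in for Spark's own hashing, and uses 200 partitions because that is the default value of `spark.sql.shuffle.partitions`. The helper `partition_sizes` and the sample keys are hypothetical. When a shuffle spreads fewer distinct keys than there are partitions, some partitions necessarily end up empty.

```python
import zlib
from collections import Counter

# Default value of spark.sql.shuffle.partitions.
NUM_PARTITIONS = 200


def partition_sizes(keys, num_partitions=NUM_PARTITIONS):
    """Count rows per partition under a toy hash partitioner.

    crc32 is only a stand-in here; Spark uses its own hash function.
    """
    counts = Counter(
        zlib.crc32(str(key).encode("utf-8")) % num_partitions for key in keys
    )
    return [counts.get(i, 0) for i in range(num_partitions)]


# Hypothetical join result with 50 distinct keys, one row per key.
join_keys = list(range(50))
sizes = partition_sizes(join_keys)
empty = sum(1 for size in sizes if size == 0)
print(f"{empty} of {len(sizes)} partitions are empty")
```

With 50 rows and 200 partitions, at least 150 partitions must be empty whatever the hash function is, so any code that assumes a non-empty DataFrame in every partition can fail after a join.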

> > I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
>
> But even with coalesce,...

For SparkXShards of pandas DataFrames, there is by design only one pandas DataFrame per partition.

I think users can currently create SparkXShards of pandas DataFrames in 2 ways: 1. use the Orca API `read_csv`, which internally creates an RDD of pandas DataFrames, and it will create...

Yes, users can do that. I think in that case, if users want to create multiple pandas DataFrames within one partition, these DataFrames may have different schemas; otherwise why multiple...

> Why not create an empty spark dataframe partition?

You mean create an empty pandas df for each empty Spark DataFrame partition? I think it may cause potential problems in the further...