LGTM
The exception happened because XShards of DataFrame doesn't expect an empty dataframe in any partition. In the above code, after the Spark DataFrame `join` operation, the joined Spark df (`merged` in the code)...
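A minimal sketch (not the original PR code) of how a Spark `join` can leave some partitions empty, which is what trips up XShards. The toy `left`/`right` DataFrames and the partition-size check via `glom` are my own illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()

left = spark.range(0, 10).withColumnRenamed("id", "key")
right = spark.range(0, 3).withColumnRenamed("id", "key")

# Stand-in for the `merged` df in the PR: only a few keys actually match.
merged = left.join(right, on="key")

# Count rows per partition; depending on the join plan, many of the
# resulting partitions can end up with zero rows.
sizes = merged.rdd.glom().map(len).collect()
print(sizes)
print(sum(1 for s in sizes if s == 0), "empty partitions")
```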
BTW, we need to support a `merge` operation in Shards. Could you add the implementation of `merge` to https://github.com/intel-analytics/BigDL/blob/main/python/orca/src/bigdl/orca/data/shard.py after it's done?
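Just to make the request concrete, here is a hedged sketch of one possible shape for such a `merge` helper. The function name `merge_with_small_shards`, the broadcast-the-small-side strategy, and the error message are my own assumptions, not the implementation being requested; it only relies on the existing `collect()` and `transform_shard()` APIs of SparkXShards.

```python
import pandas as pd

def merge_with_small_shards(big_shards, small_shards, on, how="inner"):
    """Merge every pandas DataFrame in `big_shards` with the (collected)
    contents of `small_shards`, assuming the small side fits in driver memory."""
    # Collect the small side into a single pandas DataFrame on the driver.
    small_pdf = pd.concat(small_shards.collect(), ignore_index=True)

    # pandas.merge is applied independently to each partition's DataFrame.
    return big_shards.transform_shard(
        lambda pdf: pdf.merge(small_pdf, on=on, how=how))
```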
> I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
> > I think with default spark settings, without coalesce, we cannot guarantee each partition is non-empty.
>
> But even with coalesce,...
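A small sketch illustrating the point being discussed: coalescing only merges existing partitions, so it does not by itself guarantee that every resulting partition is non-empty. The toy DataFrames and partition counts are my own example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()

left = spark.range(0, 10).withColumnRenamed("id", "key")
right = spark.range(0, 3).withColumnRenamed("id", "key")
merged = left.join(right, on="key")

# If every partition feeding one coalesce target happens to be empty,
# the coalesced partition is empty too.
sizes = merged.coalesce(2).rdd.glom().map(len).collect()
print(sizes)  # zeros can still show up here
```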
For SparkXShards of pandas dataframes, it is by design that there is only one pandas dataframe per partition.
I think currently users can create SparkXShards of pandas dataframes in 2 ways: 1. use the Orca API `read_csv`, which will internally create an RDD of pandas dataframes and it will create...
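A hedged sketch of the creation paths under discussion. The `read_csv` call follows the Orca data API; the second path, wrapping an RDD of pandas DataFrames in `SparkXShards` directly, is inferred from the follow-up comment below, and the exact import paths and arguments should be treated as approximate.

```python
import pandas as pd
from bigdl.orca import init_orca_context
from bigdl.orca.data import SparkXShards
from bigdl.orca.data.pandas import read_csv

sc = init_orca_context(cluster_mode="local", cores=4)

# 1. read_csv builds an XShards with one pandas DataFrame per partition.
shards = read_csv("path/to/data*.csv")

# 2. Wrap an existing RDD of pandas DataFrames in SparkXShards manually.
pdfs = [pd.DataFrame({"a": [i], "b": [i * 2]}) for i in range(4)]
shards2 = SparkXShards(sc.parallelize(pdfs, numSlices=4))
```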
Yes, users can do that. I think in that case, if users want to create multiple pandas dataframes within one partition, these dataframes may have different schemas; otherwise why multiple...
Sure, will add a check and report an error.
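A hedged sketch of the kind of check mentioned here: validate that no partition holds an empty pandas DataFrame before building the XShards. The function name and error message are illustrative, not the actual patch.

```python
def check_no_empty_partitions(pdf_rdd):
    """Raise on the first empty pandas DataFrame found in any partition."""
    def assert_non_empty(pdf):
        if len(pdf) == 0:
            raise ValueError(
                "SparkXShards does not support an empty pandas DataFrame in a "
                "partition; consider repartitioning the Spark DataFrame first.")
        return pdf
    # The error is raised lazily, when an action is run on the returned RDD.
    return pdf_rdd.map(assert_non_empty)
```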
> Why not create an empty spark dataframe partition?

You mean creating an empty pandas df for an empty Spark DataFrame partition? I think it may cause potential problems in the further...
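For reference, a sketch of the approach being discussed (and advised against above): emit an empty pandas DataFrame that preserves the Spark schema whenever a partition has no rows. The helper name and the simplified column handling are my own; it is not code from this PR.

```python
import pandas as pd

def to_pandas_keep_empty(columns):
    """mapPartitions function: convert each partition's rows to one pandas
    DataFrame, yielding an empty DataFrame (with the same columns) when the
    partition has no rows."""
    def convert(iterator):
        rows = list(iterator)
        if rows:
            yield pd.DataFrame(rows, columns=columns)
        else:
            yield pd.DataFrame(columns=columns)  # empty df, schema preserved
    return convert

# Example usage on a Spark DataFrame `merged`:
# pdf_rdd = merged.rdd.mapPartitions(to_pandas_keep_empty(merged.columns))
```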