Jason Dai

106 comments by Jason Dai

You may refer to the Pandas UDF implementation in Spark to see how Arrow is used for converting between Spark DataFrames and pandas DataFrames.
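For context, here is a minimal sketch of the user-facing side of that conversion in PySpark 3.x. The config key `spark.sql.execution.arrow.pyspark.enabled` and the `toPandas()` / `createDataFrame()` calls are standard PySpark; the helper names (`enable_arrow`, `spark_to_pandas`, `pandas_to_spark`) are hypothetical and only for illustration. It does not reproduce the Pandas UDF internals mentioned above.

```python
# Config key used by Spark 3.x to enable Arrow-based columnar transfer
# between the JVM and Python. (Older 2.x releases used a different key.)
ARROW_CONF_KEY = "spark.sql.execution.arrow.pyspark.enabled"


def enable_arrow(spark):
    """Turn on Arrow-based conversion for the given SparkSession."""
    spark.conf.set(ARROW_CONF_KEY, "true")


def spark_to_pandas(spark, sdf):
    """Convert a Spark DataFrame to a pandas DataFrame with Arrow enabled."""
    enable_arrow(spark)
    return sdf.toPandas()


def pandas_to_spark(spark, pdf):
    """Convert a pandas DataFrame to a Spark DataFrame with Arrow enabled."""
    enable_arrow(spark)
    return spark.createDataFrame(pdf)
```

With a real session this would be called as `pdf = spark_to_pandas(spark, df)`, where `spark` is an existing `SparkSession` and `df` a Spark DataFrame.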

> > > XShards currently does not support 1) shuffling a dataframe, 2) astype (data type change), 3) train_test_split, 4) duplicating a whole dataframe according to one column. > > > >...

See https://github.com/intel-analytics/BigDL/issues/4965#issuecomment-1184515330

I think we should take a list of ndarrays as input (e.g., for XShards)? @yushan111 @sgwhat

Create a simple but meaningful Python project (e.g., one with multiple Python source files), and use that project as a running example throughout the tutorial.

At the beginning of the tutorial (before the table of contents), we need a very short (one or two sentences) description that talks about what BigDL PPML is from the...

Why did we repartition previously? @jenniew

> Why must the number of partitions equal the number of workers? Repartitioning is expensive; if the number of partitions is already larger than the number...
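The quoted comment is cut off, so the intended policy is not visible here. As a hedged illustration of the cost concern it raises, the sketch below assumes one possible policy: repartition only when there are fewer partitions than workers, and otherwise leave the data alone. The function name `plan_partitioning` and the policy itself are assumptions, not the project's actual behavior.

```python
def plan_partitioning(num_partitions, num_workers):
    """Decide whether a (costly) repartition is needed.

    Assumed policy, for illustration only:
    - fewer partitions than workers: repartition up to num_workers
    - otherwise: keep the existing partitioning and avoid the shuffle
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if num_partitions < num_workers:
        return ("repartition", num_workers)
    return ("keep", num_partitions)


if __name__ == "__main__":
    print(plan_partitioning(2, 4))
    print(plan_partitioning(8, 4))
```

In Spark the `"repartition"` outcome would correspond to a call such as `rdd.repartition(n)`, which triggers a full shuffle; the `"keep"` outcome skips it.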