Kai Huang comments

Results: 136 comments of Kai Huang

Memory optimization for from spark Dataframe to SparkXShards

For the recsys dataset, if I only save some selected features (processed here: https://github.com/analytics-zoo/friesian/blob/recsys-challenge/recsys2021/demo/final/scripts/preprocess.py ), the total parquet size is around 3.9G, and when I read the parquet I get 200+ partitions. Then...

New operations for table.py

iloc and append_list may be hard to make work on big data, so let's leave them pending. Let's put shift in a new PR and merge it first. cc @jason-dai

Support class_weight and sample_weight in tfpark KerasModel.fit

It seems class_weight is the same as the option in ClassNLLCriterion, but sample_weight can assign a weight to each input sample, which is not supported.
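To illustrate the distinction drawn above, here is a minimal sketch in plain Python. It is not the tfpark or BigDL implementation; `effective_weights` is a hypothetical helper, and multiplying the two weights when both are given is an assumption of this sketch.

```python
def effective_weights(labels, class_weight=None, sample_weight=None):
    """Return the weight applied to each sample's loss term.

    Hypothetical helper for illustration only.
    labels: list of int class indices, one per sample
    class_weight: optional dict mapping class index -> weight
    sample_weight: optional list with one weight per sample
    """
    weights = []
    for i, label in enumerate(labels):
        w = 1.0
        if class_weight is not None:
            # Looked up by label: samples of the same class share a weight.
            w *= class_weight.get(label, 1.0)
        if sample_weight is not None:
            # Looked up by position: every sample can have its own weight.
            # Combining both by multiplication is an assumption of this sketch.
            w *= sample_weight[i]
        weights.append(w)
    return weights


labels = [0, 1, 1, 0]
by_class = effective_weights(labels, class_weight={0: 1.0, 1: 3.0})
by_sample = effective_weights(labels, sample_weight=[1.0, 3.0, 0.5, 2.0])
```

In this example the second and third samples share label 1 but get different values in `by_sample`, which no per-class table can express. That is why a criterion that only accepts class weights cannot cover sample_weight.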

Reference steps to migrate zoo repo to bigdl-2.0

Tips: If you get stuck at `sudo add-apt-repository ppa:git-core/ppa` when upgrading git, export http_proxy and https_proxy, and add `Defaults env_keep="https_proxy"` to the end of the /etc/sudoers file. See https://askubuntu.com/questions/212132/i-cant-add-ppa-repository-behind-the-proxy

Reference steps to migrate zoo repo to bigdl-2.0

```
# change license
grep -rl '2018\ Analytics\ Zoo' . | xargs sed -i 's/2018\ Analytics\ Zoo/2016\ The\ BigDL/g'
```
Don't know why after using the above command to modify the...
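For environments where grep and sed are not convenient, the same license-string replacement could be sketched in Python roughly as follows. This is an illustrative alternative, not part of the original migration steps: `replace_license` is a hypothetical helper, and skipping `.git` and unreadable or non-UTF-8 files are choices of this sketch that differ from the shell command.

```python
import os

OLD = "2018 Analytics Zoo"
NEW = "2016 The BigDL"


def replace_license(root, old=OLD, new=NEW):
    """Replace `old` with `new` in every UTF-8 text file under `root`.

    Returns the sorted list of files that were rewritten.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git metadata (a choice of this sketch).
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # newline="" keeps the file's original line endings.
                with open(path, "r", encoding="utf-8", newline="") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                # Skip binary or unreadable files.
                continue
            if old not in text:
                continue
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(text.replace(old, new))
            changed.append(path)
    return sorted(changed)
```

Calling `replace_license(".")` from the repository root rewrites the matching files in place and returns their paths, so the list can be reviewed against `git diff` before committing.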

(raylet) socket.gaierror: [Errno -2] Name or service not known

Hi @xunaichao I checked the code and ran it on Google Colab, and I can reproduce this error as well. But it seems this error doesn't affect or interrupt the run, you...

tf2 estimator failed with horovod backend if data_creator is tf.data.Dataset from generator

@leonardozcm Can you write the installation steps here, so that @jenniew can follow these steps to further verify in a new environment?

How to use Analytics-zoo when SparkSession is automatically instantiated

We are working on this and will finish it very soon.

How to use Analytics-zoo when SparkSession is automatically instantiated

Hi @guidiandrea Since you already have a SparkSession, you need to manually upload the jar for Analytics Zoo before initializing the SparkSession. You may refer to our guide for DataBricks...

How to use Analytics-zoo when SparkSession is automatically instantiated

But `pip install analytics-zoo` will actually also install bigdl and pyspark 2.4.6, so how can you pip install only analytics-zoo? If you are using Spark 2.3, I suppose you may need to...