Lee Yang

Results: 21 comments by Lee Yang

Unfortunately, Java API support in TF has been spotty with [deprecation warnings](https://www.tensorflow.org/api_docs/java/org/tensorflow/package-summary) and [no API stability guarantees](https://www.tensorflow.org/jvm/install). We initially tried to support Java when the API was updated regularly with...

I don't see anything obvious from your logs. Given that it looks like the evaluator process stalled/quit, I'd check CPU and memory usage on that node (while it's running)...

If you already have a trained model (and it fits in memory), then the simplest way to run inferencing in a Spark job is to use something like [this example](https://github.com/yahoo/TensorFlowOnSpark/blob/master/examples/mnist/keras/mnist_inference.py)....
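A minimal sketch of that pattern (not the linked example itself), assuming a Keras SavedModel at a hypothetical path and a DataFrame with hypothetical `id`/`features` columns; the model is loaded once per partition so the load cost is amortized across rows:

```
# Hedged sketch: run inference inside Spark tasks with a pre-trained Keras model.
# Paths, column names, and batch size below are hypothetical assumptions.
import numpy as np
import tensorflow as tf
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("hdfs:///data/features")   # assumed: columns "id" and "features"

def _batches(it, size):
    # group an iterator of rows into lists of up to `size` rows
    batch = []
    for item in it:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def predict_partition(rows):
    # load the model once per partition, then predict in batches
    model = tf.keras.models.load_model("/models/mnist_model")   # hypothetical path
    for batch in _batches(rows, 1024):
        features = np.array([r["features"] for r in batch])
        preds = model.predict(features)
        for r, p in zip(batch, preds):
            yield (r["id"], int(np.argmax(p)))

preds = df.rdd.mapPartitions(predict_partition).toDF(["id", "prediction"])
preds.write.parquet("hdfs:///data/predictions")
```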

@jiqiujia assuming that your model won't change over the course of the job, you can just cache the model in the python worker processes via a global variable. Just check...
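A minimal sketch of that caching pattern, assuming a `pandas_udf` and a hypothetical SavedModel path; the module-level global ensures the model is loaded at most once per Python worker process rather than once per batch:

```
# Hedged sketch: cache the model in each Python worker via a global variable.
# The model path and column types are hypothetical assumptions.
import numpy as np
import pandas as pd
import tensorflow as tf
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import LongType

_model = None   # populated lazily inside each Python worker process

def _get_model():
    global _model
    if _model is None:                       # only load on the first call in this worker
        _model = tf.keras.models.load_model("/models/mnist_model")
    return _model

@pandas_udf(LongType())
def predict(features: pd.Series) -> pd.Series:
    model = _get_model()                     # reuses the cached model across batches/tasks
    x = np.vstack(features.to_numpy())
    preds = model.predict(x)
    return pd.Series(np.argmax(preds, axis=1))

# usage: df.withColumn("prediction", predict("features"))
```

Because Spark reuses the same Python worker processes across tasks, the load cost is paid once per worker rather than once per task or per batch.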

@jahidhasanlinix Not quite sure what you're doing here... *.pt are PyTorch models. Have you converted a TensorFlow model to PyTorch (or vice versa)?

@jahidhasanlinix Unfortunately, I think that code is beyond the scope of what TFoS is trying to do. Decima presumably integrates with (or replaces) the Spark scheduler itself, while...

My guess is that your `model_dir` needs to be a fully-qualified HDFS path, e.g. `hdfs://default/...`. Note that the example uses spark-local mode (for simplicity).
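For illustration only, here's a fully-qualified URI next to a bare relative path (the namenode host/port and directories are hypothetical placeholders):

```
# Hedged sketch: fully-qualified HDFS URI vs. a bare path.
model_dir = "hdfs://namenode-host:8020/user/me/mnist_model"   # unambiguous on every executor
# model_dir = "mnist_model"                                   # bare path: may resolve to a local dir in spark-local mode
```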

> Pls also fix the linter failure: https://github.com/leewyang/spark/actions/runs/3397174449/jobs/5649073867#step:16:71

Updated to latest master, which got rid of the linter error, but it added a new "appveyor" check, which seems to be...

BTW, I'm seeing a change in behavior in the `pandas_udf` when used with `limit` in the latest master branch of Spark (vs. 3.3.1), per this example code:

```
import numpy...
```

@WeichenXu123 Yes, using `df.limit(10).cache().withColumn` makes it only process 10 rows inside the pandas_udf, which addresses the performance issue, thanks!
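A minimal sketch of that workaround, with hypothetical column names and a trivial `pandas_udf` standing in for the real workload; calling `limit()` and `cache()` before `withColumn` keeps the UDF from being evaluated over the full input:

```
# Hedged sketch: limit + cache before applying the pandas_udf so only the
# limited rows are processed. Column names and the UDF body are stand-ins.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "x")

@pandas_udf(DoubleType())
def expensive(x: pd.Series) -> pd.Series:
    # stand-in for an expensive per-batch computation
    return x.astype("float64") * 2.0

# limit + cache first, then add the column: the UDF only sees the 10 limited rows
result = df.limit(10).cache().withColumn("y", expensive("x"))
result.show()
```

Caching the limited DataFrame materializes just those rows, so subsequent actions reuse them instead of re-running the UDF over the whole input.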