Joseph Bradley

18 comments by Joseph Bradley

Hi @spaszek, thanks for reporting this potential improvement. I'm afraid we have very limited bandwidth to work on spark-sklearn, though it would be good to know if others need this...

When I try to build this, I'm hitting:

```
[ERROR] Failed to execute goal on project spark-tensorflow-connector_2.11: Could not resolve dependencies for project org.tensorflow:spark-tensorflow-connector_2.11:jar:1.10.0: Could not find artifact org.tensorflow:tensorflow-hadoop:jar:1.10.0 in...
```

Whoops, my bad, did not realize it's in the same project & is a manually handled dependency. Thanks!

Since this project's CI isn't running, I tested this PR locally. It may have some flakiness in the impl or tests right now. I ran the tests once (mvn clean...

Apologies for the slow reply! I've been bogged down with QA for the next Spark release. I wrote the code for one-hot encoding + running RFs and computing AUC. Let...

I'm glad 1 & 2 worked out! For 3, I should have been more specific. Tungsten makes improvements on DataFrames, so it should improve the performance of simple ML Pipeline...

> For 3: Yes, that was my guess too. One more question: I re-ran the logistic regression https://github.com/szilard/benchm-ml/blob/master/1-linear/5-spark.txt with 1.5.0 as well and got the same training time as with 1.4.0....

I didn't find anything obvious yet for issue 4, but there are still some items I want to investigate. (I haven't yet scanned the sklearn implementation carefully for comparison.) In...

Sorry for the slow response; I understand limited bandwidth! You should be able to register a new package name and then make a release by uploading the JAR built by...