spark-sklearn issues

Long Time to Collect Results of Distributed Spark-Sklearn Training

1

I'm running 15 combinations of a Logistic Regression model with spark-sklearn. I can see that all tasks have completed, but it then takes a huge amount of time to collect all...

wjohnson

scikit-learn >=0.20 support

5

Is there any plan to support scikit-learn >=0.20?

yishilin14

test_scipy_sparse (spark_sklearn.converter_test.CSRVectorUDTTests) failure

Getting this test failure:

```
(spark_sklearn.converter_test.CSRVectorUDTTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stoker/spark-sklearn/python/spark_sklearn/converter_test.py", line 83, in test_scipy_sparse
    self.assertEqual(df.count(), 1)
  File "/usr/local/spark/python/pyspark/sql/dataframe.py", line 522, in count
    return int(self._jdf.count())
  File "/usr/local/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",...
```

austinwiltshire

best_params_ missing on GridSearchCV

3

The `best_params_` dict seems to be missing from GridSearchCV, even if refitting is enabled. [grid_search.py#L195](https://github.com/databricks/spark-sklearn/blob/master/python/spark_sklearn/grid_search.py#L195) refers to that attribute, and it is determined in [grid_search.py#L371](https://github.com/databricks/spark-sklearn/blob/master/python/spark_sklearn/grid_search.py#L371), but it is never actually exposed after fitting....

dklischies

bug
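To make the request concrete, here is a minimal sketch of how a `best_params_` value can be derived from per-candidate cross-validation scores. It is not spark-sklearn's implementation: the function name `select_best` and the input layout are assumptions made for illustration, and only the attribute names mirror scikit-learn's.

```python
def select_best(results):
    """Pick the best candidate from cross-validation results.

    `results` is a hypothetical list of (params, fold_scores) pairs,
    where higher scores are better.
    """
    def mean_score(index):
        scores = results[index][1]
        return sum(scores) / len(scores)

    # The candidate with the highest mean score wins; ties go to the first.
    best_index = max(range(len(results)), key=mean_score)
    return {
        "best_index_": best_index,
        "best_params_": results[best_index][0],
        "best_score_": mean_score(best_index),
    }


cv_results = [
    ({"C": 0.1}, [0.70, 0.80]),
    ({"C": 1.0}, [0.90, 0.85]),
    ({"C": 10.0}, [0.60, 0.95]),
]
best = select_best(cv_results)
print(best["best_params_"])
```

Exposing the selected parameters as an attribute on the fitted search object is what the issue asks for.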

AttributeError: 'KeyedEstimator' object has no attribute '_input_kwargs'

When I call `KeyedEstimator(sklearnEstimator=LinearRegression(), yCol="y")`, the error in the title occurs. The version of sklearn (0.19.2) meets the requirements, so why does this happen? Thank you.

logicdj
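One possible cause, which the issue does not confirm, is a mismatch in where a `keyword_only`-style decorator stores the captured keyword arguments: on the wrapper function or on the instance. The sketch below reproduces that kind of mismatch in plain Python. Both decorators and the `make_estimator` helper are hypothetical stand-ins written for illustration, not pyspark or spark-sklearn code.

```python
import functools


def keyword_only_on_function(func):
    """Hypothetical variant: stores kwargs on the wrapper function."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


def keyword_only_on_instance(func):
    """Hypothetical variant: stores kwargs on the instance."""
    @functools.wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


def make_estimator(decorator):
    """Build a toy estimator whose __init__ reads self._input_kwargs."""
    class Estimator:
        @decorator
        def __init__(self, yCol=None):
            # Fails with AttributeError if the decorator did not set
            # _input_kwargs on the instance.
            self.params = dict(self._input_kwargs)
    return Estimator


ok = make_estimator(keyword_only_on_instance)(yCol="y")
print(ok.params)

try:
    make_estimator(keyword_only_on_function)(yCol="y")
except AttributeError as exc:
    print("AttributeError:", exc)
```

If this is what is happening, the installed Spark version would matter more than the sklearn version; checking the installed pyspark version is a reasonable first diagnostic step.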

best_params_ not supported by RandomizedSearchCV()

1

The documentation for RandomizedSearchCV implies that a best_params_ property is available after .fit() is called. This does not appear to be the case. Here is the documentation in question: https://github.com/databricks/spark-sklearn/blob/master/python/spark_sklearn/random_search.py#L162...

shaunswanson

Clarify RandomizedSearchCV documentation for sampling with replacement

2

At this line, it may be better to explicitly mention which parameters will be sampled with replacement if any one of them is a distribution: https://github.com/databricks/spark-sklearn/blob/master/python/spark_sklearn/random_search.py#L27 Are all parameters (those...

shaunswanson
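For context, scikit-learn documents the rule for its own randomized search as follows: if every parameter is given as a list, candidates are sampled without replacement, and if at least one parameter is a distribution, sampling is done with replacement. The sketch below illustrates that rule using only the standard library. It is an illustration under stated assumptions, not spark-sklearn's or scikit-learn's code: `sample_parameters` and `UniformDistribution` are hypothetical names, and a distribution is taken to be any object with an `rvs` method.

```python
import itertools
import random


class UniformDistribution:
    """Hypothetical stand-in for a distribution object with an rvs method."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, random_state):
        return random_state.uniform(self.low, self.high)


def sample_parameters(param_distributions, n_iter, seed=0):
    rng = random.Random(seed)
    names = sorted(param_distributions)
    all_lists = all(not hasattr(param_distributions[n], "rvs") for n in names)

    if all_lists:
        # Every parameter is a list: draw distinct points from the full
        # grid (without replacement). Capping n_iter at the grid size is
        # a simplification made for this sketch.
        grid = [dict(zip(names, combo)) for combo in
                itertools.product(*(param_distributions[n] for n in names))]
        return rng.sample(grid, min(n_iter, len(grid)))

    # At least one distribution: every parameter, including the list-valued
    # ones, is drawn independently, so repeated values are possible.
    samples = []
    for _ in range(n_iter):
        candidate = {}
        for n in names:
            value = param_distributions[n]
            if hasattr(value, "rvs"):
                candidate[n] = value.rvs(rng)
            else:
                candidate[n] = rng.choice(value)
        samples.append(candidate)
    return samples


grid_only = sample_parameters({"C": [1, 10], "penalty": ["l1", "l2"]}, n_iter=4)
mixed = sample_parameters({"C": UniformDistribution(0, 1), "penalty": ["l2"]}, n_iter=5)
```

Under this rule, list-valued parameters are also drawn with replacement as soon as any one parameter is a distribution.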

Use generate sklearn UDT within gapply() [SPARK-16062 blocks this]

3

Currently, KeyedModel fitting in KeyedEstimator._fit is implemented by generating an array of a single serialized estimator, requiring an additional pass over the resulting dataframe which deserializes the UDT. This is...

vlad17

enhancement

Update to latest scikit-learn release for deprecation and compatibility

12

Using the current head 0.2.0 release of spark-sklearn and the current release of scikit-learn (0.18.1), I'm getting the following deprecation warning: /.../python3.4/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18...

dsackin

enhancement
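The warning arises because scikit-learn 0.18 deprecated `sklearn.cross_validation` in favor of `sklearn.model_selection`. A common compatibility pattern is to try the new module first and fall back to the old one. The helper below is a minimal sketch of that pattern; `import_first` is a hypothetical name, not part of either library.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(module_names))


# Intended usage (requires scikit-learn to be installed):
#   cv = import_first("sklearn.model_selection", "sklearn.cross_validation")
#   train_test_split = cv.train_test_split
```

Trying `sklearn.model_selection` first means the deprecated module is only imported on scikit-learn versions that lack the new one.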

[WIP] Converts dataframe to/from named numpy arrays

4

I found this incredibly convenient for creating small dataframes. Here is how you can use it:

```python
n = 5
A = rd.rand(n, 4)
C = rd.randint(10, size=n)
df =...
```

thunterdb

enhancement
