spark-sklearn [WIP] Converts dataframe to/from named numpy arrays


Open thunterdb opened this issue 10 years ago • 4 comments

I found this incredibly convenient for creating small DataFrames. Here is how you can use it:

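import numpy.random as rd  # assumption: `rd` is numpy.random
# `conv` is the converter module from this PR (its import path is not shown)
n = 5
A = rd.rand(n, 4)           # n x 4 float matrix -> vector column `a`
C = rd.randint(10, size=n)  # n integers in [0, 10) -> bigint column `c`
df = conv.pack_DataFrame(a=A, c=C)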

DataFrame[a: vector, c: bigint]
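To make the row layout concrete, here is a minimal NumPy-only sketch of what packing named arrays into rows involves, assuming each keyword argument becomes one column and the first axis of every array indexes rows. `pack_rows` is a hypothetical helper for illustration, not the PR's `pack_DataFrame`, and it stops before any Spark call.

```python
import numpy as np


def pack_rows(**arrays):
    """Hypothetical helper (not the PR's pack_DataFrame): turn named arrays
    that share a leading dimension n into (column_names, rows)."""
    if not arrays:
        raise ValueError("at least one named array is required")
    names = list(arrays)  # keyword order is preserved (Python 3.7+)
    n = len(arrays[names[0]])
    for name in names:
        if len(arrays[name]) != n:
            raise ValueError(
                f"column {name!r} has {len(arrays[name])} rows, expected {n}")
    rows = []
    for i in range(n):
        # A 1-D array contributes one scalar per row; a 2-D array contributes
        # one list (a vector) per row. tolist() yields plain Python values.
        rows.append(tuple(np.asarray(arrays[name][i]).tolist() for name in names))
    return names, rows


A = np.arange(20, dtype=float).reshape(5, 4)
C = np.array([3, 1, 4, 1, 5])
names, rows = pack_rows(a=A, c=C)
print(names)    # ['a', 'c']
print(rows[0])  # ([0.0, 1.0, 2.0, 3.0], 3)
```

Requiring every array to share the same leading dimension is what makes each row well defined; it is also what lets the reverse direction rebuild an (n, k) matrix from n vectors.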

And the reverse conversion, which recovers the correct shape for vectors, matrices, etc.:

Z = Converter.df_to_numpy(df)
# Each column is strictly equal to the original (elementwise, over all entries).
assert (Z['a'] == A).all()
assert (Z['c'] == C).all()
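The shape recovery described above can be sketched in NumPy, assuming each collected row is a tuple of plain Python values in column order. `unpack_rows` is a hypothetical helper, not the PR's `Converter.df_to_numpy`; on a real DataFrame the rows would first have to be collected to the driver and any MLlib vectors converted with `toArray()`.

```python
import numpy as np


def unpack_rows(names, rows):
    """Hypothetical inverse of a row-packing step: rebuild one NumPy array
    per column from a list of row tuples.

    np.stack restores the shape: scalar cells give a 1-D array of length n,
    fixed-length vector cells give an (n, k) matrix. Cells of differing
    length (as with sparse vectors) make np.stack raise ValueError.
    """
    return {
        name: np.stack([np.asarray(row[j]) for row in rows])
        for j, name in enumerate(names)
    }


rows = [([0.0, 1.0, 2.0, 3.0], 3), ([4.0, 5.0, 6.0, 7.0], 1)]
Z = unpack_rows(['a', 'c'], rows)
print(Z['a'].shape)  # (2, 4)
print(Z['c'].shape)  # (2,)
```

Stacking along a new leading axis, rather than concatenating, is the choice that keeps scalars as a 1-D column and vectors as a 2-D matrix without special-casing either.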

Still missing: more tests, better names, and support for sparse vectors. I'm not sure how easy sparse vectors are to support, because their shape varies from row to row. It is probably easier to reject them and require users to go through the CSC conversion you already wrote.
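Why a compressed format sidesteps the irregular shape can be sketched as follows: rows holding sparse vectors with different numbers of non-zeros cannot be stacked into a regular array, but their entries can be flattened into three 1-D arrays in compressed sparse column (CSC) layout. `sparse_rows_to_csc` and the `(indices, values)` row encoding are assumptions for illustration; this is not the CSC conversion referred to above, whose code is not shown here. The returned triple matches what `scipy.sparse.csc_matrix((data, indices, indptr), shape=(n_rows, n_cols))` accepts.

```python
import numpy as np


def sparse_rows_to_csc(rows, n_cols):
    """Hypothetical sketch: assemble per-row sparse vectors, each given as
    (indices, values) with a varying number of non-zeros, into the CSC
    arrays (data, row_indices, indptr) of an n_rows x n_cols matrix.
    Duplicate (row, column) entries are not merged."""
    # Flatten to coordinate (row, col, value) triplets; ragged rows are fine here.
    row_ids, col_ids, vals = [], [], []
    for r, (idx, v) in enumerate(rows):
        if len(idx) != len(v):
            raise ValueError(f"row {r}: {len(idx)} indices but {len(v)} values")
        row_ids.extend([r] * len(idx))
        col_ids.extend(idx)
        vals.extend(v)
    row_ids = np.asarray(row_ids, dtype=np.int64)
    col_ids = np.asarray(col_ids, dtype=np.int64)
    vals = np.asarray(vals, dtype=np.float64)
    if col_ids.size and (col_ids.min() < 0 or col_ids.max() >= n_cols):
        raise ValueError("column index out of range")
    # Sort by column, then by row within each column (lexsort's last key is primary).
    order = np.lexsort((row_ids, col_ids))
    data = vals[order]
    indices = row_ids[order]
    # Column j's entries live in data[indptr[j]:indptr[j + 1]].
    counts = np.bincount(col_ids, minlength=n_cols)
    indptr = np.concatenate(([0], np.cumsum(counts)))
    return data, indices, indptr


rows = [([0, 2], [1.0, 2.0]), ([1], [3.0]), ([0, 1, 2], [4.0, 5.0, 6.0])]
data, indices, indptr = sparse_rows_to_csc(rows, n_cols=3)
print(data.tolist())     # [1.0, 4.0, 3.0, 5.0, 2.0, 6.0]
print(indices.tolist())  # [0, 2, 1, 2, 0, 2]
print(indptr.tolist())   # [0, 2, 4, 6]
```

The three rows above have 2, 1 and 3 non-zeros, yet the output arrays are all flat, which is the property that dense stacking cannot offer.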

Dec 02 '15 00:12 thunterdb

Just had a couple more comments.

Dec 16 '15 18:12 jkbradley

@jkbradley comments addressed

Dec 21 '15 21:12 thunterdb

This PR should unskip the following test:

test_cv_lasso_with_mllib_featurization (spark_sklearn.tests.test_grid_search_2.CVTests) ... SKIP: disable this test until we have numpy <-> dataframe conversion

Jun 28 '16 00:06 vlad17

I'm starting to look through the open PRs to see if we can merge them or whether they're stale -- @thunterdb is this one too old to resurrect?

Dec 07 '18 21:12 srowen
