flint
pyspark 2.4 support
Does this library currently work with Spark 2.4?
We have not tried it with Spark 2.4 yet. (In reply to mattomatic, Wed, Feb 13, 2019 at 3:44 PM.)
It does not work with 2.3.2 either.
What issues do you see with 2.3.2? Internally we use Flint with 2.3.2 without issues. (In reply to Sandro Cavallari, Fri, Feb 15, 2019 at 1:53 AM.)
https://github.com/twosigma/flint/pull/64
I made these changes to get it to build under Spark 2.4.
I am running spark-2.4.0-bin-hadoop2.7 and seeing the error below. Any ideas?
from ts.flint import windows
sp500_previous_day_return = sp500_return.shiftTime(windows.future_absolute_time('1day')).toDF('time', 'previous_day_return')

Traceback (most recent call last):
  File "", line 1, in
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 1591, in shiftTime
    tsrdd = self.timeSeriesRDD.shift(window._jwindow(self._sc))
  File "/Users/kkum25/anaconda/envs/featuretool/lib/python3.7/site-packages/ts/flint/dataframe.py", line 154, in timeSeriesRDD
    self._jdf, self._is_sorted, self._junit, self._time_column)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o124.fromDF.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.plans.physical.ClusteredDistribution$.apply$default$2()Lscala/Option;
    at com.twosigma.flint.timeseries.TimeSeriesStore$.isClustered(TimeSeriesStore.scala:149)
    at com.twosigma.flint.timeseries.TimeSeriesStore$.apply(TimeSeriesStore.scala:64)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDFWithPartInfo(TimeSeriesRDD.scala:509)
    at com.twosigma.flint.timeseries.TimeSeriesRDD$.fromDF(TimeSeriesRDD.scala:304)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
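One detail in the traceback above may be worth checking: the PySpark and py4j files are loaded from /usr/local/Cellar/apache-spark/2.2.1, not from the spark-2.4.0-bin-hadoop2.7 install mentioned in the report. A java.lang.NoSuchMethodError generally means code compiled against one version of a library is running against a different one, so mixed Spark installs are a plausible suspect, though not a confirmed cause. The sketch below compares Spark versions by their major.minor release line. The helpers `major_minor`, `same_spark_line`, and `spark_version_from_path` are hypothetical (not part of Flint or PySpark), and the path parsing assumes a Homebrew-style `apache-spark/<version>/` directory layout.

```python
import re
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string such as '2.4.0' or '2.4.0-SNAPSHOT' into (2, 4)."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"not a Spark version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def same_spark_line(a: str, b: str) -> bool:
    """True when both versions share a major.minor line (e.g. 2.4.0 and 2.4.2)."""
    return major_minor(a) == major_minor(b)


def spark_version_from_path(path: str) -> Optional[str]:
    """Hypothetical helper: pull the version out of a Homebrew-style path like
    /usr/local/Cellar/apache-spark/2.2.1/libexec/...; None if there is no match."""
    m = re.search(r"apache-spark/(\d+\.\d+\.\d+)", path)
    return m.group(1) if m else None


if __name__ == "__main__":
    # Path copied from the traceback; the report says Spark 2.4.0 is installed.
    path = "/usr/local/Cellar/apache-spark/2.2.1/libexec/python/pyspark/sql/utils.py"
    loaded = spark_version_from_path(path)
    print(loaded, same_spark_line(loaded, "2.4.0"))  # 2.2.1 False

    # If PySpark is importable, show which package the interpreter actually picks up.
    try:
        import pyspark
    except ImportError:
        pass
    else:
        print("PySpark package:", pyspark.__version__, "from", pyspark.__file__)
```

In a live session, `pyspark.__version__` can also be compared against `spark.version` on the running `SparkSession`, and both against the Spark version the Flint jar was built for.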
Regarding #64: before this change, what issue were you seeing?
Are there any blockers to merging this? I'd like to use Flint on Databricks, but I don't see any compatible Spark versions on offer (2.2.1, 2.4.0, 2.4.1, and 2.4.2 are the only Spark versions currently available).
I think this is a great project and would love to help mature it!
I have been successfully using @mattomatic's changes to run on Spark 2.4.
@icexelloss Does this library work with Spark 2.2.x?