v1 milestones & release

Open MrPowers opened this issue 3 years ago • 5 comments

I'm really excited about this project!

Think about the features that'll be included in the "initial public release". Once all the initial features are built, ping me, and I'll make a commit that adds a compelling pitch to the project README.

Once the README is updated, I'll start marketing the project to try to get users and feedback on the code.

Sounds like a good plan? I'm definitely interested in seeing this project grow & get a lot of users!

MrPowers avatar Apr 03 '21 22:04 MrPowers

Hey @MrPowers, I'm very happy about your interest 😄

I'm still refactoring the code to make a first usable version. I expect to have all types except structs included, and the idea is to have basic functionality for map functions (withColumn, filter, select, drop, etc.), but typed.

The idea now is to make the syntax very close to the Spark API. An example would be something like:

df.withColumn("new_col", getInt("c1") + getInt("c2"))
df.withColumn("new_col", getInt("c1") + getTimestamp("c2")) // won't compile

Any errors at runtime will be accumulated, so if c1 and c2 are not integers, a single error will be thrown saying that both selected columns are invalid.
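As a rough illustration of the two behaviours described above, here is a minimal sketch in plain Scala with no Spark dependency. Everything in it (`TypedColumnsSketch`, `TypedCol`, `DType`, `Schema`, `validate`, and these toy versions of `getInt` and `getTimestamp`) is a hypothetical stand-in and not doric's actual API. It only shows how a phantom type parameter can reject mismatched operands at compile time, while schema mismatches found at runtime are collected into one combined error.

```scala
object TypedColumnsSketch {
  // Hypothetical runtime type tags, standing in for Spark's data types.
  sealed trait DType
  case object IntT extends DType
  case object TimestampT extends DType

  // Hypothetical stand-in for a DataFrame schema: column name -> runtime type.
  type Schema = Map[String, DType]

  // T is a phantom type: it exists only so the compiler can check operand types.
  // `refs` remembers every column the expression reads and the type it expects.
  final case class TypedCol[T](refs: List[(String, DType)]) {

    // `+` is only available when both operands are Int columns.
    def +(other: TypedCol[T])(implicit ev: T =:= Int): TypedCol[Int] =
      TypedCol[Int](refs ++ other.refs)

    // Check every referenced column and collect all problems,
    // instead of stopping at the first one.
    def validate(schema: Schema): Either[List[String], Unit] = {
      val errors = refs.flatMap { case (name, expected) =>
        schema.get(name) match {
          case None =>
            Some(s"column '$name' not found")
          case Some(actual) if actual != expected =>
            Some(s"column '$name' is $actual, expected $expected")
          case _ =>
            None
        }
      }
      if (errors.isEmpty) Right(()) else Left(errors)
    }
  }

  def getInt(name: String): TypedCol[Int] =
    TypedCol[Int](List(name -> IntT))

  def getTimestamp(name: String): TypedCol[java.sql.Timestamp] =
    TypedCol[java.sql.Timestamp](List(name -> TimestampT))

  def main(args: Array[String]): Unit = {
    val expr = getInt("c1") + getInt("c2")
    // getInt("c1") + getTimestamp("c2")   // rejected by the compiler: type mismatch

    // Both columns have the wrong runtime type, so both are reported together.
    val schema: Schema = Map("c1" -> TimestampT, "c2" -> TimestampT)
    println(expr.validate(schema))
  }
}
```

In this sketch the compile-time check only covers how columns are combined; whether `c1` and `c2` really are integers is still discovered at runtime, but all mismatches are reported in one pass.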

I will try to have some basic functionality in the following days to show you.

alfonsorr avatar Apr 05 '21 12:04 alfonsorr

@alfonsorr - that sounds like a good first implementation. I like the idea of making this lib a "minimalistic, performant way to write typesafe Spark code". It can have these selling points:

  • it allows for typesafe programming with compile-time checks
  • it's just as performant as regular Spark DataFrames (unlike Datasets)
  • it can be used in conjunction with "regular Spark code"

Bringing the benefits of typesafe programming to the Spark-Scala community will be a huge benefit!

Let me know when you're finished with the basic prototype and I'll try it out. No rush. Definitely excited!

MrPowers avatar Apr 05 '21 15:04 MrPowers

Awesome selling points :)

My only possible caveat is that the message sounds too strong. I mean, DataFrames are dynamically typed, and doric expressions won't change that: compile-time checks may succeed and we may still get typing errors at runtime, right? Things might be different if we could start from some kind of ValidatedDataFrame[T]. In that case, dynamic typing errors could still happen, of course, but they would be caught in advance. We could then say that execution is guaranteed to succeed provided that the validation checks on the accompanying DataFrame succeed. I'm not sure this kind of ValidatedDataFrame would actually be useful, though. Maybe it would be enough to add a footnote that limits the scope of type safety to well-formed Spark column expressions, or something like that, leaving your selling points intact.
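To make the ValidatedDataFrame[T] idea a bit more concrete, here is a rough sketch in plain Scala with no Spark dependency. All names (`ValidatedFrameSketch`, `Expected`, `ValidatedFrame`, `Sales`, and the string type names) are hypothetical and only illustrate the shape of the idea: a value that can be obtained only by passing the schema checks up front, so later code can assume they succeeded.

```scala
object ValidatedFrameSketch {
  // Hypothetical stand-in for a DataFrame schema: column name -> type name.
  type Schema = Map[String, String]

  // Describes the schema that a row type T expects (hypothetical type class).
  trait Expected[T] { def schema: Schema }

  // The constructor is private, so the only way to get a ValidatedFrame[T]
  // is through `validate`. Holding one means the checks already passed.
  final class ValidatedFrame[T] private (val schema: Schema)

  object ValidatedFrame {
    def validate[T](actual: Schema)(
        implicit expected: Expected[T]
    ): Either[List[String], ValidatedFrame[T]] = {
      // Collect every mismatch, sorted by column name for a stable report.
      val errors = expected.schema.toList.sortBy(_._1).flatMap { case (name, tpe) =>
        actual.get(name) match {
          case Some(found) if found == tpe => None
          case Some(found) => Some(s"column '$name' is $found, expected $tpe")
          case None        => Some(s"column '$name' not found")
        }
      }
      if (errors.isEmpty) Right(new ValidatedFrame[T](actual)) else Left(errors)
    }
  }

  // Example row type with its expected schema (assumed for illustration).
  final case class Sales(c1: Int, c2: Int)
  object Sales {
    implicit val expected: Expected[Sales] = new Expected[Sales] {
      val schema: Schema = Map("c1" -> "int", "c2" -> "int")
    }
  }

  def main(args: Array[String]): Unit = {
    // Passes: the actual schema matches what Sales expects.
    println(ValidatedFrame.validate[Sales](Map("c1" -> "int", "c2" -> "int")).isRight)
    // Fails early, before any expression runs, with one message per problem.
    println(ValidatedFrame.validate[Sales](Map("c1" -> "string")))
  }
}
```

The dynamic errors do not disappear in this sketch; they are moved to a single validation step, which is the "captured in advance" part of the argument.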

Thanks for your involvement, @MrPowers!

jserranohidalgo avatar Apr 06 '21 11:04 jserranohidalgo

I've opened a few issues for the items still pending for a first release and created a project on GitHub to keep track of them.

alfonsorr avatar Apr 15 '21 21:04 alfonsorr

@alfonsorr - I checked the issues and the project and it looks like you're making great progress. Ping me when the v1 stuff is done, so I can try out the project and provide feedback. Can't wait!!

MrPowers avatar Apr 16 '21 11:04 MrPowers