frameless icon indicating copy to clipboard operation
frameless copied to clipboard

Add a typed `col` function for creating column references

Open iravid opened this issue 6 years ago • 9 comments

Resolves #186.

Would be a good idea to wait for #110 before merging this to avoid conflict on SelectTests.scala.

iravid avatar Sep 22 '17 09:09 iravid

Also refs #164

iravid avatar Sep 22 '17 09:09 iravid

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@048d06c). Click here to learn what that means. The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #187   +/-   ##
=========================================
  Coverage          ?   94.95%           
=========================================
  Files             ?       35           
  Lines             ?      674           
  Branches          ?       11           
=========================================
  Hits              ?      640           
  Misses            ?       34           
  Partials          ?        0
Impacted Files Coverage Δ
...ss/functions/PartiallyConstructedTypedColumn.scala 100% <100%> (ø)
...t/src/main/scala/frameless/functions/package.scala 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 048d06c...897e499. Read the comment docs.

codecov-io avatar Sep 22 '17 09:09 codecov-io

@OlivierBlanvillain have another look; this approach should allow to keep the column reference in a value, for example, and then have the actual TypedColumn constructed when using select.

iravid avatar Sep 22 '17 14:09 iravid

Ah, but this approach doesn't support expressions like val c = functions.col('a) + 1. Tradeoffs, tradeoffs...

iravid avatar Sep 22 '17 14:09 iravid

What's the reasoning behind carrying around the dataset type in the typed columns?

iravid avatar Sep 22 '17 14:09 iravid

If it was just expr: TypedColumn[Boolean] we could't statically check that ds.select(expr) does not uses some columns that are not defined on ds. #162 has an interesting discussion with an alternative designed based around an implicit that would be the perfect solution if we had implicit function types.

OlivierBlanvillain avatar Sep 22 '17 14:09 OlivierBlanvillain

That's true, but how about moving the column existence evidence to the methods on TypedDataset?

The reasoning is that a TypedColumn can freely exist outside the context of a typed dataset, e.g. as a literal expression. The evidence is only needed when using it as a projection on a dataset.

iravid avatar Sep 22 '17 15:09 iravid

We didn't consider that, but it sounds isomorphic given than we can already do something like def a(ds)(implicits) = ds('a)... Also I'm not sure it would work out with the requirements of #175.

OlivierBlanvillain avatar Sep 22 '17 15:09 OlivierBlanvillain

It is isomorphic up to the requirement of having ds at hand ;)

I will open a separate PR to experiment with moving the evidence to ds.select et al and we can decide separately if this PR is good enough to be useful given its limited inference.

iravid avatar Sep 22 '17 15:09 iravid