scalding icon indicating copy to clipboard operation
scalding copied to clipboard

add quotation to external APIs

Open fwbrasil opened this issue 7 years ago • 6 comments

This PR is the second part of https://github.com/twitter/scalding/pull/1754. It adds the Quoted implicit param to the user-facing APIs. After this change, the last PR will use the projection information to do automatic pushdown.

fwbrasil avatar Jan 08 '18 18:01 fwbrasil

@ttim @benpence @dieu @johnynek this is ready for review

fwbrasil avatar Jan 08 '18 23:01 fwbrasil

This is both exciting, but a bit scary. This touches virtually every public API and make everything binary incompatible.

Yeah, it's an intrusive change. I thought it was clear that it'd be necessary when we discussed it, though.

In exchange for that, it is not yet clear what we are getting. I'd like to see maybe a quoted branch where we can use the final product with say Parquet to do some filter pushdown or projection pushdown.

If we don't have that as a killer app, I'm nervous about merging and getting to it later.

What do you think about making a quoted branch to make this PR into and when we have filter/projection pushdown working, we merge into develop?

That's https://github.com/twitter/scalding/pull/1754. These pull requests are just a break down of that initial one. Note that only projection pushdown is in the scope of this change.

fwbrasil avatar Jan 18 '18 06:01 fwbrasil

I think right now the logic with passing quoted explicitly / implicitly (and Quoted.internal) is a bit fuzzy. Can we do instead something similar to cause in exceptions?

Let's make Quoted nested and basically putting quoted on method declaration means you want to pass information about this call site. And then if you have quoted parameter in the scope & apply quoted macro - put quoted from the context as .cause call site information.

@ttim I can't see the benefit of this proposal. The quotation propagation is just a way to make the initial information available in the final TypedPipe composition. Could you give more details?

I don't think we really need Quoted.internal in this case and overall logic seems more clear to me (quoted parameter - want to save information about call site).

Quoted.internal is just for safety since people might forget to propagate the quotation when making chanegs to Scalding. See the "Quoted propagation" section on https://github.com/twitter/scalding/pull/1754

fwbrasil avatar Jan 18 '18 06:01 fwbrasil

To me, that could be done by only accepting Quoted where we take a lambda and only where we expect we could in principle do some filter pushdown or projection pushdown. Does that make sense?

Why else do we want the quoted?

@johnynek take a look at the description on https://github.com/twitter/scalding/pull/1754. We'd also like to use Quoted to provide a better description of the job to users, which requires adding Quoted to all user-facing APIs.

fwbrasil avatar Jan 19 '18 23:01 fwbrasil

@johnynek I'm still working on updating source to develop without this change. I'll test these changes after it

fwbrasil avatar Jan 25 '18 23:01 fwbrasil

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Flavio Brasil seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Nov 16 '19 00:11 CLAassistant