scalding
scalding copied to clipboard
add quotation to external APIs
This PR is the second part of https://github.com/twitter/scalding/pull/1754. It adds the Quoted
implicit param to the user-facing APIs. After this change, the last PR will use the projection information to do automatic pushdown.
@ttim @benpence @dieu @johnynek this is ready for review
This is both exciting, but a bit scary. This touches virtually every public API and make everything binary incompatible.
Yeah, it's an intrusive change. I thought it was clear that it'd be necessary when we discussed it, though.
In exchange for that, it is not yet clear what we are getting. I'd like to see maybe a quoted branch where we can use the final product with say Parquet to do some filter pushdown or projection pushdown.
If we don't have that as a killer app, I'm nervous about merging and getting to it later.
What do you think about making a quoted branch to make this PR into and when we have filter/projection pushdown working, we merge into develop?
That's https://github.com/twitter/scalding/pull/1754. These pull requests are just a break down of that initial one. Note that only projection pushdown is in the scope of this change.
I think right now the logic with passing quoted explicitly / implicitly (and Quoted.internal) is a bit fuzzy. Can we do instead something similar to cause in exceptions?
Let's make Quoted nested and basically putting quoted on method declaration means you want to pass information about this call site. And then if you have quoted parameter in the scope & apply quoted macro - put quoted from the context as .cause call site information.
@ttim I can't see the benefit of this proposal. The quotation propagation is just a way to make the initial information available in the final TypedPipe
composition. Could you give more details?
I don't think we really need Quoted.internal in this case and overall logic seems more clear to me (quoted parameter - want to save information about call site).
Quoted.internal
is just for safety since people might forget to propagate the quotation when making chanegs to Scalding. See the "Quoted propagation" section on https://github.com/twitter/scalding/pull/1754
To me, that could be done by only accepting Quoted where we take a lambda and only where we expect we could in principle do some filter pushdown or projection pushdown. Does that make sense?
Why else do we want the quoted?
@johnynek take a look at the description on https://github.com/twitter/scalding/pull/1754. We'd also like to use Quoted
to provide a better description of the job to users, which requires adding Quoted
to all user-facing APIs.
@johnynek I'm still working on updating source to develop without this change. I'll test these changes after it
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
Flavio Brasil seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.