[prototype] Introduce `KeyGrouping` to make it possible to enable `OrderedSerialization` without code changes

Open ttim opened this issue 6 years ago • 2 comments

We're considering enabling OrderedSerialization for most users at Twitter. There is currently a blocker: users need to change source code to enable it, rather than just pass a parameter. A parameter would be much better for us, because our auto-tuning infrastructure makes it possible to land such changes incrementally.

I've tried to prototype a workaround. The basic idea is to introduce a KeyGrouping class that holds both the implicitly provided Ordering and a generated OrderedSerialization. With this approach we can choose which of them to use at runtime. Other advantages of this approach:

  • users will not need to import OrderedSerialization to use it; it will be provided by default
  • Scalding's key grouping will not mix with general orderings in the code. Right now, when you import OrderedSerialization at the top level, you can accidentally break other uses of Ordering.
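
To make the idea concrete, here is a minimal, self-contained sketch of the shape I have in mind. It is not the code in this PR: `OrderedSer` is a hypothetical stand-in for Scalding's OrderedSerialization, and the `select` method and `useOrderedSerialization` flag are assumed names used only for illustration.

```scala
import java.nio.ByteBuffer

object KeyGroupingSketch {
  // Hypothetical stand-in for OrderedSerialization: an Ordering that can also
  // compare keys in their serialized form.
  trait OrderedSer[T] extends Ordering[T] {
    def toBytes(t: T): Array[Byte]
    def compareBinary(a: Array[Byte], b: Array[Byte]): Int
  }

  object OrderedSer {
    // Stands in for a generated instance; it lives in the companion so that
    // users never have to import it.
    implicit val intSer: OrderedSer[Int] = new OrderedSer[Int] {
      def compare(x: Int, y: Int): Int = java.lang.Integer.compare(x, y)
      def toBytes(t: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(t).array()
      def compareBinary(a: Array[Byte], b: Array[Byte]): Int =
        java.lang.Integer.compare(ByteBuffer.wrap(a).getInt(), ByteBuffer.wrap(b).getInt())
    }
  }

  // Holds both ways of grouping a key; the choice is deferred until runtime.
  final case class KeyGrouping[T](ord: Ordering[T], ordSer: Option[OrderedSer[T]]) {
    // `useOrderedSerialization` would come from job configuration (assumed).
    def select(useOrderedSerialization: Boolean): Ordering[T] =
      if (useOrderedSerialization) ordSer.getOrElse(ord) else ord
  }

  trait LowPriorityKeyGrouping {
    // Fallback: only an Ordering is available, exactly as in existing code.
    implicit def fromOrdering[T](implicit ord: Ordering[T]): KeyGrouping[T] =
      KeyGrouping(ord, None)
  }

  object KeyGrouping extends LowPriorityKeyGrouping {
    // Preferred when an OrderedSer instance can be found for T.
    implicit def fromBoth[T](implicit ord: Ordering[T], os: OrderedSer[T]): KeyGrouping[T] =
      KeyGrouping(ord, Some(os))
  }
}
```

In this sketch, a call site that previously required an implicit `Ordering[K]` would require an implicit `KeyGrouping[K]` instead. Existing code that only has an Ordering in scope keeps compiling through `fromOrdering`.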

Another nice thing about this approach is that it's source-backward-compatible.

This is just a prototype (it doesn't even hold both OrderedSerialization and Ordering yet; I used it to gather Ordering stats across our repo), but it illustrates the idea.

@johnynek what do you think?

ttim avatar Jun 02 '18 01:06 ttim

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: ttim
:x: Timur Abishev


Timur Abishev seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

CLAassistant avatar Jun 02 '18 01:06 CLAassistant

Thinking back on this...

We could take an alternative approach: require OrderedSerialization everywhere we use Ordering, but have a low-priority implicit that uses Kryo to supply the OrderedSerialization if you have an Ordering.
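
Roughly, that could look like the sketch below. It is self-contained and makes several assumptions: `OrderedSer` is a hypothetical stand-in for OrderedSerialization, `sortKeys` is an illustrative call site, and Java serialization is swapped in for Kryo so the example runs without extra dependencies.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object FallbackSketch {
  // Hypothetical stand-in for OrderedSerialization.
  trait OrderedSer[T] extends Ordering[T] {
    def toBytes(t: T): Array[Byte]
    def fromBytes(bytes: Array[Byte]): T
    // Default binary comparison: deserialize both sides, then compare.
    def compareBinary(a: Array[Byte], b: Array[Byte]): Int =
      compare(fromBytes(a), fromBytes(b))
  }

  trait LowPriorityOrderedSer {
    // Fallback used only when no dedicated instance exists for T.
    // Java serialization is an assumption standing in for Kryo here.
    implicit def fromOrdering[T](implicit ord: Ordering[T]): OrderedSer[T] =
      new OrderedSer[T] {
        def compare(x: T, y: T): Int = ord.compare(x, y)
        def toBytes(t: T): Array[Byte] = {
          val bos = new ByteArrayOutputStream()
          val oos = new ObjectOutputStream(bos)
          oos.writeObject(t.asInstanceOf[AnyRef])
          oos.close()
          bos.toByteArray
        }
        def fromBytes(bytes: Array[Byte]): T = {
          val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
          try ois.readObject().asInstanceOf[T] finally ois.close()
        }
      }
  }

  object OrderedSer extends LowPriorityOrderedSer {
    // A dedicated instance; it wins over the fallback for Int keys.
    implicit val int: OrderedSer[Int] = new OrderedSer[Int] {
      def compare(x: Int, y: Int): Int = java.lang.Integer.compare(x, y)
      def toBytes(t: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(t).array()
      def fromBytes(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt()
    }
  }

  // Illustrative call site: it requires OrderedSer, never a bare Ordering.
  // Keys are sorted through their serialized form to exercise the round trip.
  def sortKeys[K](keys: Seq[K])(implicit os: OrderedSer[K]): Seq[K] =
    keys.map(os.toBytes).sortWith((a, b) => os.compareBinary(a, b) < 0).map(os.fromBytes)
}
```

With this shape, `sortKeys(Seq("b", "a"))` compiles even though only an `Ordering[String]` exists, because the low-priority implicit wraps it.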

johnynek avatar Aug 20 '18 06:08 johnynek