gobblin icon indicating copy to clipboard operation
gobblin copied to clipboard

[GOBBLIN-272] Remove hardwired avro extensions from compaction and make it generic

Open treff7es opened this issue 7 years ago • 7 comments

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

  • [ ] My PR addresses the following Gobblin JIRA issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    • https://issues.apache.org/jira/browse/GOBBLIN-272

Description

Avro extension was hardcoded in some place at Compaction and I just made it configurable to be more generic. You can set it in your compaction config by: compaction.extension=gz

It seems like the code style on these file was not the Gobblin style and that's why the change seems to be huge (or am I using the wrong codestyle?).

Tests

  • [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

treff7es avatar Oct 02 '17 15:10 treff7es

@treff7es this only handles extensions other than avro right? It seems that it would always use the AvroInputFormat either way?

ibuenros avatar Oct 03 '17 15:10 ibuenros

@ibuenros If you meant that currently compaction only works with avro then you are right or maybe I misunderstood you :). On it's own it does not make much sense generalise if compaction only works with avro. Why I did this because if somebody want to create compaction job which can work with other formats then he/she should need to take care override all of these implementations. Like here we are compressing gzip compressed json logs (here gzip is hardwired which I wold like to remove and generalize later) : https://github.com/prezi/gobblin/tree/json_module/gobblin-modules/gobblin-json/src/main/java/org/apache/gobblin/mapreduce . Did I get right what you were saying? Is it the way I should do it?

treff7es avatar Oct 03 '17 21:10 treff7es

@yukuai518 @ibuenros I see that now this is conflicting with the master and I'm happy to fix it if you see the value in it. I know @ibuenros had some comment and I'm not sure if you were ok with my reply. What do you think? Should I resolve the conflict?

treff7es avatar Nov 30 '17 07:11 treff7es

@ibuenros ^^

abti avatar Mar 23 '18 18:03 abti

ping @ibuenros ^^

abti avatar May 04 '18 18:05 abti

@ibuenros ^^

abti avatar Jun 14 '18 23:06 abti

@treff7es comment makes sense to me. I am ok with the general idea of the changes, although I don't remember the details of the change. If someone reviews it I have no problem with this being committed.

ibuenros avatar Jun 14 '18 23:06 ibuenros