
Gobblin 1320 add iceberg writer

Open hanghangliu opened this issue 4 years ago • 5 comments

Dear Gobblin maintainers,

This PR is still under active development. Any comments and advice are welcome!

JIRA

  • [ ] My PR addresses the following Gobblin JIRA issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
    • https://issues.apache.org/jira/browse/GOBBLIN-1320

Description

  • [ ] Here are some details about my PR, including screenshots (if applicable): This PR adds an Iceberg module that enables Gobblin to write data in the Iceberg table format. It wraps the Iceberg task writer in an FsDataWriter and accepts the Avro, ORC, and Parquet file formats. This PR addresses only the writer part of the Iceberg module; the source part will follow in a separate PR.

Tests

  • [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

hanghangliu, Dec 23 '20 14:12

Codecov Report

Merging #3184 (579aadc) into master (fb5e40f) will increase coverage by 0.24%. The diff coverage is 29.62%.


@@             Coverage Diff              @@
##             master    #3184      +/-   ##
============================================
+ Coverage     45.94%   46.18%   +0.24%     
- Complexity     9602     9762     +160     
============================================
  Files          1997     2019      +22     
  Lines         76097    77180    +1083     
  Branches       8469     8563      +94     
============================================
+ Hits          34960    35648     +688     
- Misses        37873    38226     +353     
- Partials       3264     3306      +42     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...pache.gobblin/writer/IcebergDataWriterBuilder.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...in/java/org.apache.gobblin/writer/IcebergUtil.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| .../java/org.apache.gobblin/writer/IcebergWriter.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...che.gobblin/writer/IcebergFileAppenderFactory.java | 32.25% <32.25%> (ø) | 2.00 <2.00> (?) |
| ...pache.gobblin/writer/IcebergTaskWriterFactory.java | 87.50% <87.50%> (ø) | 3.00 <3.00> (?) |
| ...apache/gobblin/runtime/api/JobCatalogListener.java | 76.92% <0.00%> (-23.08%) | 0.00% <0.00%> (ø%) |
| ...n/runtime/job_catalog/JobCatalogListenersList.java | 63.63% <0.00%> (-10.05%) | 10.00% <0.00%> (ø%) |
| ...pache/gobblin/cluster/JobConfigurationManager.java | 81.39% <0.00%> (-6.11%) | 10.00% <0.00%> (ø%) |
| .../apache/gobblin/runtime/api/MutableJobCatalog.java | 81.25% <0.00%> (-5.42%) | 0.00% <0.00%> (ø%) |
| ...ache/gobblin/cluster/GobblinHelixJobScheduler.java | 34.48% <0.00%> (-4.74%) | 6.00% <0.00%> (ø%) |
... and 54 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fb5e40f...579aadc.

codecov-io, Dec 23 '20 15:12

Also, there's a Travis failure.

autumnust, Jan 21 '21 01:01

@autumnust, @sv2000, any plans to continue this PR?

jhsenjaliya, Jun 29 '21 08:06

@jhsenjaliya - It would be good to revive this PR. Are you interested in taking it up? Happy to discuss how this PR can be improved.

sv2000, Jun 29 '21 16:06

Sure, let's talk about it next week.

jhsenjaliya, Jul 05 '21 21:07