gobblin
GOBBLIN-759: Added feature to support DistCP to copy files that were modified in last n days
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
- [x] My PR addresses the following Gobblin JIRA issues and references them in the PR title. For example, "[GOBBLIN-759] Added feature to support DistCP to copy files modified in last n days"
- https://issues.apache.org/jira/browse/GOBBLIN-759
Description
- [x] Here are some details about my PR, including screenshots (if applicable):
- Added a feature to DistCP to copy only the files that were modified in the last n days, i.e. within the lookback period.
- This feature allows copying only the modified files, even when the unmodified files are not present at the destination.
- It leverages the existing TimestampBasedCopyableDataset to find the dataset and uses the SelectBtwModDataTimeBasedCopyableFileFilter CopyableFilter implementation to select the files that were modified in the last n days.
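For reviewers, the filtering idea described above can be sketched roughly as follows. This is only an illustration and not the code in this PR: the class `ModifiedWithinLookbackFilter`, its constructor argument, and the map of path to modification time are hypothetical stand-ins, and it uses plain Java types rather than Gobblin's or Hadoop's APIs.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a modification-time based file filter.
 * It is NOT the SelectBtwModDataTimeBasedCopyableFileFilter from this PR.
 */
public class ModifiedWithinLookbackFilter {

    private final long lookbackMillis;

    public ModifiedWithinLookbackFilter(int lookbackDays) {
        if (lookbackDays < 0) {
            throw new IllegalArgumentException("lookbackDays must be >= 0");
        }
        this.lookbackMillis = Duration.ofDays(lookbackDays).toMillis();
    }

    /**
     * Returns the paths whose modification time (epoch millis) lies in
     * the closed interval [nowMillis - lookback, nowMillis].
     */
    public List<String> filter(Map<String, Long> modTimeByPath, long nowMillis) {
        long cutoff = nowMillis - lookbackMillis;
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : modTimeByPath.entrySet()) {
            long modTime = entry.getValue();
            if (modTime >= cutoff && modTime <= nowMillis) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        Map<String, Long> files = new LinkedHashMap<>();
        // One file modified yesterday, one modified ten days ago.
        files.put("/data/a.avro", now - Duration.ofDays(1).toMillis());
        files.put("/data/b.avro", now - Duration.ofDays(10).toMillis());
        // With a 3-day lookback only the recently modified file is selected.
        System.out.println(new ModifiedWithinLookbackFilter(3).filter(files, now));
    }
}
```

With a 3-day lookback, only `/data/a.avro` is selected. This mirrors the one-modified, one-non-modified scenario that the unit test in this PR is described as covering.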
Tests
- [x] My PR adds the following unit tests OR does not need testing for this extremely good reason:
- Added the TimestampBasedCopyableDatasetTest.testCopyWithFilter test case, which covers a scenario with one modified and one non-modified file.
Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
- Subject is separated from body by a blank line
- Subject is limited to 50 characters
- Subject does not end with a period
- Subject uses the imperative mood ("add", not "adding")
- Body wraps at 72 characters
- Body explains "what" and "why", not "how"
@sv2000 @htran1 @jhsenjaliya created New PR. Please review
will continue review tomorrow....
@jhsenjaliya Pushed the changes, please review
Codecov Report
No coverage uploaded for pull request base (master@bca2e1f). The diff coverage is 0%.
@@ Coverage Diff @@
## master #2633 +/- ##
========================================
Coverage ? 4.13%
Complexity ? 751
========================================
Files ? 1937
Lines ? 72988
Branches ? 8051
========================================
Hits ? 3017
Misses ? 69652
Partials ? 319
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...sion/finder/HdfsModifiedTimeHiveVersionFinder.java | 23.07% <ø> (ø) | 1 <0> (?) | |
| ...writer/partitioner/TimeBasedWriterPartitioner.java | 0% <ø> (ø) | 0 <0> (?) | |
| ...he/gobblin/cluster/TaskRunnerSuiteThreadModel.java | 0% <ø> (ø) | 0 <0> (?) | |
| .../java/org/apache/gobblin/hive/HiveLockFactory.java | 0% <ø> (ø) | 0 <0> (?) | |
| ...lin/hive/metastore/HiveMetaStoreBasedRegister.java | 0% <ø> (ø) | 0 <0> (?) | |
| ...pache/gobblin/configuration/ConfigurationKeys.java | 0% <ø> (ø) | 0 <0> (?) | |
| .../org/apache/gobblin/hive/HiveRegistrationUnit.java | 0% <ø> (ø) | 0 <0> (?) | |
| .../org/apache/gobblin/service/ServiceConfigKeys.java | 0% <ø> (ø) | 0 <0> (?) | |
| ...ain/java/org/apache/gobblin/writer/DataWriter.java | 0% <ø> (ø) | 0 <0> (?) | |
| ...ain/java/org/apache/gobblin/hive/HiveLockImpl.java | 0% <ø> (ø) | 0 <0> (?) | |
| ... and 129 more | | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update bca2e1f...c3dc277.
@sv2000 Please review
+1 LGTM