hudi
Upserts, Deletes And Incremental Processing on Big Data.
Currently, we use now() - splitLatestCommit. However, as time passes while the task is processing a huge data commit, the difference between now and splitLatestCommit may grow larger...
https://github.com/apache/hudi/pull/12781/files#r1964205520 A new constructor is added. We should see if this is really needed (rewrite the tests so this is not needed?) and keep the constructors simple, by removing this...
[https://github.com/apache/hudi/pull/12105/files#r1815875535] We need to move [this logic|https://github.com/apache/hudi/blob/a7512a206c5a1e8ce251cac7a302632a57d8c848/hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java#L855-L858] inside `HoodieMetadataPayload.combineSecondaryIndexRecord`, and override `MetadataPartitionType.combineMetadataPayloads` for the secondary index. When updating the secondary index, the merging logic should follow the same logic as the readFromBaseAndMergeWithLogRecords API...
For CUSTOM merge mode, the list of record merging implementation classes is required for the record merging to work. Persisting it to the table config makes it easier for query...
Flesh out direction for end-to-end writes using Row or InternalRow ## JIRA info - Link: https://issues.apache.org/jira/browse/HUDI-9035 - Type: Sub-task - Parent: https://issues.apache.org/jira/browse/HUDI-9019 - Fix version(s): - 1.1.0
If the user wants to migrate from using the payload class to the merger implementation class, the merger strategy ID needs to be changed, and other record merge configs need...
PAYLOAD_CLASS_NAME ("hoodie.compaction.payload.class") is defined in both HoodiePayloadConfig and HoodieTableConfig, and the two definitions are used in different places. We should keep only one of them to avoid confusion. ## JIRA info - Link:...
We need to ensure that we cover the following cases for basic col stats certification: # Insert a few records and validate. Update the same records and validate that the updates are reflected. Repeat the...
While working towards making partition stats the default, we ran into an issue with the Byte data type [https://github.com/apache/hudi/pull/12671]: the min/max values obtained when merging multiple values did not align with manually computed...
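The summary above is truncated, so the actual cause is not stated here. As a rough illustration of the invariant being checked (merged min/max must equal the min/max computed by hand over all values), here is a minimal Java sketch. The class `ByteRangeMerge`, the holder `ByteRange`, and both helper methods are hypothetical names for illustration only and are not Hudi APIs.

```java
public class ByteRangeMerge {

    /** Hypothetical holder for the min/max of a Byte column in one file or partition. */
    static final class ByteRange {
        final byte min;
        final byte max;

        ByteRange(byte min, byte max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Merges two ranges using signed byte comparison: min of mins, max of maxes. */
    static ByteRange merge(ByteRange a, ByteRange b) {
        // Math.min/Math.max promote byte to int, so the result is cast back to byte.
        return new ByteRange((byte) Math.min(a.min, b.min), (byte) Math.max(a.max, b.max));
    }

    /** Computes the range "manually" by scanning every value (assumes a non-empty array). */
    static ByteRange fromValues(byte[] values) {
        byte min = values[0];
        byte max = values[0];
        for (byte v : values) {
            if (v < min) {
                min = v;
            }
            if (v > max) {
                max = v;
            }
        }
        return new ByteRange(min, max);
    }

    public static void main(String[] args) {
        byte[] first = {-5, 0, 10};
        byte[] second = {3, 120, 7};
        byte[] all = {-5, 0, 10, 3, 120, 7};

        ByteRange merged = merge(fromValues(first), fromValues(second));
        ByteRange manual = fromValues(all);

        // The merged range should match the manually computed one.
        System.out.println(merged.min == manual.min && merged.max == manual.max);
    }
}
```

A validation test along these lines would compare the merged statistics against a full scan for each data type, including negative values, where signed and unsigned comparisons diverge.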
Related to - https://issues.apache.org/jira/browse/HUDI-8275 Currently, we are not using the new filegroup reader for bootstrap splits. We need to fix that. ## JIRA info - Link: https://issues.apache.org/jira/browse/HUDI-8380 - Type: Sub-task...