Vinish Reddy

Results 11 issues of Vinish Reddy

### Change Logs Fix a bug in the checkpointing logic for S3/GCS in the empty-dataset use case. The cause of the bug was the following. 1st delta commit's checkpoint processed 3 files. ``` 23/12/06...

### Change Logs The original PR had test failures and had to be reverted; this PR brings back the change with the tests fixed. https://github.com/apache/hudi/pull/10687 Introducing a new class...

### Change Logs Use SourceProfile in S3/GCS sources for the sourceLimit and numPartitions. Instead of relying on the sourceLimit passed in from static config, use the one provided by the...

release-0.15.0
size:L

### Change Logs Add a new config `HOODIE_SPARK_DATASOURCE_OPTIONS`, which is used by the Spark DataFrame reader for HoodieIncrSource. Options like using metadataTable and dataSkipping, present in DataSourceOptions.scala, can be passed for efficient...

release-0.15.0
size:S

### Change Logs This block of code is problematic and can lead to OOM because we are converting the iterator into a list and then returning an iterator back....

release-0.15.0
size:S
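
The memory concern in the change log above (an iterator converted into a list and then returned as an iterator) can be illustrated with a minimal Java sketch. The class and method names here (`LazyIteratorSketch`, `mapEagerly`, `mapLazily`) are hypothetical and not taken from the Hudi codebase; the sketch only contrasts the two patterns and does not reproduce the actual fix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; not Hudi code.
class LazyIteratorSketch {

    // Eager variant: copies every transformed element into a list before
    // handing back an iterator, so memory use grows with the input size.
    static <T, R> Iterator<R> mapEagerly(Iterator<T> input, Function<T, R> fn) {
        List<R> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(fn.apply(input.next()));
        }
        return buffer.iterator();
    }

    // Lazy variant: wraps the input and transforms one element per next()
    // call, so the wrapper itself never buffers the whole input.
    static <T, R> Iterator<R> mapLazily(Iterator<T> input, Function<T, R> fn) {
        return new Iterator<R>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public R next() {
                return fn.apply(input.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> source = List.of(1, 2, 3).iterator();
        Iterator<Integer> doubled = mapLazily(source, x -> x * 2);
        while (doubled.hasNext()) {
            System.out.println(doubled.next());
        }
    }
}
```

Both variants yield the same elements in the same order; the difference is only in when the work happens and how much is held in memory at once.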

### Change Logs Previous PR -> https://github.com/apache/hudi/pull/10861 Publish metrics for source parallelism for Kafka, S3/GCS sources. ### Impact No impact, only change in metrics. ### Risk level (write none, low...

release-0.15.0
size:L

### Change Logs NOTE: This PR handles only the AVRO code paths; there will be a follow-up patch for the RowWriter code paths as well. There are two problems with BULK_INSERT and partitioners....

release-0.15.0
size:M

### Change Logs Similar to `KafkaSource`, the source profile populated in `StreamContext` can be used for better parallelism; instead of using a static value for numInstantsPerFetch, a dynamic value...

size:M

### Change Logs Follow-up PR for https://github.com/apache/hudi/pull/11159 Always try to fetch the latestSourceProfile; this ensures the profile is refreshed if it's no longer valid. The implementation of source profile takes...

size:M

## What is the purpose of the pull request Add the release guide/process...