
feat: Parquet DataSource should provide ability to read multiple GCS buckets for creating multiple streams

Open Meghajit opened this issue 3 years ago • 1 comment

As part of this issue, we want to add support for multiple streams in the Parquet DataSource. That is, users should be able to specify multiple GCS URLs, and Dagger should create a Parquet data source, and hence a data stream, for each of these URLs.

This is needed so that users can perform joins and other operations across multiple streams with the Parquet DataSource, as they already can with KafkaSource.
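To make the request concrete, here is a minimal sketch of the intended mapping: one stream descriptor per GCS URL. Everything in it is hypothetical and only for illustration (`MultiStreamSketch`, `StreamSpec`, `toStreamSpecs`, the `parquet_stream_N` naming, and the example bucket paths); none of it is existing Dagger or Flink API, and the real configuration shape may well differ.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiStreamSketch {

    // Hypothetical descriptor for one Parquet-backed stream; not a Dagger class.
    public static final class StreamSpec {
        public final String streamName;
        public final String gcsUrl;

        public StreamSpec(String streamName, String gcsUrl) {
            this.streamName = streamName;
            this.gcsUrl = gcsUrl;
        }
    }

    // Builds one StreamSpec per GCS URL, so each URL ends up as its own stream.
    // Stream names are derived from the position in the list (an assumption).
    public static List<StreamSpec> toStreamSpecs(List<String> gcsUrls) {
        List<StreamSpec> specs = new ArrayList<>();
        for (int i = 0; i < gcsUrls.size(); i++) {
            String url = gcsUrls.get(i);
            if (!url.startsWith("gs://")) {
                throw new IllegalArgumentException("Not a GCS URL: " + url);
            }
            specs.add(new StreamSpec("parquet_stream_" + i, url));
        }
        return specs;
    }

    public static void main(String[] args) {
        // Example bucket paths are made up for this sketch.
        List<StreamSpec> specs = toStreamSpecs(List.of(
                "gs://example-bucket-a/events/",
                "gs://example-bucket-b/orders/"));
        for (StreamSpec spec : specs) {
            System.out.println(spec.streamName + " <- " + spec.gcsUrl);
        }
    }
}
```

Each resulting descriptor would then back its own Parquet data source, which is what would allow the streams to be joined downstream in the same way as multiple Kafka streams.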

Meghajit • Jan 20 '22 09:01

Removing this from the "Support for Parquet Files as a Source" milestone, as it is only a nice-to-have for the first milestone. cc: @prakharmathur82

Meghajit • Jun 10 '22 05:06