Grant Nicholas

14 comments by Grant Nicholas

Thanks for the fast turnaround! I will try building that branch 👍

+1 on this. The only other service out there that can dump data from kafka -> s3 and create hive tables/partitions is https://github.com/pinterest/secor, and it is not maintained by pinterest...

> > Emitting an event to a topic indicating a new partition has been created/finalized would go a long way
>
> Not perfect, but sounds like you want this...

Not saying it is impossible to do (our company is building our own system in a similar but slightly different fashion) just that it feels like a common problem and...

I think what @teabot is suggesting is that for ETL processes (not for querying), you need to know when the partition is finalized so you don't start processing data too soon and...

> We also do this (much less efficiently) with Hive by using a copy-on-write approach when updating partitions.

You don't need to do copy on write with this s3 sink...

@jalaziz did you ever open a PR for this? We ran into the exact same issue and noticed it might be a great chance to contribute back. Specifically this bit:...

> @grantatspothero do we still want this PR? I thought you worked around it in another way

We discussed this over the Trino Slack. For others: this race still occurs but if...

@Praveen2112 @findepi @anusudarsan mind taking another look? Stress tests are green.