Grant Nicholas
Thanks for the fast turnaround! I will try building that branch 👍
This can be closed btw, #377 fixed the issue thanks!
+1 on this. The only other service out there that can dump data from kafka -> s3 and create hive tables/partitions is https://github.com/pinterest/secor, and it is not maintained by pinterest...
> > Emitting an event to a topic indicating a new partition has been created/finalized would go a long way > > Not perfect, but sounds like you want this...
Not saying it is impossible to do (our company is building our own system in a similar but slightly different fashion) just that it feels like a common problem and...
I think what @teabot is suggesting is that for ETL processes (not for querying), you need to know when the partition is finalized so you don't start processing data too soon and...
> We also do this (much less efficiently) with Hive by using a copy-on-write approach when updating partitions.

You don't need to do copy on write with this s3 sink...
@jalaziz did you ever open a PR for this? We ran into the exact same issue and noticed it might be a great chance to contribute back. Specifically this bit:...
> @grantatspothero do we still want this PR? I thought you worked around it in another way

We discussed over trino slack. For others: this race still occurs but if...
@Praveen2112 @findepi @anusudarsan mind taking another look? Stress tests are green.