Educational purpose: Implementation question

Open sfc-gh-japatel opened this issue 4 years ago • 3 comments

Hello there, this question is purely about the implementation. The README states that as long as Kafka is not dropping messages (e.g., due to an aggressive cleanup policy) before Secor is able to read them, each message is guaranteed to be saved in exactly one S3 file, and that this property is not compromised by the notorious temporal inconsistency of S3 caused by its eventual consistency model.

However, doesn't this also depend on the underlying implementation of the upload manager? Does HadoopS3UploadManager provide strong consistency?

Thanks

sfc-gh-japatel avatar Oct 16 '20 17:10 sfc-gh-japatel

Exactly-once upload has little to do with which upload manager is used. It depends on the following:

  1. File name generation is deterministic: kafka-partition-number + begin-kafka-offset.
  2. The S3 upload happens first, and the Kafka consumer offset is committed only afterwards.
  3. If the S3 upload succeeds but the Kafka consumer offset commit fails, the next Secor worker picks up the partition and re-uploads everything, starting from the same begin_offset because that offset was never committed to Kafka. The file gets the same name and overwrites the existing file on S3.
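
The three points above can be illustrated with a small in-memory simulation. This is only a sketch, not Secor's actual code: `FakeObjectStore`, `FakeOffsetStore`, `file_name`, and `upload_batch` are hypothetical names, the key format is made up, and the batch size is fixed so that a retry covers the same messages.

```python
from dataclasses import dataclass, field


@dataclass
class FakeObjectStore:
    """In-memory stand-in for S3: a PUT to an existing key replaces the object."""
    objects: dict = field(default_factory=dict)

    def put(self, key, body):
        self.objects[key] = list(body)  # plain overwrite, no GET beforehand


@dataclass
class FakeOffsetStore:
    """In-memory stand-in for Kafka's committed consumer offsets."""
    committed: dict = field(default_factory=dict)
    fail_next_commit: bool = False

    def commit(self, partition, offset):
        if self.fail_next_commit:
            self.fail_next_commit = False
            raise RuntimeError("simulated offset commit failure")
        self.committed[partition] = offset


def file_name(partition, begin_offset):
    # Deterministic: the same partition and begin offset always give the same key.
    # The exact format here is an assumption made for illustration.
    return f"partition={partition}/begin_offset={begin_offset:020d}"


def upload_batch(log, partition, store, offsets, batch_size):
    begin = offsets.committed.get(partition, 0)
    batch = log[begin:begin + batch_size]
    if not batch:
        return
    store.put(file_name(partition, begin), batch)   # upload first
    offsets.commit(partition, begin + len(batch))   # commit the offset afterwards


log = [f"msg-{i}" for i in range(6)]
store, offsets = FakeObjectStore(), FakeOffsetStore()

upload_batch(log, 0, store, offsets, batch_size=3)      # upload and commit succeed
offsets.fail_next_commit = True
try:
    upload_batch(log, 0, store, offsets, batch_size=3)  # upload succeeds, commit fails
except RuntimeError:
    pass
upload_batch(log, 0, store, offsets, batch_size=3)      # retry overwrites the same key

# Collect every stored message in key order to check for gaps or duplicates.
saved = [m for key in sorted(store.objects) for m in store.objects[key]]
```

After the failed commit, the retry starts again from the uncommitted offset, produces the same key, and replaces the earlier object, so `saved` holds each message once.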

HenryCaiHaiying avatar Oct 19 '20 01:10 HenryCaiHaiying

Thanks Henry. Do any of these operations do a GET before the PUT? If the third step simply replaces the file, there may be no need for a GET.

sfc-gh-japatel avatar Oct 19 '20 18:10 sfc-gh-japatel

There is no GET; it is a file-replacing operation on S3.

HenryCaiHaiying avatar Oct 19 '20 18:10 HenryCaiHaiying