kafka-go icon indicating copy to clipboard operation
kafka-go copied to clipboard

Is it possible to manually perform read by a specified offset?

Open KokorinIlya opened this issue 3 years ago • 3 comments

It seems to me that the current implementation lacks the ability to manually perform read request by a specified offset, like

ctx := context.Background()
conn := ...
topic := "my-topic"
partition := 0
offset := 42
msg, err := conn.ReadByOffset(ctx, topic, partition, offset)

Now I have to use kafka.Reader with the following code:

reader := kafka.NewReader(...)
reader.SetOffset(42)
msg := reader.FetchMessage(context.Background())

which requires two rounds of communication (one for updating offsets and one for fetching message) instead of one, as would be possible with the ReadByOffset capability.

KokorinIlya avatar Jul 31 '22 10:07 KokorinIlya

Hi @KokorinIlya, it looks to me like SetOffset does not make a remote API call and only makes changes in-memory. Could you clarify what you mean by "two rounds of communication"?

nlsun avatar Aug 05 '22 17:08 nlsun

@nlsun, consider two lines from the SetOffset implementation https://github.com/segmentio/kafka-go/blob/174188e6872cba89043c70b2d54e9b913c3ffacc/reader.go#L1032 https://github.com/segmentio/kafka-go/blob/174188e6872cba89043c70b2d54e9b913c3ffacc/reader.go#L1035

I believe, that these function calls lead to network I/O. Consider, for example, activateReadLag call:

This functions starts a goroutine https://github.com/segmentio/kafka-go/blob/174188e6872cba89043c70b2d54e9b913c3ffacc/reader.go#L1135 that calls another function ReadLag https://github.com/segmentio/kafka-go/blob/174188e6872cba89043c70b2d54e9b913c3ffacc/reader.go#L1146 that definitely involves some network calls https://github.com/segmentio/kafka-go/blob/174188e6872cba89043c70b2d54e9b913c3ffacc/reader.go#L935

KokorinIlya avatar Aug 05 '22 18:08 KokorinIlya

@KokorinIlya Oh, I was assuming you were talking about SetOffset being blocked on a network call.

Yes, r.start is starting a reader which makes network calls and r.activateReadLag is gathering lag metrics which makes network calls but both are in goroutines and SetOffset does not wait for them nor does it depend on them.

Were you looking to improve the performance of SetOffset? Or something else?

nlsun avatar Sep 02 '22 17:09 nlsun

Hi @KokorinIlya, I think Kafka doesn't offer a search like reading messages by specific offsets. Kafka is message streaming system that works on file systems. To give a better performance on disks, it writes and reads messages in a sequence one by one. Well let me know if Im wrong.

wonksing avatar Sep 29 '22 13:09 wonksing