
Find messages by message key (reduce lookup to one partition)

Open thimmwork opened this issue 5 years ago • 3 comments

I often search for the latest message (or the latest few) for one known message key. I do not know the offset, so I kind of "hope" to find it within the last 500 messages on the topic. This often fails, because Kowl limits the result to 500 messages across all partitions and the topic has considerable throughput. However, I do know that by limiting the search to a single partition, I have a much higher chance of finding what I'm looking for.

So I would like Kowl to support my multi-step workflow, which goes like this:

  • count the number of partitions for my topic
  • determine which partition the message is on by calculating murmur2 hash for the known message key modulo number of partitions
  • limit the search to that partition and usually find what I'm looking for.

Of course this will only work if producers use the DefaultPartitioner or default hash (murmur2) to decide which partition to write to.
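
For illustration, the partition lookup described in the steps above could be sketched as follows. This is a minimal sketch, not Kowl code: it assumes the producer hashes the raw key bytes with 32-bit murmur2 using the seed of the Java client, masks the result to a non-negative value, and takes it modulo the partition count. The names `murmur2` and `partition_for_key` and the key `order-42` are made up for this example.

```python
def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 with the seed assumed for Kafka's Java client.

    Returns the hash as an unsigned 32-bit integer.
    """
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Mix the input four bytes at a time, read as little-endian words.
    full = length - (length % 4)
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the remaining one to three bytes, if any.
    tail = length % 4
    if tail == 3:
        h ^= data[full + 2] << 16
    if tail >= 2:
        h ^= data[full + 1] << 8
    if tail >= 1:
        h ^= data[full]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Partition index under the assumed default scheme: positive hash mod count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    # Hypothetical key on a topic with 12 partitions.
    print(partition_for_key(b"order-42", 12))
```

The key has to be passed as the exact bytes the producer serialized, so a string key needs the same encoding the producer used (typically UTF-8).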

thimmwork avatar Jul 24 '20 15:07 thimmwork

I do not know the offset, so I kind of "hope" to find it within the last 500 messages on the topic, but this is often not the case,

You no longer have to hope once you use the new version, which contains https://github.com/cloudhut/kowl/pull/54.

What you suggest could still be a significant performance improvement when searching for specific keys. The main challenges with it are:

  1. We use a JavaScript interpreter and we literally don't know anything about the JavaScript you execute within the interpreter VM. All we care about is whether this function returns false (skip message) or true (send message to user).

  2. You already mentioned this will only work with the default partitioning algorithm. For those who don't use the default partitioner we would need to introduce an option to "disable this assumption".

In short: I believe the use case you describe is already solved in a far more flexible way by the streaming search. If there's a use case other than searching for specific messages by key, we could revisit this and easily introduce a feature that "calculates" the target partition ID for a given key.

weeco avatar Jul 24 '20 15:07 weeco

A feature to compute the partition for a given key could be put into the Partitions tab.

rikimaru0345 avatar Aug 03 '20 20:08 rikimaru0345

Additional hint:

The Kafka client libraries for different languages do not all use the same default partitioning algorithm. Some use the murmur2 hash, others CRC32, ...
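
As a rough illustration of the CRC32 variant, here is a sketch that assumes the partitioner takes the unsigned CRC-32 of the key bytes modulo the partition count. Which clients actually do this, and under which configuration, has to be checked per library; the function name `crc32_partition` and the key are made up for this example.

```python
import zlib


def crc32_partition(key: bytes, num_partitions: int) -> int:
    """Partition index under an assumed CRC32-based scheme: crc32(key) mod count."""
    # zlib.crc32 returns an unsigned 32-bit value in Python 3.
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    # Hypothetical key on a topic with 12 partitions.
    print(crc32_partition(b"order-42", 12))
```

The same key will generally land on a different partition than with a murmur2-based partitioner, which is why a computed partition is only valid for producers that use the assumed algorithm.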

weeco avatar Aug 15 '20 12:08 weeco

[....] easily introduce a feature that "calculates" the target partition id for a given key?

+1 for adding this somewhere.

A simple way to find the right partition for a certain key (in a given topic) in the UI would be very helpful. This would indeed be nice to have in the Partitions tab (perhaps as filter for the partitions list?), along with an info-note to remind users that the actual mapping still depends on the producer configuration.

And perhaps even with a link back to the Messages tab (for any partition in the Partitions tab), making the partition drop-down pre-selected with such a chosen partition.

s-rwe avatar Apr 21 '23 07:04 s-rwe

Searching within a specific partition is fast enough once you know it, but adding a frontend feature that tries to calculate the partition to search is very complicated (it would need to know how partitioning works, which partitioner is used, etc.). Closing for now.

twmb avatar Oct 19 '23 14:10 twmb