console icon indicating copy to clipboard operation
console copied to clipboard

Support regex in topic protobuf mappings

Open upils opened this issue 2 years ago • 17 comments

Fixing #405

This is still a work in progress (need more manual and unit tests) but If you find time you can validate you are OK with the direction I am heading.

I have concerns about performance if every message end up matching with the last regex in a long list. I may add a benchmark to get an idea of the difference between regex matching vs directly matching in the map (I expect a huge difference).

upils avatar Dec 21 '22 22:12 upils

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Dec 21 '22 22:12 CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Dec 21 '22 22:12 CLAassistant

I fixed merge conflicts. @weeco / @rikimaru0345 can you take a look on this?

upils avatar Jan 09 '23 20:01 upils

Hey @upils , I share your concerns in regards to the performance. This is code executed in the hot-path if you search a topic's messages with search filters enabled. It wouldn't be uncommon to process ~100k records / seconds. I think there are solutions to this problem - such as caching the results for requested types etc.

weeco avatar Jan 11 '23 15:01 weeco

@weeco thanks for your answer. I have now added matching topics in the strictMappingsByTopic (see https://github.com/redpanda-data/console/pull/584/files#diff-cb0859875b7a3634d6b559a35d024fc14ba558a7960fabbd6287d716423246a7R296) so a given topic never have to be searched twice in the regexMappingsByTopic.

I wrote a benchmark with a variable proportion of strict/regex mappings.

Before the fix, performances were degrading following the ratio of regex mappings:

goos: linux
goarch: amd64
pkg: github.com/redpanda-data/console/backend/pkg/proto
cpu: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
BenchmarkService_getMatchingMapping
BenchmarkService_getMatchingMapping/Only_strict_mappings
BenchmarkService_getMatchingMapping/Only_strict_mappings-8                  5998            186877 ns/op               8 B/op          0 allocs/op
BenchmarkService_getMatchingMapping/10%_regex_mappings
BenchmarkService_getMatchingMapping/10%_regex_mappings-8                     249           4464246 ns/op             451 B/op          2 allocs/op
BenchmarkService_getMatchingMapping/50%_regex_mappings
BenchmarkService_getMatchingMapping/50%_regex_mappings-8                      91          12583620 ns/op            2226 B/op         15 allocs/op
BenchmarkService_getMatchingMapping/90%_regex_mappings
BenchmarkService_getMatchingMapping/90%_regex_mappings-8                      58          17922738 ns/op            4674 B/op         41 allocs/op
PASS
ok      github.com/redpanda-data/console/backend/pkg/proto      5.692s

After the fix, it shows no significant difference:

goos: linux
goarch: amd64
pkg: github.com/redpanda-data/console/backend/pkg/proto
cpu: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
BenchmarkService_getMatchingMapping
BenchmarkService_getMatchingMapping/Only_strict_mappings
BenchmarkService_getMatchingMapping/Only_strict_mappings-8                  5724            180517 ns/op               8 B/op          0 allocs/op
BenchmarkService_getMatchingMapping/10%_regex_mappings
BenchmarkService_getMatchingMapping/10%_regex_mappings-8                    7711            150304 ns/op              12 B/op          0 allocs/op
BenchmarkService_getMatchingMapping/50%_regex_mappings
BenchmarkService_getMatchingMapping/50%_regex_mappings-8                    7566            147093 ns/op              30 B/op          0 allocs/op
BenchmarkService_getMatchingMapping/90%_regex_mappings
BenchmarkService_getMatchingMapping/90%_regex_mappings-8                    6926            157498 ns/op              42 B/op          0 allocs/op
PASS
ok      github.com/redpanda-data/console/backend/pkg/proto      6.286s

I may add more use cases to the benchmark if I think of some corner cases.

upils avatar Jan 11 '23 21:01 upils

@upils , @weeco can we apprise this topic one final time? It's almost done, guys! I left my comment in the review on how to fix the complexity and keep the steam inside. Can you help me?

smatvienko-tb avatar Aug 14 '23 17:08 smatvienko-tb

@upils were you able to sign the Contributor License Agreement? I honestly can't tell from the bot 😅 Looks like once that is sorted and you get an approval, this PR is ready for prime time!

brycefisher avatar Aug 14 '23 22:08 brycefisher

@upils were you able to sign the Contributor License Agreement? I honestly can't tell from the bot 😅 Looks like once that is sorted and you get an approval, this PR is ready for prime time!

Yes, but I guess I messed up with the mail address I used in my commits. I will rewrite them and it should be good.

upils avatar Aug 16 '23 07:08 upils

raising it up

smatvienko-tb avatar Sep 12 '23 17:09 smatvienko-tb

@upils ping on rewriting the commit emails.

twmb avatar Sep 19 '23 14:09 twmb

Hey @smatvienko-tb @twmb @brycefisher . Sorry for this long hiatus. I fixed the commit author and started to apply the changes suggested by @smatvienko-tb . I need to add test cases but let me know if this is what you had in mind.

upils avatar Feb 17 '24 08:02 upils

I left two comments. Overall I'd like to suggest to change this PR so that regex strings are recognized if the input is wrapped into two slashes (see my other PR comment).

We can implement a custom type RegexpOrLiteral which implements the Stringer method and has a custom Unmarshaler, so that you can provide mappings like this:

mappings:
  - topicName: normal-exact-match
    keyProtoType: "foo"
  - topicName: /some-topic-prefix-matc.*/
    keyProtoType: "bar"

You can do so by creating a type such as:

type RegexpOrLiteral struct {
	literal string
	*regexp.Regexp
}

func (r *RegexpOrLiteral) String() string {
	if r.literal != "" {
		return r.literal
	}
	if r.Regexp != nil {
		return r.Regexp.String()
	}
	return ""
}

// UnmarshalText unmarshals json into a regexp.Regexp
func (r *RegexpOrLiteral) UnmarshalText(b []byte) error {
	regex, err := compileRegex(string(b))
	if err == nil {
		r.Regexp = regex
	}
	r.literal = string(b)
	return nil
}

// MarshalText marshals regexp.Regexp as string
func (r *RegexpOrLiteral) MarshalText() ([]byte, error) {
	if r.Regexp != nil {
		return []byte(r.Regexp.String()), nil
	}

	return []byte(r.literal), nil
}

Additionally koanf (our config management lib) needs to be configured so that it will try the TextUnmarshalling, by adding the mapstructure.TextUnmarshallerHookFunc(), option here: https://github.com/redpanda-data/console/blob/94ef19bcb9c8202040b3f8bf93ee1ead4fe94ae8/backend/pkg/config/config.go#L133

The compileRegex method already exists here and checks for the wrapped slashes: https://github.com/redpanda-data/console/blob/94ef19bcb9c8202040b3f8bf93ee1ead4fe94ae8/backend/pkg/config/util_regex.go#L19-L36 . By doing so you have the best of both worlds:

  • Ability to provide regexes while being backwards compatible (a single string can be provided for the topic name)
  • Regexes are compiled at unmarshal time so that we error early and users should clearly know whether regex is used or not

What do you think @upils @smatvienko-tb ?

weeco avatar Feb 22 '24 23:02 weeco

Hi @upils and @weeco! Thanks so much for all the hardwork on implementing and reviewing this feature so far. Is there anything I can do to help push this feature over the line?

brycefisher avatar Mar 18 '24 18:03 brycefisher

@brycefisher if you'd like to take this on, would you be able to fork and address suggestions from @weeco just above?

twmb avatar Mar 25 '24 15:03 twmb

Yes! Happy to put the finishing touches on this.

brycefisher avatar Mar 26 '24 00:03 brycefisher

@upils @smatvienko-tb @brycefisher ping

weeco avatar Apr 08 '24 12:04 weeco

We'll move this to unscheduled on our side, but please ping once y'all are able to uptake this and we can re-review.

twmb avatar May 06 '24 16:05 twmb

This should be addressed now via #1291.

bojand avatar Jun 19 '24 13:06 bojand