console
console copied to clipboard
Support regex in topic protobuf mappings
Fixing #405
This is still a work in progress (need more manual and unit tests) but If you find time you can validate you are OK with the direction I am heading.
I have concerns about performance if every message end up matching with the last regex in a long list. I may add a benchmark to get an idea of the difference between regex matching vs directly matching in the map (I expect a huge difference).
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.
I fixed merge conflicts. @weeco / @rikimaru0345 can you take a look on this?
Hey @upils , I share your concerns in regards to the performance. This is code executed in the hot-path if you search a topic's messages with search filters enabled. It wouldn't be uncommon to process ~100k records / seconds. I think there are solutions to this problem - such as caching the results for requested types etc.
@weeco thanks for your answer. I have now added matching topics in the strictMappingsByTopic (see https://github.com/redpanda-data/console/pull/584/files#diff-cb0859875b7a3634d6b559a35d024fc14ba558a7960fabbd6287d716423246a7R296) so a given topic never have to be searched twice in the regexMappingsByTopic.
I wrote a benchmark with a variable proportion of strict/regex mappings.
Before the fix, performances were degrading following the ratio of regex mappings:
goos: linux
goarch: amd64
pkg: github.com/redpanda-data/console/backend/pkg/proto
cpu: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
BenchmarkService_getMatchingMapping
BenchmarkService_getMatchingMapping/Only_strict_mappings
BenchmarkService_getMatchingMapping/Only_strict_mappings-8 5998 186877 ns/op 8 B/op 0 allocs/op
BenchmarkService_getMatchingMapping/10%_regex_mappings
BenchmarkService_getMatchingMapping/10%_regex_mappings-8 249 4464246 ns/op 451 B/op 2 allocs/op
BenchmarkService_getMatchingMapping/50%_regex_mappings
BenchmarkService_getMatchingMapping/50%_regex_mappings-8 91 12583620 ns/op 2226 B/op 15 allocs/op
BenchmarkService_getMatchingMapping/90%_regex_mappings
BenchmarkService_getMatchingMapping/90%_regex_mappings-8 58 17922738 ns/op 4674 B/op 41 allocs/op
PASS
ok github.com/redpanda-data/console/backend/pkg/proto 5.692s
After the fix, it shows no significant difference:
goos: linux
goarch: amd64
pkg: github.com/redpanda-data/console/backend/pkg/proto
cpu: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
BenchmarkService_getMatchingMapping
BenchmarkService_getMatchingMapping/Only_strict_mappings
BenchmarkService_getMatchingMapping/Only_strict_mappings-8 5724 180517 ns/op 8 B/op 0 allocs/op
BenchmarkService_getMatchingMapping/10%_regex_mappings
BenchmarkService_getMatchingMapping/10%_regex_mappings-8 7711 150304 ns/op 12 B/op 0 allocs/op
BenchmarkService_getMatchingMapping/50%_regex_mappings
BenchmarkService_getMatchingMapping/50%_regex_mappings-8 7566 147093 ns/op 30 B/op 0 allocs/op
BenchmarkService_getMatchingMapping/90%_regex_mappings
BenchmarkService_getMatchingMapping/90%_regex_mappings-8 6926 157498 ns/op 42 B/op 0 allocs/op
PASS
ok github.com/redpanda-data/console/backend/pkg/proto 6.286s
I may add more use cases to the benchmark if I think of some corner cases.
@upils , @weeco can we apprise this topic one final time? It's almost done, guys! I left my comment in the review on how to fix the complexity and keep the steam inside. Can you help me?
@upils were you able to sign the Contributor License Agreement? I honestly can't tell from the bot 😅 Looks like once that is sorted and you get an approval, this PR is ready for prime time!
@upils were you able to sign the Contributor License Agreement? I honestly can't tell from the bot 😅 Looks like once that is sorted and you get an approval, this PR is ready for prime time!
Yes, but I guess I messed up with the mail address I used in my commits. I will rewrite them and it should be good.
raising it up
@upils ping on rewriting the commit emails.
Hey @smatvienko-tb @twmb @brycefisher . Sorry for this long hiatus. I fixed the commit author and started to apply the changes suggested by @smatvienko-tb . I need to add test cases but let me know if this is what you had in mind.
I left two comments. Overall I'd like to suggest to change this PR so that regex strings are recognized if the input is wrapped into two slashes (see my other PR comment).
We can implement a custom type RegexpOrLiteral
which implements the Stringer method and has a custom Unmarshaler, so that you can provide mappings like this:
mappings:
- topicName: normal-exact-match
keyProtoType: "foo"
- topicName: /some-topic-prefix-matc.*/
keyProtoType: "bar"
You can do so by creating a type such as:
type RegexpOrLiteral struct {
literal string
*regexp.Regexp
}
func (r *RegexpOrLiteral) String() string {
if r.literal != "" {
return r.literal
}
if r.Regexp != nil {
return r.Regexp.String()
}
return ""
}
// UnmarshalText unmarshals json into a regexp.Regexp
func (r *RegexpOrLiteral) UnmarshalText(b []byte) error {
regex, err := compileRegex(string(b))
if err == nil {
r.Regexp = regex
}
r.literal = string(b)
return nil
}
// MarshalText marshals regexp.Regexp as string
func (r *RegexpOrLiteral) MarshalText() ([]byte, error) {
if r.Regexp != nil {
return []byte(r.Regexp.String()), nil
}
return []byte(r.literal), nil
}
Additionally koanf (our config management lib) needs to be configured so that it will try the TextUnmarshalling, by adding the mapstructure.TextUnmarshallerHookFunc(),
option here: https://github.com/redpanda-data/console/blob/94ef19bcb9c8202040b3f8bf93ee1ead4fe94ae8/backend/pkg/config/config.go#L133
The compileRegex
method already exists here and checks for the wrapped slashes: https://github.com/redpanda-data/console/blob/94ef19bcb9c8202040b3f8bf93ee1ead4fe94ae8/backend/pkg/config/util_regex.go#L19-L36 . By doing so you have the best of both worlds:
- Ability to provide regexes while being backwards compatible (a single string can be provided for the topic name)
- Regexes are compiled at unmarshal time so that we error early and users should clearly know whether regex is used or not
What do you think @upils @smatvienko-tb ?
Hi @upils and @weeco! Thanks so much for all the hardwork on implementing and reviewing this feature so far. Is there anything I can do to help push this feature over the line?
@brycefisher if you'd like to take this on, would you be able to fork and address suggestions from @weeco just above?
Yes! Happy to put the finishing touches on this.
@upils @smatvienko-tb @brycefisher ping
We'll move this to unscheduled on our side, but please ping once y'all are able to uptake this and we can re-review.
This should be addressed now via #1291.