opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard
[pkg/ottl] Parse uri string to url.* SemConv attributes
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
parsing URI possible only using complex RegEx
Describe the solution you'd like
Intended converter parses a Uniform Resource Identifier (URI) string and extracts its components into URL SemConv attributes.
Example
input:
"sample": "http://myusername:[email protected]:80/foo.gif?key1=val1&key2=val2#fragment"
Result:
{
"url": {
"path" : "/foo.gif",
"fragment" : "fragment",
"extension" : "gif",
"password" : "mypassword",
"original" : "http://myusername:[email protected]:80/foo.gif?key1=val1&key2=val2#fragment",
"scheme" : "http",
"port" : 80,
"user_info" : "myusername:mypassword",
"domain" : "www.example.com",
"query" : "key1=val1&key2=val2",
"username" : "myusername"
}
}
Describe alternatives you've considered
No response
Additional context
No response
Pinging code owners:
- pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@michalpristas I suspect this is doable via regular expression, and any converter would be doing regex internally. Can this be accomplished with ExtractPatterns?
removed my previous comment, as it was for other issue 🤦
it is surely possible to do so. but this would mean we require every user to use and compile regex on their own. my goal here is ease of use, this one is pretty frequent in our pipelines and i see using ExtractPattern with regex as very error prone. having something working and tested (using UT) will surely benefit our users
I would be okay adding this. We can use the url.Parse stdlib function instead of having regular expressions either in a statement or in our code. In general I think the fewer regexes we and our users have, the better.
I would be okay adding this. We can use the url.Parse stdlib function instead of having regular expressions either in a statement or in our code. In general I think the fewer regexes we and our users have, the better.
I see the value this provides to our end-users, but we don't have a good way right now to decide when a new function should be accepted vs an existing function should be used. We really need some criteria for how new functions get added. @michalpristas you've been really active in OTTL recently providing a lot of good feedback and ideas. Any willingness to try and right up those rules? It would really help with several of the function additions you've requested.
Missing functionality is a strong yes e.g Sorting array, Append, missing converters for SHA512 or MD5.
For other cases I try to assume non technical user, so yes for me is when new function significantly improves UX or provides a more performant way of achieving same result. (Also a bit opinionated, if I have to write regex that is more than 10 characters I don't want to do that)
Example of improved UX is probably #32593 This issue here I think provides better UX and does not use regex so maybe even perf is better (I say maybe because don't have measurement)
If UX is same but assumes less technical user (better naming, terminology...) I'd say it's a maybe. This should be evaluated per use-case and there should be only a few requests/issues like that.
When UX is worse or performance is troublesome, this should be a strong no and there should be no issues like that.
Other way I can think about that is that this should be treated like a code. We should aim for readability first and performance second (meaning performant solution can be achieved by using more low level pieces)
With readability same things as with code should be expected, few important that come to mind
- short, meaningful and descriptive names
- naming consistent across functions
- avoiding deep nesting
fn1(target, fn2(fn3(fn4...))))for frequent use cases - single responsibility & DRY
@michalpristas can you open a PR with these suggestions in ottlfunc README at the bottom so we can continue the discussion and codify the rules.
I opened a pr. I will be on pto for the next 10 days though so my response time may be slightly unresponsive
A couple thoughts:
- We have a stanza URI parser already. We've been able to share code between OTTL functions and stanza parsers in some other cases so I would like to consider what it looks like to do this here as well.
- The stanza parser may have been written before the semantic conventions. If there's a mismatch, I support migrating that code to match semantic conventions with a feature gate to allow users a chance to prepare.
- The result proposed in this issue is structured but the semantic conventions are flat. I believe elastic considers these equivalent but OTel does not. (I may be wrong but this is my understanding.) Therefore I believe the correct result would be:
{
"url.path" : "/foo.gif",
"url.fragment" : "fragment",
"url.extension" : "gif",
"url.password" : "mypassword",
"url.original" : "http://myusername:[email protected]:80/foo.gif?key1=val1&key2=val2#fragment",
"url.scheme" : "http",
"url.port" : 80,
"url.user_info" : "myusername:mypassword",
"url.domain" : "www.example.com",
"url.query" : "key1=val1&key2=val2",
"url.username" : "myusername"
}
I've been writing a lot of OTTL lately and I found myself wanting exactly this functionality as a user convenience.
Fully-compliant URL parsing is definitely non-trivial, and IMO worth implementing centrally.
Closing. Addressed in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32906