seq-tickets icon indicating copy to clipboard operation
seq-tickets copied to clipboard

Function(s) for extracting parts from URLs

Open jasonkres opened this issue 3 years ago • 5 comments

Issue

It is difficult to work with URLs. It would help if there was support for using capturing groups in regular expressions (#1383), but even then it would still be very complicated.

Idea

Provide functions to work with URLs. Such as:

UrlPart('https://www.example.com:1234/hello/world?alpha=1&beta=2#test', part)
  • When part is 'scheme' => 'https'
  • When part is 'authority' => 'www.example.com:1234'
  • When part is 'host' => 'www.example.com'
  • When part is 'port' => 1234
  • When part is 'path' => '/hello/world'
  • When part is 'query' => 'alpha=1&beta=2'
  • When part is 'fragment' => 'test'

See: https://en.wikipedia.org/wiki/URL for this terminology.

There are many edge cases in parsing URLs. Suggest cross-testing against a commonly used library.

jasonkres avatar Nov 23 '21 00:11 jasonkres

Would love to see something like this implemented, but more general so any regular expression with capturing groups could be used.

Logs are stored in Postgres database with our current homegrown logging system. Regular expressions with capture groups can be used when selecting information to display. (documentation)

I would like to use something like this in Seq to show more concise graphs.

Loggers are formatted like namespaces (i.e., "a.b.c.d.f") and I only want "d.f" on the charts so they are readable.

select 
   -- substring(Logger from '(\w+\.\w+)$') as DisplayLogger, -- Postgres syntax
   count(*) as EventCount
from stream 
where @Timestamp > Now() - 2h
group by Logger
order by EventCount desc
limit 25

schallm avatar May 24 '23 17:05 schallm

@schallm for your specific example I think you might be able to use indexOf and substring

image

liammclennan avatar May 24 '23 23:05 liammclennan

Can’t test right now, but interesting you can use a regex in the indexof function. I’ll look into that in the morning. I tried testing with substring in my query earlier and I get an error that substring can not be used as an aggregate function in a group by query.

schallm avatar May 25 '23 04:05 schallm

That's right, and there the semantics are the same as postgres. If you group the data then the select clause has to be an aggregate function that can reduce a set of rows (the group) to a single row. This query makes no sense:

image

You can move the Substring operation to the group by clause and it will appear in the output. Then choose your aggregate function.

image

All of the string matching functions support regular expressions, and the ci modifier.

liammclennan avatar May 25 '23 05:05 liammclennan

Thanks so much for your help! This works great! However, I have a few questions/thoughts. 😄

semantics are the same as postgres

While you can group by a substring in Postgres, you have to select it for the columns to be in the results then group by the alias.

Postgres:

select 
   substring(Logger from '(\w+\.\w+)$') as DisplayLogger,
   count(*) as EventCount
from Events
group by DisplayLogger
order by EventCount desc

Working Seq query:

select count(*) as EventCount
from stream 
where @Timestamp > Now() - 2h
group by substring(Logger, indexof(Logger, /(\w+\.)?\w+$/)) as DisplayLogger
order by EventCount desc

For others (and future me), note that you can rename the column in the group by statement as shown above.

This works great as substring doesn't need to know the length of the match as I want the remainder of the string.

It would still be nice to have substring work directly with regular expressions and a capture group so I could get text from the middle of a string. For example, I want to get just the milliseconds out of the following string:

The process (processId: 23) took 6.41ms to complete.

Postgres:

substring(Message from 'took ([\d\.]+)ms')  -- returns '6.41'

Could something like the following syntax be added? Or does functionality similar to this already exist?

substring(Message, /took ([\d\.]+)ms/) -- returns '6.41'

The answer could be "use structured logging". However, I think being able to extract data out of strings would always be useful.

schallm avatar May 25 '23 15:05 schallm