
Get path components by index / json array

Open elig0n opened this issue 1 year ago • 8 comments

Something like: trurl -g {path:2} would get the 2nd slash-separated element of a URL path.

The slash-separated elements could also take the form of an array in the JSON output, making it easier to parse.

elig0n avatar May 06 '24 10:05 elig0n

I presume that would make the JSON output look something like this?

$ trurl --json curl.se/one/two/three
[
  {
    "url": "http://curl.se/one/two/three",
    "parts": {
      "scheme": "http",
      "host": "curl.se",
      "path": "/one/two/three"
    },
    "path": [
      "one",
      "two",
      "three"
    ]
  }
]

bagder avatar May 06 '24 11:05 bagder

@bagder Sure, why not? I'll leave the implementation specifics up to the developers.

This can also apply to the "host" part i.e. have it split into: domain, subdomain, tld

elig0n avatar May 06 '24 13:05 elig0n

This can also apply to the "host" part i.e. have it split into: domain, subdomain, tld

I suppose that would then be a "host" array since it can in theory contain a large number of parts. A reverse-sorted list perhaps so that it starts with the TLD?

(I just want to be clear that I'm not entirely convinced trurl needs these features, but I'm testing out the ideas and how they would work as part of making up my mind.)

bagder avatar May 06 '24 21:05 bagder

If you do offer split URLs, supporting an additional form of splitting by PSL might also be useful in some cases (such as the PSL suffix in one part and the rest in another). But, libcurl doesn't give you that so it would need to be done by trurl itself.

dfandrich avatar May 07 '24 02:05 dfandrich

I suppose that would then be a "host" array since it can in theory contain a large number of parts. A reverse-sorted list perhaps so that it starts with the TLD?

Maybe just another sub-object with key-value pairs would suffice

elig0n avatar May 07 '24 07:05 elig0n

Since a path is always separated by slashes and a host name is always separated by periods, I don't quite see the need for trurl to do that splitting. There is plenty of help in tools and languages to split a string by a given separator.
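
As a rough illustration of that point, a minimal Python sketch could do the splitting after the fact. It assumes the JSON shape shown earlier in this thread (without the proposed "path" array); split_path and split_host are hypothetical helper names, not part of trurl, and dropping empty segments is just one possible choice.

```python
import json

# Sample output in the shape shown earlier in this thread,
# without the proposed "path" array (assumed, not captured from a real run).
sample = """[
  {
    "url": "http://curl.se/one/two/three",
    "parts": {
      "scheme": "http",
      "host": "curl.se",
      "path": "/one/two/three"
    }
  }
]"""


def split_path(path):
    """Split a URL path on slashes, dropping empty segments."""
    return [segment for segment in path.split("/") if segment]


def split_host(host):
    """Split a host name on periods, reversed so the TLD comes first."""
    return host.split(".")[::-1]


parts = json.loads(sample)[0]["parts"]
path_segments = split_path(parts["path"])
host_labels = split_host(parts["host"])

print(path_segments)     # ['one', 'two', 'three']
print(host_labels)       # ['se', 'curl']
print(path_segments[1])  # two (the 2nd slash-separated element)
```

Note that dropping empty segments hides the difference between /one/two and /one//two/, so a caller who cares about that would keep them.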

As @dfandrich mentions, getting a PSL out of the host name would be different - but that would require either an API change in libcurl or that trurl accesses libpsl itself. Not something I personally feel is worth it.

bagder avatar May 13 '24 12:05 bagder

Why would you defer the job of splitting paths in the JSON trurl generates to the user who runs e.g. jq? They should only care about extracting the data they need and not parsing. A common Unix philosophy says: do one thing and do it well.

elig0n avatar May 13 '24 19:05 elig0n

A common Unix philosophy says: do one thing and do it well.

The entire point of that is to not do overly task-specific things in your tools so that the tool can be more generic and it is possible to use external tools to easily integrate it in many different complex applications. I don't understand how else you are interpreting that saying.

emanuele6 avatar May 13 '24 20:05 emanuele6