Please support zero-sized fragment and query

Open lu-zero opened this issue 1 year ago • 9 comments

scheme://host/path/# and scheme://host/path/ are different, and so is scheme://host/path/?. It would be nice if --get let you differentiate them and --set let you produce them.

lu-zero, Jul 14 '23 11:07

I think this is a limitation of libcurl's urlapi

emanuele6, Jul 14 '23 12:07

From the manual for curl_url_get:

CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a delimiter only. It is not part of the query contents.

A not-present query will lead part to be set to NULL. A zero-length query will lead part to be set to a zero-length string.

The query part will also get pluses converted to space when asked to URL decode on get with the CURLU_URLDECODE bit. 

lu-zero, Jul 14 '23 12:07

I was mainly referring to it normalising scheme://host/path/? as scheme://host/path/ even though it can distinguish no query and empty query. But yeah, also note that it can only do that for queries, not fragments.

$ ./foo 'scheme://host/path/'
in:     scheme://host/path/
out:    scheme://host/path/
query:  NULL
frag:   NULL
$ ./foo 'scheme://host/path/?'
in:     scheme://host/path/?
out:    scheme://host/path/
query:
frag:   NULL
$ ./foo 'scheme://host/path/#'
in:     scheme://host/path/#
out:    scheme://host/path/
query:  NULL
frag:   NULL
$ ./foo 'scheme://host/path/?#'
in:     scheme://host/path/?#
out:    scheme://host/path/
query:
frag:   NULL
$ ./foo 'scheme://host/path/?#hello'
in:     scheme://host/path/?#hello
out:    scheme://host/path/#hello
query:
frag:   hello
$ ./foo 'scheme://host/path/?hello#'
in:     scheme://host/path/?hello#
out:    scheme://host/path/?hello
query:  hello
frag:   NULL 

libcurl always normalises empty query/fragment as no query/fragment; and it does not provide a way to distinguish empty fragment from no fragment.


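#include <stdio.h>

#include <curl/curl.h>

int main(int const argc, char const *const argv[])
{
    if (argc != 2)
        return 1;

    CURLU *const uh = curl_url();
    if (!uh)
        return 1;

    /* Parse the URL given on the command line. */
    if (curl_url_set(uh, CURLUPART_URL, argv[1],
                     CURLU_NON_SUPPORT_SCHEME|
                     CURLU_GUESS_SCHEME|
                     CURLU_URLENCODE) != CURLUE_OK) {
        curl_url_cleanup(uh);
        return 1;
    }

    /* A pointer stays NULL when the corresponding get fails,
       for example because that part is not present. */
    char *url = NULL;
    curl_url_get(uh, CURLUPART_URL, &url, CURLU_DEFAULT_PORT);
    char *query = NULL;
    curl_url_get(uh, CURLUPART_QUERY, &query, CURLU_DEFAULT_PORT);
    char *frag = NULL;
    curl_url_get(uh, CURLUPART_FRAGMENT, &frag, CURLU_DEFAULT_PORT);
    printf("in:\t%s\n"
           "out:\t%s\n"
           "query:\t%s\n"
           "frag:\t%s\n",
           argv[1], url ? url : "NULL",
           query ? query : "NULL", frag ? frag : "NULL");

    /* curl_free(NULL) is a no-op, so no NULL checks are needed here. */
    curl_free(url);
    curl_free(query);
    curl_free(frag);
    curl_url_cleanup(uh);
    return 0;
}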
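
For comparison, here is a minimal sketch of the three-way distinction the issue asks for, written without libcurl. It only scans for the delimiters: the fragment starts at the first '#', and the query starts at the first '?' that comes before it. The names part_state, query_state and fragment_state are made up for this illustration and are not part of libcurl or trurl.

```c
#include <stddef.h>
#include <string.h>

/* The three states a query or fragment can be in. */
enum part_state { PART_ABSENT, PART_EMPTY, PART_NONEMPTY };

/* Fragment: everything after the first '#', if there is one. */
enum part_state fragment_state(const char *url)
{
    const char *hash = strchr(url, '#');
    if (!hash)
        return PART_ABSENT;
    return hash[1] ? PART_NONEMPTY : PART_EMPTY;
}

/* Query: starts after the first '?' that precedes any '#'
   and runs up to the '#' or the end of the string. */
enum part_state query_state(const char *url)
{
    size_t before_frag = strcspn(url, "#");
    const char *qmark = memchr(url, '?', before_frag);
    if (!qmark)
        return PART_ABSENT;
    size_t qlen = before_frag - (size_t)(qmark - url) - 1;
    return qlen ? PART_NONEMPTY : PART_EMPTY;
}
```

With this, scheme://host/path/? gives PART_EMPTY for the query, and scheme://host/path/# gives PART_EMPTY for the fragment, whereas the libcurl output above cannot tell the latter from no fragment at all.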

emanuele6, Jul 14 '23 12:07

Yes, I was hoping that it could be reflected in trurl as well. (and curl itself is in the good bucket for that :))

Thank you for providing also the full demo code :)

lu-zero, Jul 14 '23 17:07

See https://github.com/curl/curl/pull/13396

bagder, Apr 17 '24 09:04

With libcurl supporting empty queries and fragments now, how do you think we should enable this in trurl?

bagder, Apr 18 '24 12:04

It would probably be useful to have --unset to clear the query and fragment, and to make trurl scheme://host/path --set query="" return scheme://host/path?.

And have --get {component} print nothing if the component is not present, and an empty line if it is present but empty.

But those would be breaking changes.

lu-zero, Apr 18 '24 12:04

> But those would be breaking changes.

I think we are free to do breaking changes if we want, at least before an official version one. I think the bigger problem is that they would work differently depending on what the underlying libcurl in use supports...

bagder, Apr 18 '24 12:04

Another (clunky) solution may be a new flag, --allow-empty, for --get and --set?

Or something clever with the modifiers in the --get brackets, something like --get "{empty:query} {empty:fragment}"? I'm not sure how this would work with setting empty fields though.

jacobmealey, Apr 18 '24 13:04

> I'm not sure how this would work with setting empty fields though.

I figure we might need to do some other syntax extension/change for that. Maybe

  • Without assign: --set fragment
  • Or with another character instead of assign, like a colon: --set fragment:

Of course, this would only work for query and fragment. Maybe path?
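
As a rough sketch of the first variant (no assign), the argument to --set could be classified as below. This is purely illustrative and assumes nothing about how trurl parses its options: the names set_kind, set_arg and parse_set_arg are hypothetical, and the colon variant is not handled here.

```c
#include <stddef.h>
#include <string.h>

enum set_kind {
    SET_VALUE,      /* "name=value": assign a value, possibly "" */
    SET_EMPTY_MARK, /* "name" alone: proposed "present but empty" form */
    SET_INVALID     /* no component name given */
};

struct set_arg {
    enum set_kind kind;
    size_t name_len;    /* length of the component name at the start of arg */
    const char *value;  /* text after '=', or NULL when there is no '=' */
};

/* Split a --set argument at the first '=' and classify it. */
struct set_arg parse_set_arg(const char *arg)
{
    struct set_arg out = { SET_INVALID, 0, NULL };
    const char *eq = strchr(arg, '=');
    size_t name_len = eq ? (size_t)(eq - arg) : strlen(arg);

    if (name_len == 0)
        return out;
    out.name_len = name_len;
    out.kind = eq ? SET_VALUE : SET_EMPTY_MARK;
    out.value = eq ? eq + 1 : NULL;
    return out;
}
```

Under this sketch, "fragment" yields SET_EMPTY_MARK while "fragment=" yields SET_VALUE with an empty value, so the two spellings stay distinguishable and each can be given its own meaning.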

bagder, Aug 14 '24 06:08

There is also the shell problem: how would a script differentiate between a blank query and a non-existing one?

trurl example.com/? -g a{query}a

vs

trurl example.com/ -g a{query}a

What is the expected output for a zero length query vs a non-existing one?

bagder, Aug 27 '24 07:08

I guess the latter has to report an error somehow, maybe adding a {fail:component} modifier so both behaviors are supported?
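
To make the ambiguity concrete, here is a small sketch of the a{query}a case. It assumes the query is passed as NULL when absent and as "" when empty, which matches what the libcurl manual quoted earlier describes for CURLUPART_QUERY. The function render_query and its must_exist parameter are hypothetical stand-ins for a {fail:query}-style modifier, not trurl code.

```c
#include <stdio.h>

/* Render "a{query}a" into buf.
   query == NULL means the URL has no query; "" means an empty one.
   With must_exist set, a missing query is reported as an error
   instead of producing output. Returns 0 on success, -1 on error. */
int render_query(const char *query, int must_exist,
                 char *buf, size_t buflen)
{
    if (!query && must_exist)
        return -1;

    /* Without must_exist, absent and empty both print as nothing. */
    int n = snprintf(buf, buflen, "a%sa", query ? query : "");
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Without must_exist, both inputs render as aa, which is exactly the problem a script would face. With it, the missing query fails while the empty one still renders as aa, so the exit status carries the distinction.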

lu-zero, Aug 27 '24 08:08

#336 at least partly satisfies this.

bagder, Aug 27 '24 11:08

@lu-zero does this satisfy your use case or is there anything more you want/need to differentiate empty/missing components for?

bagder, Aug 27 '24 13:08

I think it is enough, thank you :)

lu-zero, Aug 27 '24 15:08