symfony icon indicating copy to clipboard operation
symfony copied to clipboard

[Uri] Add component

Open alexandre-daubois opened this issue 1 year ago β€’ 14 comments

Q A
Branch? 7.2
Bug fix? no
New feature? yes
Deprecations? no
Issues -
License MIT

The URI component is parsing and building URIs.

This new component is very similar to some other low-level Symfony components that enhances PHP built-in features (e.g. VarExporter, VarDumper, Process to some extent). The primary goal is to have a consistent and object-oriented approach for parse_url() and parse_str() functions.

What issues does it fix?

  • parse_str() overwrites any duplicate field in the query parameter (e.g. ?foo=bar&foo=baz will return ['foo' => 'baz']). foo should be an array instead with the two values. More info and examples can be found at Stackoverflow.
  • parse_str() replaces . in the query parameter keys with _, thus no distinction can be done between foo.bar and foo_bar.
  • parse_str() doesn't "support" + in the parameter keys and replaces them with _ instead of a space.
  • parse_url() does not decode the auth component of the URL (user and pass). This makes it impossible to use the parse_url() function to parse a URL with a username or password that contains a colon (:) or an @ character.

Other Considerations

  • Symfony\Component\HttpKernel\UriSigner could be moved to this new component
  • Symfony\Component\HttpFoundation\UrlHelper could be moved to this component
  • Symfony\Component\DomCrawler\UriResolver could be moved to this new component, but it may be replaced by a resolve() method in the new component (as shown below)

Other languages

  • Golang has a built-in net/url package to parse URLs (https://pkg.go.dev/net/[email protected])
  • Python also has a native utility to parse URLs called urllib (https://docs.python.org/3/library/urllib.html)
  • Rust doesn't have a native URL crate, but a big Rust project related to browsers has developed a rust-url crate (https://github.com/servo/rust-url)

Specification

The Uri class

The Uri class allows to create and manipulate Uri, as well as parsing strings to build an Uri object.

final class Uri implements \Stringable
{
    public function __construct(
        public string $scheme,
        public ?string $user = null,
        public ?string $password = null,
        public ?string $host = null,
        public ?int $port = null,
        public ?string $path = null,
        public ?QueryString $query = null,
        public ?string $fragment = null,
        public ?FragmentTextDirective $fragmentTextDirective = null,
    ) {
    }

    /**
     * Parses a URL.
     *
     * The `user` and `pass` keys are url-decoded automatically when parsing.
     *
     * @throws InvalidUriException
     */
    public static function parse(string $uri): static;

    /**
     * Resolves a relative URI against a base URI.
     *
     * Uri::resolve('http://example.com', '/foo/bar'); // http://example.com/foo/bar
     * Uri::resolve('http://example.com/foo', '/bar'); // http://example.com/bar
     * Uri::resolve('http://example.com/foo', 'bar'); // http://example.com/foo/bar
     * Uri::resolve('http://example.com/foo/', 'bar'); // http://example.com/foo/bar
     * Uri::resolve('http://example.com/foo/', '../bar'); // http://example.com/bar
     * Uri::resolve('http://example.com/foo/', '../../bar'); // http://example.com/bar
     * Uri::resolve('http://example.com/foo/', '/bar'); // http://example.com/bar
     * Uri::resolve('http://example.com/foo/', 'http://example.org/bar'); // http://example.org/bar
     */
    public static function resolve(Uri|string $baseUri, string $relativeUri): string;

    /**
     * Returns a new instance with a new fragment text directive.
     */
    public function withFragmentTextDirective(string $start, ?string $end = null, ?string $prefix = null, ?string $suffix = null): static;

    /**
     * Returns a new instance with the host part of the URI converted to ASCII.
     *
     * @see https://www.unicode.org/reports/tr46/#ToASCII
     */
    public function withIdnHostAsAscii(): static;

    /**
     * Returns a new instance with the host part of the URI converted to Unicode.
     *
     * @see https://www.unicode.org/reports/tr46/#ToUnicode
     */
    public function withIdnHostAsUnicode(): static;

    public function __toString();
}

The QueryString class

This class manipulates and parses query strings.

final class QueryString implements \Stringable
{
    /**
     * Parses a URI.
     *
     * Unlike `parse_str()`, this method does not overwrite duplicate keys but instead
     * returns an array of all values for each key:
     *
     * QueryString::parse('foo=1&foo=2&bar=3'); // stored as ['foo' => ['1', '2'], 'bar' => '3']
     *
     * `+` are supported in parameter keys and not replaced by an underscore:
     *
     * QueryString::parse('foo+bar=1'); // stored as ['foo bar' => '1']
     *
     * `.` and `_` are supported distinct in parameter keys:
     *
     * QueryString::parse('foo.bar=1'); // stored as ['foo.bar' => '1']
     * QueryString::parse('foo_bar=1'); // stored as ['foo_bar' => '1']
     */
    public static function parse(string $query): QueryString;

    public function has(string $key): bool;

    /**
     * @return null|string|string[]
     */
    public function get(string $key): string|array|null;

    public function set(string $key, array|string|null $value): self;

    public function remove(string $key): self;

    /**
     * @return array<string, string|string[]>
     */
    public function all(): array;

    public function __toString(): string;
}

Fragment Text Directives

Fragment Text Directives are supported as well.

final class FragmentTextDirective implements \Stringable
{
    public function __construct(
        public string $start,
        public ?string $end = null,
        public ?string $prefix = null,
        public ?string $suffix = null,
    ) {
    }

    /**
     * Dash, comma and ampersand are encoded, @see https://wicg.github.io/scroll-to-text-fragment/#syntax.
     */
    public function __toString(): string;
}

Prefix and suffix are respectively suffixed and prefixed by a dash. If the dash is not provided in the string, it is added automatically:

$uri = Uri::parse('https://tnyholm.se/about');
$uri->withFragmentTextDirectives('start', 'end', 'prefix', 'suffix');

echo (string) $uri; // https://tnyholm.se/about#:~:text=prefix-,start,end,-suffix

This also works with an existing fragment:

$uri = Uri::parse('https://tnyholm.se/about#existing');
$uri->withFragmentTextDirectives('start', 'end', 'prefix', 'suffix');

echo (string) $uri; // https://tnyholm.se/about#existing:~:text=prefix-,start,end,-suffix

Dash, ampersand and comma are automatically escaped in the fragment text directives, as stated in the specification:

$uri = Uri::parse('https://tnyholm.se/about');
$uri->withFragmentTextDirectives('sta-r,t', '&nd', 'prefix');

echo (string) $uri; // https://tnyholm.se/about#:~:text=prefix-,sta%2Dr%2Ct,%26nd

Usage / leveraging the component

The first place where this component can be leveraged is DSN parsing (initially, the idea came from some issues parsing DSN for the different cache adapters). As a proof of concept, here is an example of how this component can be used, here the Redis cache adapter:

Redis cache adapter using the Uri component
diff --git a/src/Symfony/Component/Cache/Traits/RedisTrait.php b/src/Symfony/Component/Cache/Traits/RedisTrait.php
index 8183f92935..97ecb07df9 100644
--- a/src/Symfony/Component/Cache/Traits/RedisTrait.php
+++ b/src/Symfony/Component/Cache/Traits/RedisTrait.php
@@ -25,6 +25,8 @@ use Symfony\Component\Cache\Exception\CacheException;
use Symfony\Component\Cache\Exception\InvalidArgumentException;
use Symfony\Component\Cache\Marshaller\DefaultMarshaller;
use Symfony\Component\Cache\Marshaller\MarshallerInterface;
+use Symfony\Component\Uri\QueryString;
+use Symfony\Component\Uri\Uri;

/**
* @author Aurimas Niekis <[email protected]>
@@ -86,11 +88,8 @@ trait RedisTrait
    */
   public static function createConnection(#[\SensitiveParameter] string $dsn, array $options = []): \Redis|\RedisArray|\RedisCluster|\Predis\ClientInterface|Relay
   {
-        if (str_starts_with($dsn, 'redis:')) {
-            $scheme = 'redis';
-        } elseif (str_starts_with($dsn, 'rediss:')) {
-            $scheme = 'rediss';
-        } else {
+        $dsn = Uri::parse($dsn);
+        if (!\in_array($dsn->scheme, ['redis', 'rediss'], true)) {
           throw new InvalidArgumentException('Invalid Redis DSN: it does not start with "redis[s]:".');
       }

@@ -98,68 +97,58 @@ trait RedisTrait
           throw new CacheException('Cannot find the "redis" extension nor the "predis/predis" package.');
       }

-        $params = preg_replace_callback('#^'.$scheme.':(//)?(?:(?:(?<user>[^:@]*+):)?(?<password>[^@]*+)@)?#', function ($m) use (&$auth) {
-            if (isset($m['password'])) {
-                if (\in_array($m['user'], ['', 'default'], true)) {
-                    $auth = rawurldecode($m['password']);
-                } else {
-                    $auth = [rawurldecode($m['user']), rawurldecode($m['password'])];
-                }
-
-                if ('' === $auth) {
-                    $auth = null;
-                }
-            }
-
-            return 'file:'.($m[1] ?? '');
-        }, $dsn);
+        $hosts = [];
+        $query = $dsn->query?->all() ?? [];
+        $tls = 'rediss' === $dsn->scheme;
+        $tcpScheme = $tls ? 'tls' : 'tcp';

-        if (false === $params = parse_url($params)) {
-            throw new InvalidArgumentException('Invalid Redis DSN.');
+        $auth = null;
+        if ($dsn->password) {
+            $auth = [$dsn->password];
       }

-        $query = $hosts = [];
-
-        $tls = 'rediss' === $scheme;
-        $tcpScheme = $tls ? 'tls' : 'tcp';
-
-        if (isset($params['query'])) {
-            parse_str($params['query'], $query);
+        if ($dsn->user) {
+            $auth ??= [];
+            array_unshift($auth, $dsn->user);
+        }

-            if (isset($query['host'])) {
-                if (!\is_array($hosts = $query['host'])) {
+        if (null !== $dsn->query) {
+            $queryString = $dsn->query;
+            if ($queryString->has('host')) {
+                if (!\is_array($hosts = $queryString->get('host'))) {
                   throw new InvalidArgumentException('Invalid Redis DSN: query parameter "host" must be an array.');
               }
+
               foreach ($hosts as $host => $parameters) {
                   if (\is_string($parameters)) {
-                        parse_str($parameters, $parameters);
+                        $parameters = QueryString::parse($parameters);
                   }
                   if (false === $i = strrpos($host, ':')) {
-                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => $host, 'port' => 6379] + $parameters;
+                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => $host, 'port' => 6379] + $parameters->all();
                   } elseif ($port = (int) substr($host, 1 + $i)) {
-                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => substr($host, 0, $i), 'port' => $port] + $parameters;
+                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => substr($host, 0, $i), 'port' => $port] + $parameters->all();
                   } else {
-                        $hosts[$host] = ['scheme' => 'unix', 'path' => substr($host, 0, $i)] + $parameters;
+                        $hosts[$host] = ['scheme' => 'unix', 'path' => substr($host, 0, $i)] + $parameters->all();
                   }
               }
               $hosts = array_values($hosts);
           }
       }

-        if (isset($params['host']) || isset($params['path'])) {
-            if (!isset($params['dbindex']) && isset($params['path'])) {
-                if (preg_match('#/(\d+)?$#', $params['path'], $m)) {
-                    $params['dbindex'] = $m[1] ?? '0';
-                    $params['path'] = substr($params['path'], 0, -\strlen($m[0]));
-                } elseif (isset($params['host'])) {
+        if ($dsn->host || $dsn->path) {
+            if (!$dsn->query?->has('dbindex') && $dsn->path) {
+                if (preg_match('#/(\d+)?$#', $dsn->path, $m)) {
+                    $query['dbindex'] = $m[1] ?? '0';
+                    $dsn->path = substr($dsn->path, 0, -\strlen($m[0]));
+                } elseif ($dsn->host) {
                   throw new InvalidArgumentException('Invalid Redis DSN: query parameter "dbindex" must be a number.');
               }
           }

-            if (isset($params['host'])) {
-                array_unshift($hosts, ['scheme' => $tcpScheme, 'host' => $params['host'], 'port' => $params['port'] ?? 6379]);
+            if ($dsn->host) {
+                array_unshift($hosts, ['scheme' => $tcpScheme, 'host' => $dsn->host, 'port' => $dsn->port ?? 6379]);
           } else {
-                array_unshift($hosts, ['scheme' => 'unix', 'path' => $params['path']]);
+                array_unshift($hosts, ['scheme' => 'unix', 'path' => $dsn->path]);
           }
       }

@@ -167,7 +156,7 @@ trait RedisTrait
           throw new InvalidArgumentException('Invalid Redis DSN: missing host.');
       }

-        $params += $query + $options + self::$defaultConnectionOptions;
+        $params = $query + $options + self::$defaultConnectionOptions;

       if (isset($params['redis_sentinel']) && isset($params['sentinel_master'])) {
           throw new InvalidArgumentException('Cannot use both "redis_sentinel" and "sentinel_master" at the same time.');

More generally, every DSN parsing across the Symfony codebase could benefit from this component. I'm thinking about the authority decoding part allowing to put @ in username and passwords in all DSN (which is currently either supported individually, either not supported at all).


Thank you everyone involved in the specs redaction as well as preliminary review. And big thanks to @Nyholm for helping kickstart this new component! πŸ™

alexandre-daubois avatar May 27 '24 09:05 alexandre-daubois

do we ignore Psr\Http\Message\UriInterface deliberately?

ro0NL avatar May 27 '24 09:05 ro0NL

Yes, it's not on the radar of comprehensive URI modelling.

nicolas-grekas avatar May 27 '24 10:05 nicolas-grekas

do we ignore Psr\Http\Message\UriInterface deliberately?

We have a bridge for that. ✌️

derrabus avatar May 27 '24 10:05 derrabus

Hi!

Before going forward with this PR, I feel like there is something we have to discuss first. I think it's good to do an open discussion on the prior art in the PHP community, to be transparent on why a new component would be necessary in the first place.

One library that is my mind immediately is the URI library provided by the PHP league: https://uri.thephpleague.com/ This one has also been brought up in the previous PR proposing a DNS component (#36999), and it looks like the goal and feature set of this proposed new component is very similar to this library.

Has there been any research in whether it is possible to use this library in Symfony instead? Both from a technical perspective as from a maintainer perspective? Have we talked with @nyamsprod about this? At least from a maintainer perspective, the PHP league's standards already looks very similar to Symfony's and they are a proven organization for building and maintaining robust libraries. Maybe they are willing, with our help, to make the last steps to make Symfony able to use their library? (if there are any steps left?)

wouterj avatar May 27 '24 17:05 wouterj

Thanks for bringing @nyamsprod in the discussion. I met him in Brussels a few months back and told him about this initiative. We could even borrow code, because the code made in the league package is impressive, hopefully with @nyamsprod blessings.

From a "package principles" pov, the League is mostly a bunch of talented individuals that don't work much together, so this doesn't fit for Symfony needs in terms of maintainership. But I'd be super happy to accompany anyone get into the Symfony community!

nicolas-grekas avatar May 27 '24 18:05 nicolas-grekas

This new component has some pretty strong biases that run counter to what PHP currently offers (typically, support for β€œ_”, β€œ.” and β€œ+” in query strings). It seems complicated to me to bring these biases into thephpleague/uri? In any case, I could understand if the maintainer didn't want to make these changes, which could be some hard breaking changes.

The original idea behind this component is the fact that DSN parsing within Symfony is duplicated code in many places. The first motivation for this component is to be able to support characters encoded in the user and pass, something that is handled individually in the parsing of each place where it's needed in the code. The idea is to factorize all this. Doesn't thephpleague/uri seem to offer this feature at first glance? I could be wrong, I haven't looked at all the code in detail. πŸ˜„

alexandre-daubois avatar May 27 '24 18:05 alexandre-daubois

From a "package principles" pov, the League is mostly a bunch of talented individuals that don't work much together, so this doesn't fit for Symfony needs in terms of maintainership. But I'd be super happy to accompany anyone get into the Symfony community!

Yet, we already use the library as a core dependency of one of our components: https://github.com/symfony/symfony/blob/9ca558ec4fc8aaf854dede342695d85da4f2f169/src/Symfony/Component/HtmlSanitizer/composer.json#L21 And we have more core dependencies that are maintained by a single maintainer.

Of course, we need to discuss this, but some core team members already maintain packages in the PHP league. So we've even solved a part of "what if this project becomes abandoned", as core team members within PHP league can take it over more easily than if Symfony had to fork the project (of course, if this happens we need to discuss and ask for permissions for this then).

So I think it's unfair to reject the idea solely based on this argument. Let's investigate the technical side as well (and check with the @nyamsprod about his opinion/idea of the matter).

wouterj avatar May 27 '24 18:05 wouterj

to @alexandre-daubois, @wouterj and to everyone hi from Brussels! Since I have been mentioned I will quickly clarify some things.

I have not check the code but the premise for this component do align with those that guide me when developing league/uri. The league/uri ships with its own query parser and url parser for the same reason you are developing this package as far as I understood. This is illustrated in the first example on the documentation homepage for the package πŸ˜‰ . The goal of the library is to be as interop as possible and to follow as much as possible established RFC and standards. The packages, there are 3 of them, do a lot more but that's not the point of this response to elaborate on that.

In all fairness, I would have hope and prefer that the contribution could be made directly to The league/uri package but as @nicolas-grekas stated maybe the philosophy and the requirements between your organisation and my way of maintaining the package do not 100% aligned. That being said I believe they are parts of Symfony which do use my package already and I have already merged contributions from people from the Symfony in league/uri so to improve the interop between us. I even recall telling to @tgalopin about the incoming release of v7 so that he can prepare his component about the changes I was making. In other words, there are precedent for mutual participation/coordination in improving league/uri and Symfony.

Last but not least, I indeed met @nicolas-grekas in Brussels and he did ask me if I was OK if people were to use my code elsewhere. Back then I thought he was referring to another package of mine (https://github.com/bakame-php/http-structured-fields) as we discuss about it more in depth but to me it is all the same so I re-iterate the fact that I am ok with what suit you the best. I have no desire to impose anything on anybody, the choice is all yours.

I hope this lengthy response does answer or clarify any doubt from anyone. We are here for the better of PHP anyway so let's just do that one way or another!

peace out!

nyamsprod avatar May 27 '24 18:05 nyamsprod

On another note you wrote. I just re-read the bullet points:

parse_str() overwrites any duplicate field in the query parameter (e.g. ?foo=bar&foo=baz will return ['foo' => 'baz']). foo should be an array instead with the two values. More info and examples can be found at Stackoverflow.

I used this approach in an EOL version of league/uri and it is the wrong approach you should look into the Living Standard. There you will see that the correct way to represents query string parameters is a collection of KeyValue Pair as those will guarantee that building and parsing query strings will always return the same value in the same order (might be crucial for URL signing if I understood correctly what you want to achieve).

league/uri-components has a PHP implementation of the Living standard you may use as a template or reference.

nyamsprod avatar May 27 '24 21:05 nyamsprod

Refs #57190

OskarStark avatar May 28 '24 04:05 OskarStark

Thank you very much for your message Ignace and for these very complete explanations! I think this component makes sense within Symfony, the APIs between your packages and this one are quite different. The purpose of this component is to respond to problems we've encountered in the Symfony codebase (the decoding of user and password being the first ones we wanted to solve) while remaining fairly minimal. When we wrote the specs for the component, we thought about what we wanted in it and also what we didn't want, to keep the scope fairly restricted. This is just my opinion, let's see other people's opinions of course! πŸ˜ƒ

In any case, thank you very much for raising the query string point and avoiding making a mistake. I've updated to respect the URL Living Standard document. Also thanks for the league/uri-components link, which helped me understand what it's all about!

alexandre-daubois avatar May 28 '24 07:05 alexandre-daubois

what about templated URIs? was it considered?

ro0NL avatar May 28 '24 11:05 ro0NL

@ro0NL yes! We considered it while writing the specs. I did not implement it in this first iteration, but it definitely has its place here in my opinion

alexandre-daubois avatar May 28 '24 11:05 alexandre-daubois

In advance: I'm sorry to still be pushing back here. This is nothing against the work and thoughts that has been put in this PR.

I'm not yet convinced that we need a Uri component at all. Rather than building a slightly smaller component that borrows code and ideas from an existing library, I think we should always favor using the existing library if there are no major compelling reasons to not do so (and given it's already a dependency of Symfony, I guess we've already established in the past there aren't).

As an example, I've written a fully compatible version of the RedisTrait using the league libraries. As you can see, the difference in code with the patch in the PR description is almost negligible.

Redis cache adapter with league/uri library
diff --git a/src/Symfony/Component/Cache/Traits/RedisTrait.php b/src/Symfony/Component/Cache/Traits/RedisTrait.php
index 8183f92935..9e2163d73e 100644
--- a/src/Symfony/Component/Cache/Traits/RedisTrait.php
+++ b/src/Symfony/Component/Cache/Traits/RedisTrait.php
@@ -11,6 +11,9 @@
 
 namespace Symfony\Component\Cache\Traits;
 
+use League\Uri\Components\Query;
+use League\Uri\Components\UserInfo;
+use League\Uri\Uri;
 use Predis\Command\Redis\UNLINK;
 use Predis\Connection\Aggregate\ClusterInterface;
 use Predis\Connection\Aggregate\RedisCluster;
@@ -86,11 +89,8 @@ trait RedisTrait
      */
     public static function createConnection(#[\SensitiveParameter] string $dsn, array $options = []): \Redis|\RedisArray|\RedisCluster|\Predis\ClientInterface|Relay
     {
-        if (str_starts_with($dsn, 'redis:')) {
-            $scheme = 'redis';
-        } elseif (str_starts_with($dsn, 'rediss:')) {
-            $scheme = 'rediss';
-        } else {
+        $dsn = Uri::new($dsn);
+        if (!\in_array($dsn->getScheme(), ['redis', 'rediss'], true)) {
             throw new InvalidArgumentException('Invalid Redis DSN: it does not start with "redis[s]:".');
         }
 
@@ -98,68 +98,54 @@ trait RedisTrait
             throw new CacheException('Cannot find the "redis" extension nor the "predis/predis" package.');
         }
 
-        $params = preg_replace_callback('#^'.$scheme.':(//)?(?:(?:(?<user>[^:@]*+):)?(?<password>[^@]*+)@)?#', function ($m) use (&$auth) {
-            if (isset($m['password'])) {
-                if (\in_array($m['user'], ['', 'default'], true)) {
-                    $auth = rawurldecode($m['password']);
-                } else {
-                    $auth = [rawurldecode($m['user']), rawurldecode($m['password'])];
-                }
-
-                if ('' === $auth) {
-                    $auth = null;
-                }
-            }
-
-            return 'file:'.($m[1] ?? '');
-        }, $dsn);
-
-        if (false === $params = parse_url($params)) {
-            throw new InvalidArgumentException('Invalid Redis DSN.');
-        }
-
-        $query = $hosts = [];
-
-        $tls = 'rediss' === $scheme;
+        $hosts = [];
+        $query = Query::new($dsn->getQuery());
+        $tls = 'rediss' === $dsn->getScheme();
         $tcpScheme = $tls ? 'tls' : 'tcp';
+        $userInfo = UserInfo::new($dsn->getUserInfo());
+        $auth = null;
+        if ($userInfo->getPass()) {
+            $auth = [$userInfo->getPass()];
+        }
 
-        if (isset($params['query'])) {
-            parse_str($params['query'], $query);
+        if ($userInfo->getUser()) {
+            $auth ??= [];
+            array_unshift($auth, $userInfo->getUser());
+        }
 
-            if (isset($query['host'])) {
-                if (!\is_array($hosts = $query['host'])) {
-                    throw new InvalidArgumentException('Invalid Redis DSN: query parameter "host" must be an array.');
+        if ($query->hasParameter('host')) {
+            $hosts = $query->parameter('host');
+            foreach ($hosts as $host => $parameters) {
+                if (\is_string($parameters)) {
+                    $parameters = Query::new($parameters);
                 }
-                foreach ($hosts as $host => $parameters) {
-                    if (\is_string($parameters)) {
-                        parse_str($parameters, $parameters);
-                    }
-                    if (false === $i = strrpos($host, ':')) {
-                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => $host, 'port' => 6379] + $parameters;
-                    } elseif ($port = (int) substr($host, 1 + $i)) {
-                        $hosts[$host] = ['scheme' => $tcpScheme, 'host' => substr($host, 0, $i), 'port' => $port] + $parameters;
-                    } else {
-                        $hosts[$host] = ['scheme' => 'unix', 'path' => substr($host, 0, $i)] + $parameters;
-                    }
+                if (false === $i = strrpos($host, ':')) {
+                    $hosts[$host] = ['scheme' => $tcpScheme, 'host' => $host, 'port' => 6379] + $parameters->parameters();
+                } elseif ($port = (int) substr($host, 1 + $i)) {
+                    $hosts[$host] = ['scheme' => $tcpScheme, 'host' => substr($host, 0, $i), 'port' => $port] + $parameters->parameters();
+                } else {
+                    $hosts[$host] = ['scheme' => 'unix', 'path' => substr($host, 0, $i)] + $parameters->parameters();
                 }
-                $hosts = array_values($hosts);
             }
+            $hosts = array_values($hosts);
         }
 
-        if (isset($params['host']) || isset($params['path'])) {
-            if (!isset($params['dbindex']) && isset($params['path'])) {
-                if (preg_match('#/(\d+)?$#', $params['path'], $m)) {
-                    $params['dbindex'] = $m[1] ?? '0';
-                    $params['path'] = substr($params['path'], 0, -\strlen($m[0]));
-                } elseif (isset($params['host'])) {
+        $host = $dsn->getHost();
+        $path = $dsn->getPath();
+        if ($host || $path) {
+            if (!$query->hasParameter('dbindex') && $path) {
+                if (preg_match('#/(\d+)?$#', $path, $m)) {
+                    $query = $query->appendTo('dbindex', $m[1] ?? '0');
+                    $path = substr($path, 0, -\strlen($m[0]));
+                } elseif ($host) {
                     throw new InvalidArgumentException('Invalid Redis DSN: query parameter "dbindex" must be a number.');
                 }
             }
 
-            if (isset($params['host'])) {
-                array_unshift($hosts, ['scheme' => $tcpScheme, 'host' => $params['host'], 'port' => $params['port'] ?? 6379]);
+            if ($host) {
+                array_unshift($hosts, ['scheme' => $tcpScheme, 'host' => $host, 'port' => $dsn->getPort() ?? 6379]);
             } else {
-                array_unshift($hosts, ['scheme' => 'unix', 'path' => $params['path']]);
+                array_unshift($hosts, ['scheme' => 'unix', 'path' => $path]);
             }
         }
 
@@ -167,7 +153,7 @@ trait RedisTrait
             throw new InvalidArgumentException('Invalid Redis DSN: missing host.');
         }
 
-        $params += $query + $options + self::$defaultConnectionOptions;
+        $params = $query->parameters() + $options + self::$defaultConnectionOptions;
 
         if (isset($params['redis_sentinel']) && isset($params['sentinel_master'])) {
             throw new InvalidArgumentException('Cannot use both "redis_sentinel" and "sentinel_master" at the same time.');

Another good reason to use league/uri instead imho would be that we give the users a Uri class that implements RFC-3986 and the URL Living Standard, rather than a Uri class which is scoped specifically for internal usage of Symfony's DSN env vars. The latter can cause for issues when users decide to start using the Uri class for their own non-DSN use-cases. And having only 29 small to medium sized classes in league/uri and league/uri-components together doesn't sound "bloated" enough to warrant a Symfony component with a smaller feature-set imho.

(and thanks @nyamsprod for your lengthy response!)

wouterj avatar May 28 '24 22:05 wouterj

One thing that bothers me about the example you give for the Redis cache adapter is that it requires the addition of two external dependencies (uri and uri-components) instead of just one internal one. I think it's not ideal to have to add 2 external packages to achieve the same result.

Even if this dependency already exists within the framework, do we want to further increase the coupling to this package? Again, to achieve the same result, the two dependencies will be needed in the different components to properly parse DSNs. Indeed, uri is required to parse the URL, and uri-components is required to parse the user info and do the decoding stuff. Seems heavy to me.

The latter can cause for issues when users decide to start using the Uri class for their own non-DSN use-cases.

I guess it is the same problem with every class in the framework, we cannot limit their usage if a user tries to use it for something else. But I may be missing something, did you have a particular example in mind?

alexandre-daubois avatar May 29 '24 06:05 alexandre-daubois

Actually, in terms of deps-hell, there is a show stopper: this requires PSR-7. We went this way for HtmlSanitizer because it's a minor component, but that's nontheless still a problem to me. I certainly do not want to have to require PSR-7 and its myriad of other deps when using HttpClient, Cache, HttpFoundation (or even HtmlSanitizer).

nicolas-grekas avatar May 29 '24 06:05 nicolas-grekas

To build on what Nicolas is saying. This PR suggest to introduce 6 new classes. We cannot compare that to the 75 classes added if you do composer require league/uri-components

See all classes
vendor/league/uri-components/IPv4Normalizer.php
vendor/league/uri-components/UriModifier.php
vendor/league/uri-components/Components/Domain.php
vendor/league/uri-components/Components/Port.php
vendor/league/uri-components/Components/Host.php
vendor/league/uri-components/Components/Authority.php
vendor/league/uri-components/Components/HierarchicalPath.php
vendor/league/uri-components/Components/Path.php
vendor/league/uri-components/Components/UserInfo.php
vendor/league/uri-components/Components/DataPath.php
vendor/league/uri-components/Components/Fragment.php
vendor/league/uri-components/Components/Component.php
vendor/league/uri-components/Components/Query.php
vendor/league/uri-components/Components/URLSearchParams.php
vendor/league/uri-components/Components/Scheme.php
vendor/league/uri-components/Modifier.php
vendor/league/uri-interfaces/FeatureDetection.php
vendor/league/uri-interfaces/Encoder.php
vendor/league/uri-interfaces/Contracts/UriInterface.php
vendor/league/uri-interfaces/Contracts/UriComponentInterface.php
vendor/league/uri-interfaces/Contracts/FragmentInterface.php
vendor/league/uri-interfaces/Contracts/IpHostInterface.php
vendor/league/uri-interfaces/Contracts/UserInfoInterface.php
vendor/league/uri-interfaces/Contracts/AuthorityInterface.php
vendor/league/uri-interfaces/Contracts/HostInterface.php
vendor/league/uri-interfaces/Contracts/PathInterface.php
vendor/league/uri-interfaces/Contracts/DomainHostInterface.php
vendor/league/uri-interfaces/Contracts/PortInterface.php
vendor/league/uri-interfaces/Contracts/UriException.php
vendor/league/uri-interfaces/Contracts/DataPathInterface.php
vendor/league/uri-interfaces/Contracts/QueryInterface.php
vendor/league/uri-interfaces/Contracts/UriAccess.php
vendor/league/uri-interfaces/Contracts/SegmentedPathInterface.php
vendor/league/uri-interfaces/KeyValuePair/Converter.php
vendor/league/uri-interfaces/UriString.php
vendor/league/uri-interfaces/Idna/Option.php
vendor/league/uri-interfaces/Idna/Result.php
vendor/league/uri-interfaces/Idna/Error.php
vendor/league/uri-interfaces/Idna/Converter.php
vendor/league/uri-interfaces/Exceptions/OffsetOutOfBounds.php
vendor/league/uri-interfaces/Exceptions/MissingFeature.php
vendor/league/uri-interfaces/Exceptions/ConversionFailed.php
vendor/league/uri-interfaces/Exceptions/SyntaxError.php
vendor/league/uri-interfaces/IPv4/GMPCalculator.php
vendor/league/uri-interfaces/IPv4/Calculator.php
vendor/league/uri-interfaces/IPv4/Converter.php
vendor/league/uri-interfaces/IPv4/BCMathCalculator.php
vendor/league/uri-interfaces/IPv4/NativeCalculator.php
vendor/league/uri-interfaces/QueryString.php
vendor/league/uri/UriTemplate.php
vendor/league/uri/Http.php
vendor/league/uri/HttpFactory.php
vendor/league/uri/Uri.php
vendor/league/uri/UriInfo.php
vendor/league/uri/BaseUri.php
vendor/league/uri/UriResolver.php
vendor/league/uri/UriTemplate/Expression.php
vendor/league/uri/UriTemplate/TemplateCanNotBeExpanded.php
vendor/league/uri/UriTemplate/Operator.php
vendor/league/uri/UriTemplate/VarSpecifier.php
vendor/league/uri/UriTemplate/Template.php
vendor/league/uri/UriTemplate/VariableBag.php
vendor/psr/http-message/src/ServerRequestInterface.php
vendor/psr/http-message/src/UriInterface.php
vendor/psr/http-message/src/StreamInterface.php
vendor/psr/http-message/src/UploadedFileInterface.php
vendor/psr/http-message/src/RequestInterface.php
vendor/psr/http-message/src/ResponseInterface.php
vendor/psr/http-message/src/MessageInterface.php
vendor/psr/http-factory/src/ResponseFactoryInterface.php
vendor/psr/http-factory/src/StreamFactoryInterface.php
vendor/psr/http-factory/src/UploadedFileFactoryInterface.php
vendor/psr/http-factory/src/UriFactoryInterface.php
vendor/psr/http-factory/src/ServerRequestFactoryInterface.php
vendor/psr/http-factory/src/RequestFactoryInterface.php

Nyholm avatar May 29 '24 07:05 Nyholm

@nicolas-grekas I understand your hesitation about PSR-7 for the record I never used the PSR-7 aspect of the library. And the library predate PSR-7 UriInterface and UriFactoryInterface (which are in fact the only 2 interfaces used, since they are related to URI).

I only added PSR-7 so that the library could provide better interop with the PHP community (which is an underlying goal of the library). So if that was the one show stopper I believe then there's a case of moving that require dependency to an suggestion. But then again this would require a discussion before acting on that. As I stated

In other words, there are precedent for mutual participation/coordination in improving league/uri and Symfony.

So even opening a ticket to discuss this matter on the package would have IMHO clarify things or given a better context I supposed in your own decision making process.

But then again I want to completely stress out that I totally inderstand and respect your position/decision but I wanted to clarify this point.

nyamsprod avatar May 29 '24 07:05 nyamsprod

we already use the library as a core dependency

noted, i tend to agree we should try to avoid using 2 URI libraries in core

perhaps we should invert the discussion a bit;

will symfony/uri be useful to league/uri? respectively being low-level and high-level URI libraries.

either way, i also tend to believe once symfony/uri is a thing, PSR-7 support will be requested sooner or later. which sounds fair.

ro0NL avatar May 29 '24 07:05 ro0NL

@alexandre-daubois @nicolas-grekas @Nyholm while reading your answers I believe again that there is a misundestanding of what the league URI packages are and can do (I really need to update the documentation website because I thought it was obvious). So let me again clarify the package intent and I will leave it up to you to decide.

First, thanks to @wouterj who brought me into the conversation and who did demonstrate how the packages could resolve your issue but in the same time I understand other opinions and this got me thinking but wait what you really need is the league/uri-interfaces package and nothing else. That package contains:

  • a userland URI parser/builder
  • a userland Query parser/builder
  • and other tools to help managing URI like an IDN and IPv4 host converters (which depends on the symfony intl polyfill when ext-intl is missing)

The package was designed for providing low level utility tools you can built upon more advance feature like the uri and uri-component packages. It lacks opinionated structures around uri and some validation rules that are not suited for parsers. Granted, the package currently requires the PSR-7 interfaces but like I said the requirement is there for practical reason to allow interop but no PSR-7 centric feature is used in the package. So it is clear to me that it can be move to composer suggest section without breaking anything in the code.

That way you get the benefit of battle tested solutions while still exposing your own public API for your new component without the hurdle of PSR-7 or 60+ classes. You would for instance still have to validate your URI string against your scheme validation rules but that's already expected when using parse_url.

nyamsprod avatar May 29 '24 09:05 nyamsprod

I looked at the uri-interfaces package, and unfortunately after playing around a bit with it, I have the impression that it does not meet the primary need behind this component: the decoding of the username and password of a URI. Neither UriString::parse() and UriString::parseAuthority() seem to do it.

Indeed, it is this problem which triggered the discussions around this component (among other things). The fact that the username and password of the authority are not decoded automatically (which is understandable) forces us to do it case by case in the code, which brings a lot of redundancy (and is a source of bugs in the codebase as we have had in the past through the different DSN parsings). The league/uri-components package does this auto-decoding when creating a UserInfo, but that means we would need again two external packages everywhere we want to parse a DSN which seems heavy.

I think we can see this proposal as the ramsey/uuid and symfony/uid packages. They serve the same purpose, but the approaches are different. One is more β€œcomplete” and offers a lot of customization on many points, while the other responds to simpler cases and offers a simpler contract for that.

alexandre-daubois avatar May 29 '24 11:05 alexandre-daubois

here's a reimplementation of the UserInfo class using only league-uriInterface


use League\Uri;

final readonly class UserInfo implements Stringable
{
    private function __construct(public ?string $username = null, public ?string $password = null)
    {
        if (null === $this->username && null !== $this->password) {
            throw new \ValueError('It is not possible to associated a password to an undefined user.');
        }
    }

    public static function fromString(string|Stringable|null $userInfo): self
    {
        if ($userInfo === null) {
            return new self();
        }

        [$user, $password] = explode(':', $userInfo, 2) + [1 => null];

        return new self(
            Uri\Encoder::decodeAll($user),
            Uri\Encoder::decodeAll($password),
        );
    }

    public static function fromUri(string|Stringable $uri): self
    {
        $components = Uri\UriString::parse($uri);
        $userInfo = null;
        if ($components['user'] !== null) {
            $userInfo = $components['user'];
        }

        if ($components['pass'] !== null) {
            $userInfo .= ':'. $components['pass'];
        }

        return self::fromString($userInfo);
    }

    public function __toString(): string
    {
        return match (true) {
            null === $this->username => '',
            null === $this->password => Uri\Encoder::encodeUser($this->username),
            default => Uri\Encoder::encodeUser($this->username).':'.Uri\Encoder::encodePassword($this->password),
        };
    }
}

$userInfo = UserInfo::fromUri('redis://abc:b%C3%A9b%C3%A9@localhost?timeout=5');
dump($userInfo->username); // abc
dump($userInfo->password); // bΓ©bΓ©

As I said all the tools are present in the package .... πŸ˜‰

nyamsprod avatar May 29 '24 11:05 nyamsprod

I understand that powerful utils are present in league/uri-interfaces, but if we have to reimplement a class in this way, I'm not sure why we would take an existing package? Indeed and from what I understand from this discussion, the goal is to decide whether we implement a new solution (with 0 dependency if possible), or to use an existing one that suits us as-is and fits the need out of the box.

Also with this solution, we then would have 2 parsing happening every time: one for the user/pass, one for the rest of DSN. This doesn't seem optimal πŸ˜•

alexandre-daubois avatar May 29 '24 11:05 alexandre-daubois

@alexandre-daubois as I always say the choice and the final call are yours not mine I was a bit surprise by your last comment saying it was not possible to decode components with the package hence the concrete example.

Having said that, here's a simple example of what I mean. league/(uri and uri-component) do not use mutable classes except for UrlSearchParams for obvious reasons. The Symfony package may/could very well use mutable classes, that's an architectural change that is valid that could indeed rules out the use of both packages. Well from my POV, that does not disqualify uri-interfaces since the package does not care about that it is a very low level package (And that's why I can confidently say it can live without PSR-7).

Again all I am saying is that the building blocks are there to be used. You can choose to use them or not that's fine as long as you know the full context of your decision.

nyamsprod avatar May 29 '24 12:05 nyamsprod

@nyamsprod do I understand correctly that you suggest maybe using league/uri-interfaces and build our own high-level Uri interface, instead of using the league/uri interface?

That sounds like an interesting idea to me! That way, we centralize the low level logic in one place (the league package), but Symfony is still able to build the interface in whatever way they want.

wouterj avatar May 30 '24 10:05 wouterj

@wouterj yes that was my suggestion indeed. I have always seen/use uri-interfaces a bit like ReactPHP and AmPHP see/use Revolt. it's a tooling kit that can be use by anyone to simplify dealing with URI without getting in the way of their different implementations.

nyamsprod avatar May 30 '24 11:05 nyamsprod

I'm sorry, I have a little trouble seeing the point of proposing part of an implementation that requires transitive dependencies where a relatively concise solution offers the same result with 0 dependency?

The classes are not that large in this PR. I admit that I have difficulty being convinced it worth it the addition of an external dependency - even greatly maintained like league's packages - for not that much logic to share, namely some string parsing. Could I be missing something though? Is there anything else that will make things a lot simpler that I don't see?

alexandre-daubois avatar May 30 '24 11:05 alexandre-daubois

As a League and FIG contributor, my suggestion can be found here. This is purely my personal opinion on the matter, as a daily PHP user who happens to have been involved with some of the projects being discussed.

shadowhand avatar Jun 03 '24 14:06 shadowhand

Thanks for adding your point of view to this discussion, Woody! This is an interesting point of view for sure, I especially agree with you on "the current repository split is confusing"

alexandre-daubois avatar Jun 03 '24 17:06 alexandre-daubois

I certainly do not want to have to require PSR-7 and its myriad of other deps

PSR-7 is a very simple package without dependencies. What am I missing here?

This PR suggest to introduce 6 new classes. We cannot compare that to the 75 classes added if you do composer require league/uri-components

Are we really counting classes as arguments here? What's the major gain if we have 69 classes less? Isn't maintenance of the classes much more important than a few files and bytes?

Another good reason to use league/uri instead imho would be that we give the users a Uri class that implements RFC-3986 and the URL Living Standard, rather than a Uri class which is scoped specifically for internal usage of Symfony's DSN env vars. The latter can cause for issues when users decide to start using the Uri class for their own non-DSN use-cases.

This is exactly why we (amphp) stopped maintaining our own package (https://github.com/amphp/uri), which I still need to set to deprecated status (oops). It started to parse connect URIs like tcp://localhost:123, but it grew and we didn't want to maintain living spec compliance for it, so we switched to the league packages long ago. I don't think shipping a limited implementation just for DSN parsing makes sense, at least not with that generic name. People will expect it to be spec compliant and it'll either eventually grow or people will continue to run into the same problems.

I don't want to get too involved in this discussion because I don't have a stake in it, but I've been mentioned in https://github.com/thephpleague/uri-src/issues/137, so I wanted to provide my current thoughts.

kelunik avatar Jun 03 '24 19:06 kelunik