url
Swiss Army knife for URLs
This package is for you when PHP's parse_url() is not enough.
Key Features:
- Parse a URL and access or modify all its components separately.
- Resolve any relative reference you may find in an HTML document to an absolute URL, based on the document's URL.
- Get not only the full host of a URL, but also the registrable domain, the domain suffix and the subdomain parts of the host separately (Thanks to the Mozilla Public Suffix List).
- Compare URLs or components of URLs (e.g. checking if different URLs point to the same host or domain)
- Thanks to symfony/polyfill-intl-idn it's also no problem to parse internationalized domain names (IDN).
- Includes an adapter class which implements the PSR-7 UriInterface.
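To illustrate where parse_url() stops, here is a minimal sketch that uses only PHP's built-in function (the example URL is made up for illustration). It splits a URL into its basic components, but the host comes back as a single string, with no notion of registrable domain, domain suffix or subdomain:

```php
<?php
// Built-in parsing only; no package involved.
$parts = parse_url('https://john:123@www.example.co.uk:8080/foo?bar=baz#anchor');

echo $parts['scheme'], PHP_EOL; // https
echo $parts['host'], PHP_EOL;   // www.example.co.uk
echo $parts['port'], PHP_EOL;   // 8080

// parse_url() knows nothing about public suffixes such as "co.uk", so
// there is no key for the registrable domain, the suffix or the subdomain.
var_dump(isset($parts['domain'])); // bool(false)
```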
Requirements
Requires PHP version 7.2 or above.
Installation
Install the latest version with:
composer require crwlr/url
Usage
Including the package
<?php
include('vendor/autoload.php');
use Crwlr\Url\Url;
To start using the library include composer's autoload file and import the Url class so you don't have to write the full namespace path again and again.
Further code examples skip the above.
Parsing URLs
Parsing a URL is easy as pie:
$url = Url::parse('https://john:123@www.example.com:8080/foo?bar=baz');
The static parse method of the Url class provides a convenient way to create a new instance and then access all of its components separately.
// Accessing URL components via method calls
$port = $url->port(); // => 8080
$domainSuffix = $url->domainSuffix(); // => "com"
$path = $url->path(); // => "/foo"
$fragment = $url->fragment(); // => NULL
// Or as properties
$scheme = $url->scheme; // => "https"
$user = $url->user; // => "john"
$host = $url->host; // => "www.example.com"
$domain = $url->domain; // => "example.com"
Of course you can also get a new instance using the new keyword.
$url = new Url('https://www.steve.jobs/');
Relative URLs
Since v1.0 of this package you can also obtain an instance of Url from a relative URL. Previous versions throw an InvalidUrlException when the URL string doesn't contain a valid scheme component.
$url = Url::parse('/some/path?query=string');
var_dump($url->__toString()); // => '/some/path?query=string'
var_dump($url->scheme()); // => null
var_dump($url->path()); // => '/some/path'
Available URL components
Below, you can see a visualization of all the components that are available to you via a Url object.
https://john:123@subdomain.example.com:8080/foo?bar=baz#anchor
                     domainLabel  domainSuffix
                               ↓     ↓
 _____   ____ ___ _____________________ ____ ____ _______ ______
|https| |john|123|subdomain.example.com|8080|/foo|bar=baz|anchor|
 ‾‾‾‾‾   ‾‾‾‾ ‾‾‾ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ‾‾‾‾ ‾‾‾‾ ‾‾‾‾‾‾‾ ‾‾‾‾‾‾
   ↑      ↑    ↑      ↑         ↑        ↑    ↑      ↑      ↑
 scheme  user  ↑  subdomain   domain    port path  query fragment
               ↑      ⤷   host  ⤶
         | pass(word)                        |
         |___________________________________|
         |john:123@subdomain.example.com:8080|
         |‾‾‾‾‾‾‾‾|‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
         |________|        ↑
         |john:123|    authority
          ‾‾‾‾‾‾‾‾
             ↑
          userInfo
When a component is not present in a URL (e.g. it doesn't contain user and password) the corresponding properties will return NULL.
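A minimal sketch of that behaviour, assuming the package is installed via Composer and the autoloader sits in vendor/ next to the script. The example URL is made up and deliberately has no user info and no fragment:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Url\Url;

// No user, password, or fragment in this URL.
$url = Url::parse('https://www.example.com/foo');

var_dump($url->user());     // NULL, as no user is present
var_dump($url->fragment()); // NULL, as no fragment is present
var_dump($url->host());     // the host is present: "www.example.com"
```

Checking a component against null before using it is therefore a simple way to find out whether it was part of the original URL.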
Further available component combinations
The following combinations of components aren't very common, but they can be useful at times.
 _______________________   ___________________
|https://www.example.com| |/foo?bar=baz#anchor|
 ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾   ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
            ↑                       ↑
          root                  relative
root
The root, as it's called here, consists of the scheme and the authority components.
$url = Url::parse('https://www.example.com:8080/foo?bar=baz');
$root = $url->root(); // => "https://www.example.com:8080"
relative
Complementary to root you can retrieve path, query and fragment via the relative method.
$url = Url::parse('https://www.example.com/foo?bar=baz#anchor');
$relative = $url->relative(); // => "/foo?bar=baz#anchor"
Parsing a query string
If you're after the query of a URL you may want to get it as an array. Don't worry, nothing easier than that:
$url = Url::parse('https://www.example.com/foo?bar=baz&key=value');
var_dump($url->queryArray());
Output
array(2) {
["bar"]=>
string(3) "baz"
["key"]=>
string(5) "value"
}
If you're already on PHP 8 you can additionally install the crwlr/query-string package for a more advanced API to access and manipulate the query string.
composer require crwlr/query-string
After the package is installed the queryString() method returns an instance of the Query class from that package. You can find the docs for this class in the documentation of the crwlr/query-string package.
A quick example:
$url = Url::parse('https://www.example.com/listing?page[number]=3&page[size]=25');
$url->queryString()
->get('page')
->set('number', '4');
var_dump($url->__toString());
// string(68) "https://www.example.com/listing?page%5Bnumber%5D=4&page%5Bsize%5D=25"
Modifying URLs
All methods that are used to get a component's value can also be used to replace or set its value. So for example if you have an array of URLs and you want to be sure that they are all on https, you can achieve this simply by setting the scheme to https for all of them in a loop.
$urls = [
    'https://www.example.com',
    'http://notsecure.example.org/foo',
    'https://secure.example.org/bar',
    'http://www.example.com/baz'
];
foreach ($urls as $key => $url) {
    $urls[$key] = Url::parse($url)->scheme('https')->toString();
}
var_dump($urls);
Output
array(4) {
[0]=>
string(24) "https://www.example.com/"
[1]=>
string(33) "https://notsecure.example.org/foo"
[2]=>
string(30) "https://secure.example.org/bar"
[3]=>
string(27) "https://www.example.com/baz"
}
Another example: your website can be reached with or without the www subdomain. Sloppy input data can easily be fixed by just assigning the same host to all of them.
$urls = [
    'https://www.example.com/stuff',
    'https://example.com/yolo',
    'https://example.com/products',
    'https://www.example.com/contact',
];
$urls = array_map(function($url) {
    return Url::parse($url)->host('www.example.com')->toString();
}, $urls);
var_dump($urls);
Output
array(4) {
[0]=>
string(29) "https://www.example.com/stuff"
[1]=>
string(28) "https://www.example.com/yolo"
[2]=>
string(32) "https://www.example.com/products"
[3]=>
string(31) "https://www.example.com/contact"
}
The same works for all components listed under "Available URL components".
And the query can even be set as an array:
$url = Url::parse('https://www.example.com/foo');
$url->queryArray(['param' => 'value', 'marco' => 'polo']);
echo $url;
Output
https://www.example.com/foo?param=value&marco=polo
Btw.: As you can see in the example above, you can use a Url object like a string because of its __toString() method.
Resolving relative URLs
When you scrape URLs from a website you will come across relative URLs like /path/to/page, ../path/to/page, ?param=value, #anchor and the like. This package makes it a breeze to resolve these URLs to absolute ones, based on the URL of the page where they were found.
$documentUrl = Url::parse('https://www.example.com/foo/bar/baz');
$relativeLinks = [
    '/path/to/page',
    '../path/to/page',
    '?param=value',
    '#anchor'
];
$absoluteLinks = array_map(function($relativeLink) use ($documentUrl) {
    return $documentUrl->resolve($relativeLink)->toString();
}, $relativeLinks);
var_dump($absoluteLinks);
Output
array(4) {
[0]=>
string(36) "https://www.example.com/path/to/page"
[1]=>
string(40) "https://www.example.com/foo/path/to/page"
[2]=>
string(47) "https://www.example.com/foo/bar/baz?param=value"
[3]=>
string(42) "https://www.example.com/foo/bar/baz#anchor"
}
If you pass an absolute URL to resolve() it will just return that absolute URL.
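A short sketch of that case, assuming the package is installed via Composer and the autoloader sits in vendor/ next to the script. Both URLs are made up for illustration:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Url\Url;

$documentUrl = Url::parse('https://www.example.com/foo/bar/baz');

// The link is already absolute, so the document URL has no influence on it.
$resolved = $documentUrl->resolve('https://www.example.org/other/page');

echo $resolved->toString(), PHP_EOL; // https://www.example.org/other/page
```

This means you can run every link found on a page through resolve() without first checking whether it is relative or absolute.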
Comparing URLs or URL components
You may think you don't need this library to compare two URLs, but whenever you have URLs from an unpredictable input source, I'd recommend using it. Imagine a case like this:
$url1 = 'https://www.example.com/foo/bár/báz';
$url2 = 'https://www.example.com/foo/b%C3%A1r/b%C3%A1z';
var_dump($url1 === $url2);
// Returns false. Of course the two strings aren't equal.
var_dump(Url::parse($url1)->isEqualTo($url2));
// Returns true, because the path /foo/bár/báz is percent-encoded
// in the `Url` class to /foo/b%C3%A1r/b%C3%A1z
It's also really easy to compare the same component of two different URLs.
$url1 = Url::parse('https://u:[email protected]/foo?q=s#frag');
$url2 = Url::parse('http://s:[email protected]/bar?u=t#ment');
$url3 = clone $url1;
$url1->isSchemeEqualIn($url2); // false
$url1->isSchemeEqualIn($url3); // true
$url1->isHostEqualIn($url2); // false
$url1->isHostEqualIn($url3); // true
$url1->isPasswordEqualIn($url2); // false
$url1->isPasswordEqualIn($url3); // true
These are all available comparison methods:
- isEqualTo($url)
- isComponentEqualIn($url, $componentName)
- isSchemeEqualIn($url)
- isAuthorityEqualIn($url)
- isUserEqualIn($url)
- isPasswordEqualIn($url)
- isUserInfoEqualIn($url)
- isHostEqualIn($url)
- isDomainEqualIn($url)
- isDomainLabelEqualIn($url)
- isDomainSuffixEqualIn($url)
- isSubdomainEqualIn($url)
- isPortEqualIn($url)
- isPathEqualIn($url)
- isQueryEqualIn($url)
- isFragmentEqualIn($url)
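As one possible use of these methods, the sketch below keeps only the links that share the registrable domain of the page they were found on. It assumes the package is installed via Composer with the autoloader in vendor/ next to the script; the URLs and variable names are made up for illustration:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Url\Url;

$pageUrl = Url::parse('https://www.example.com/blog');

$links = [
    'https://shop.example.com/cart',
    'https://www.example.org/about',
    'https://example.com/contact',
];

// Keep a link only if its registrable domain equals the page's domain.
$internalLinks = array_values(array_filter($links, function (string $link) use ($pageUrl) {
    return $pageUrl->isDomainEqualIn(Url::parse($link));
}));

var_dump($internalLinks);
// Expected to contain the shop.example.com and example.com links only,
// since both have the registrable domain "example.com".
```

Using isHostEqualIn() instead would be stricter and drop links to other subdomains as well.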
Internationalized domain names (IDN)
echo Url::parse('https://www.пример.онлайн/hello/world')->toString();
Output
https://www.xn--e1afmkfd.xn--80asehdb/hello/world
Behind the scenes, symfony/polyfill-intl-idn is used, so you don't need to have the internationalization PHP extension installed to parse internationalized domain names.
To check if a URL contains an internationalized domain name you can use the hasIdn method:
Url::parse('https://www.example.com')->hasIdn(); // => false
Url::parse('https://www.müller.de')->hasIdn(); // => true
Url::parse('https://www.xn--m1adged4c3a.com')->hasIdn(); // => true
PSR-7 UriInterface adapter class
The Url class is not immutable, which the PSR-7 UriInterface requires. But the package provides an adapter class Crwlr\Url\Psr\Uri which has an instance of the Url class in a private property and thus assures immutability.
Usage Example
$url = 'https://user:password@www.example.com:1234/foo/bar?some=query#fragment';
$uri = Url::parsePsr7($url); // Or instead: new Crwlr\Url\Psr\Uri($url);
var_dump($uri->getScheme()); // => 'https'
var_dump($uri->getAuthority()); // => 'user:password@www.example.com:1234'
var_dump($uri->getUserInfo()); // => 'user:password'
var_dump($uri->getHost()); // => 'www.example.com'
var_dump($uri->getPort()); // => 1234
var_dump($uri->getPath()); // => '/foo/bar'
var_dump($uri->getQuery()); // => 'some=query'
var_dump($uri->getFragment()); // => 'fragment'
// Keep in mind an instance of `Uri` is immutable and all the methods that change
// state (method names starting with "with") return a new instance:
$newUri = $uri->withScheme('http');
var_dump($uri->getScheme()); // => 'https'
var_dump($newUri->getScheme()); // => 'http'
$uri = $newUri->withUserInfo('u', 'p');
var_dump($uri->getUserInfo()); // => 'u:p'
$uri = $uri->withHost('foo.bar.com');
var_dump($uri->getHost()); // => 'foo.bar.com'
$uri = $uri->withPort(666);
var_dump($uri->getPort()); // => 666
$uri = $uri->withPath('/path');
var_dump($uri->getPath()); // => '/path'
$uri = $uri->withQuery('foo=bar');
var_dump($uri->getQuery()); // => 'foo=bar'
$uri = $uri->withFragment('baz');
var_dump($uri->getFragment()); // => 'baz'
var_dump($uri->__toString());
// => 'http://u:p@foo.bar.com:666/path?foo=bar#baz'
Exceptions
There are two Exceptions that can be thrown by the Url class:
- InvalidUrlException when you try to create an instance from a string that isn't a valid URI.
- InvalidUrlComponentException when you try to set an invalid new value for a component (scheme, host,...).
When you're dealing with unpredictable input sources, you should catch and handle them.
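One way to handle them is a small wrapper that turns a failed parse into null. This is only a sketch: parseOrNull is a hypothetical helper name, and the exception namespace Crwlr\Url\Exceptions is an assumption that should be checked against the installed package version. It also assumes the autoloader sits in vendor/ next to the script.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Crwlr\Url\Url;
// Assumed namespace of the exception class; verify it in your installed version.
use Crwlr\Url\Exceptions\InvalidUrlException;

// Hypothetical helper: returns a Url instance, or null if parsing fails.
function parseOrNull(string $input): ?Url
{
    try {
        return Url::parse($input);
    } catch (InvalidUrlException $exception) {
        return null;
    }
}

$inputs = ['https://www.example.com/foo', '/some/path?query=string'];

foreach ($inputs as $input) {
    $url = parseOrNull($input);
    echo $input, ' => ', $url === null ? 'invalid' : 'parsed', PHP_EOL;
}
```

The same pattern applies to InvalidUrlComponentException when setting component values that come from user input.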
Updating Mozilla's Public Suffix List
Mozilla's Public Suffix List is parsed and stored in a file in this package to be able to extract the domain suffix from a URL's host component. It should be updated with every new release of this package. If you need to get the latest version of the list immediately, because a particular new suffix isn't included in the list in this repository, you can update it using the following composer command:
composer update-suffixes
Note: Please don't overuse this, as Mozilla states on their page:
If you wish to make your app download an updated list periodically, please use this URL and have your app download the list no more than once per day. (The list usually changes a few times per week; more frequent downloading is pointless and hammers our servers.)