
is_iri should allow UTF-8 characters

Open gu-stav opened this issue 5 years ago • 1 comments

Currently is_iri rejects any input containing a character that doesn't match this regex:

https://github.com/ogt/valid-url/blob/8d1fc52b21ceab99b68f415838035859b7237949/index.js#L28

Example:

t.ok(is_uri('http://localhost/ä'), 'http://localhost/ä');

As far as I understand the main difference between URI and IRI is:

IRIs extend URIs by using the Universal Character Set, where URIs were limited to ASCII, with far fewer characters.

https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier
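
To make the distinction concrete: an IRI can generally be mapped to a URI by percent-encoding its non-ASCII characters as UTF-8. A small sketch using JavaScript's built-in encodeURI (the variable names are only for illustration):

```javascript
// The IRI from the example above contains a non-ASCII character.
const exampleIri = 'http://localhost/ä';

// encodeURI percent-encodes the UTF-8 bytes of "ä" and leaves the
// reserved URI characters (":", "/", ...) untouched.
const exampleUri = encodeURI(exampleIri);

console.log(exampleUri); // http://localhost/%C3%A4
```

The encoded form only contains ASCII characters, so it is a URI, while the original string is only valid as an IRI.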

Therefore is_iri should allow all UTF-8 characters and these exports aren't correct:

module.exports.is_uri = is_iri;
module.exports.isUri = is_iri;

Instead there should be two exports: is_iri and is_uri.
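
A rough sketch of what I mean, not the actual valid-url code. The function names, the simplified scheme check, and the `\u00a0-\uffff` range are my own assumptions (RFC 3987 defines narrower character ranges), and I'm assuming the existing behaviour of returning the value when valid and undefined otherwise:

```javascript
// Hypothetical sketch: these names and patterns are illustrative only.

// Anything outside the ASCII characters that RFC 3986 allows in a URI.
const URI_ILLEGAL = /[^a-z0-9:\/?#\[\]@!$&'()*+,;=.\-_~%]/i;

// Same set, but additionally allowing non-ASCII characters.
// Simplification: RFC 3987 permits specific ranges, not all of U+00A0..U+FFFF.
const IRI_ILLEGAL = /[^a-z0-9:\/?#\[\]@!$&'()*+,;=.\-_~%\u00a0-\uffff]/i;

// Simplified check that the value starts with a scheme such as "http:".
function hasScheme(value) {
  return /^[a-z][a-z0-9+.\-]*:/i.test(value);
}

// Returns the value if no illegal character is found, otherwise undefined.
function check(value, illegal) {
  if (typeof value !== 'string' || value === '') return undefined;
  if (illegal.test(value)) return undefined;
  return hasScheme(value) ? value : undefined;
}

function isUriSketch(value) {
  return check(value, URI_ILLEGAL);
}

function isIriSketch(value) {
  return check(value, IRI_ILLEGAL);
}

console.log(isUriSketch('http://localhost/ä')); // undefined
console.log(isIriSketch('http://localhost/ä')); // http://localhost/ä
```

With something like this, is_uri and isUri would point at the strict check and is_iri at the relaxed one, instead of all three sharing one function.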

I saw there hasn't been a commit to this repo since 2015, but since this affects a Gatsby issue, I was wondering what your plans for the module are. Would you consider handing it over, @ogt?

The module is heavily used according to npm's download numbers, so it might be in the interest of the community to give it some 💌 .

gu-stav, Mar 21 '19 10:03

Hi @gustavpursche,

I ran into the same issue with valid-url, and also with validator, so I decided to build a module that is as reliable as possible and strictly based on RFC 3986: https://github.com/adrienv1520/node-uri

The main features of this project are:

  • parse any URI (URNs, URLs, URIs with IDNs support, etc.);
  • get the safe Punycode ASCII or Unicode serialization of a domain;
  • check that a URI, HTTP/HTTPS/Sitemap URL, IP, or domain is valid, with clear error messages;
  • encode/decode a URI or HTTP/HTTPS/Sitemap URL.

I hope it helps.

adrienv1520, Oct 17 '20 16:10