Feat: add domain() validation action (ASCII domains)

Open yslpn opened this issue 3 months ago • 6 comments

Add domain() validation action (ASCII domains) + docs

Closes https://github.com/fabian-hiller/valibot/issues/1277

Summary

  • Introduces domain() validation action for ASCII domain names
  • Adds API docs (action + types)

Rationale

  • Uses the battle-tested regex from Zod (Docs / GitHub) instead of a custom one:
/^([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/;
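To make the behavior of the regex above concrete, here is a minimal sketch that runs it against a few inputs. The helper name matchesZodDomain and the sample inputs are assumptions for illustration only; this is not the implementation from the PR.

```typescript
// Regex quoted in the PR description (originally from Zod).
const zodDomainRegex =
  /^([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/;

// Hypothetical helper for illustration; not part of valibot.
function matchesZodDomain(input: string): boolean {
  return zodDomainRegex.test(input);
}

// Inputs the pattern accepts: dot-separated labels plus an alphabetic TLD.
const accepted: boolean[] = [
  'example.com',
  'sub.example.co.uk',
  'my--site.com', // a double hyphen inside a label is allowed
].map(matchesZodDomain);

// Inputs the pattern rejects.
const rejected: boolean[] = [
  'localhost', // no dot, so no TLD part
  '-bad.com', // label starts with a hyphen
  'bad-.com', // label ends with a hyphen
  'example.c', // TLD shorter than two letters
  'example.123', // TLD must be alphabetic
  `${'a'.repeat(64)}.com`, // label longer than 63 characters
].map(matchesZodDomain);
```

Every entry in accepted is true and every entry in rejected is false.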

UPD: The PR now uses the Zod regex with small updates; see https://github.com/fabian-hiller/valibot/pull/1284#issuecomment-3218342138

Decisions

  • localhost and bare intranet-like hostnames are considered invalid for domain() (by design)
  • Regex focuses on label rules and TLD; we do not enforce total domain length (max 253 chars)

Limitations

  • Total domain length is not checked
  • Unicode or IDN/Punycode is not supported (ASCII TLD required)
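The two limitations can be observed directly with the Zod regex from the Rationale section. The following is a sketch under that assumption; the variable names are made up for illustration.

```typescript
// Zod-derived regex from the Rationale section of this PR.
const originalRegex =
  /^([a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/;

// Five maximal 63-character labels plus ".com": 323 characters in total,
// well above the 253-character limit for a full domain name.
const label63 = 'a'.repeat(63);
const overlongDomain = `${[label63, label63, label63, label63, label63].join('.')}.com`;

// Limitation 1: total length is not checked, so this still matches.
const overlongAccepted = originalRegex.test(overlongDomain);

// Limitation 2: Unicode input and Punycode TLDs are rejected.
const unicodeAccepted = originalRegex.test('пример.рф');
const punycodeTldAccepted = originalRegex.test('example.xn--p1ai');

// A Punycode label that is not the TLD consists only of ASCII letters,
// digits and hyphens, so it matches like any other label.
const punycodeLabelAccepted = originalRegex.test('xn--e1afmkfd.com');
```

Here overlongAccepted and punycodeLabelAccepted are true, while unicodeAccepted and punycodeTldAccepted are false.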

Discuss

We can switch to a regex that supports Punycode TLDs (and checks total length) if desired:

/^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+(?:[a-z]{2,}|xn--[a-z0-9-]{2,})$/i

The advantage of this solution is that we can add toPunycode() and use it before domain() for Unicode domains.
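A minimal sketch of how the Punycode-aware regex above behaves. The toPunycode() action is only a proposal in this thread and is not implemented here, so the Punycode strings below are hard-coded as if that conversion had already happened.

```typescript
// Alternative regex proposed in the Discuss section of this PR.
const punycodeAwareRegex =
  /^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+(?:[a-z]{2,}|xn--[a-z0-9-]{2,})$/i;

// Plain ASCII domains still match; the i flag makes the match case-insensitive.
const asciiOk = punycodeAwareRegex.test('EXAMPLE.COM');

// Punycode TLDs (prefix "xn--") are now accepted as well.
const punycodeTldOk = punycodeAwareRegex.test('example.xn--p1ai');
const fullPunycodeOk = punycodeAwareRegex.test('xn--e1afmkfd.xn--p1ai');

// Raw Unicode input is still rejected; it would have to be converted first.
const rawUnicodeOk = punycodeAwareRegex.test('пример.рф');

// The lookahead (?=.{1,253}$) rejects names longer than 253 characters.
const label = 'a'.repeat(63);
const overlongOk = punycodeAwareRegex.test(
  `${[label, label, label, label, label].join('.')}.com`,
);
```

The first three results are true; rawUnicodeOk and overlongOk are false.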

If we go with a solution that only checks simple cases, we will eventually need to add a separate rfcDomain action, as is already the case with email and rfcEmail.

What do you think about this?

yslpn avatar Aug 23 '25 12:08 yslpn

npm i https://pkg.pr.new/valibot@1284

commit: 1674aab

pkg-pr-new[bot] avatar Aug 24 '25 12:08 pkg-pr-new[bot]

Thank you for this PR! Do "normal" domains only support a single - but Punycode TLDs require two? Are there any other differences?

fabian-hiller avatar Aug 24 '25 12:08 fabian-hiller

Do "normal" domains only support a single - but Punycode TLDs require two? Are there any other differences?

  1. Punycode TLDs can contain digits; regular TLDs cannot.
  2. Punycode labels and TLDs start with xn--.
  3. Normal domains can also contain a double - inside a label. What matters is that a - does not appear at the beginning or end of the label.
  4. Regular TLDs do not contain -.
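The four points above can be sketched as small per-label checks. The function names isValidLabel, isPunycodeLabel and isRegularTld are hypothetical and chosen only for this illustration.

```typescript
// A label is 1 to 63 characters of letters, digits and hyphens,
// and must not start or end with a hyphen (point 3).
function isValidLabel(label: string): boolean {
  return /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i.test(label);
}

// Punycode labels and TLDs carry the "xn--" prefix (point 2).
function isPunycodeLabel(label: string): boolean {
  return label.toLowerCase().startsWith('xn--');
}

// Regular TLDs consist of letters only: no digits (point 1), no hyphens (point 4).
function isRegularTld(tld: string): boolean {
  return /^[a-z]{2,}$/i.test(tld);
}

const doubleHyphenInside = isValidLabel('my--site'); // true
const leadingHyphen = isValidLabel('-site'); // false
const trailingHyphen = isValidLabel('site-'); // false
const punycodeTld = isPunycodeLabel('xn--p1ai'); // true
const punycodeAsRegularTld = isRegularTld('xn--p1ai'); // false: digits and hyphens
const regularTld = isRegularTld('com'); // true
```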

The difficulty is that the specifications do not define regular expressions, and the literature (for example, https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch08s15.html) presents only fairly primitive ones.

Determining whether a domain is valid is more about procedures than about patterns. We will not be able to achieve perfect validation with a regex, but we can get closer to it.

For reference, the source of truth for all TLDs is https://data.iana.org/TLD/tlds-alpha-by-domain.txt
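As an illustration of the "procedure, not pattern" point, a TLD could be checked against the IANA list instead of a regex. The sketch below assumes the file format is one uppercase TLD per line with a leading # comment line; the inline sample is a made-up excerpt rather than the real file, and parseTldList and hasKnownTld are hypothetical helpers.

```typescript
// Made-up excerpt in the assumed format of tlds-alpha-by-domain.txt:
// a "#" comment header followed by one uppercase TLD per line.
const sampleTldList = `# (header comment line)
COM
ORG
XN--P1AI
`;

// Parses the list into a lowercase set, skipping comments and blank lines.
function parseTldList(text: string): Set<string> {
  const tlds = new Set<string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length === 0 || line.startsWith('#')) continue;
    tlds.add(line.toLowerCase());
  }
  return tlds;
}

// Returns true when the part after the last dot is in the set.
function hasKnownTld(domain: string, tlds: Set<string>): boolean {
  const lastDot = domain.lastIndexOf('.');
  if (lastDot === -1) return false; // bare hostnames have no TLD part
  return tlds.has(domain.slice(lastDot + 1).toLowerCase());
}

const knownTlds = parseTldList(sampleTldList);
const comKnown = hasKnownTld('example.com', knownTlds); // true
const punycodeKnown = hasKnownTld('example.XN--P1AI', knownTlds); // true
const unknownTld = hasKnownTld('example.invalidtld', knownTlds); // false
const bareHost = hasKnownTld('localhost', knownTlds); // false
```

A check like this depends on keeping the list up to date, which is one reason a pattern-based action remains attractive for the simple case.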

yslpn avatar Aug 24 '25 12:08 yslpn

Would your recommendation be to introduce domain for "normal" domains and add rfcDomain later on as needed like we did with email and rfcEmail?

fabian-hiller avatar Aug 24 '25 14:08 fabian-hiller

Would your recommendation be to introduce domain for "normal" domains and add rfcDomain later on as needed like we did with email and rfcEmail?

I think we can go ahead with only ASCII checking, which we call "regular" or "normal" domain. Then we can add rfcDomain if someone needs it.

But before that, I think we should make the current Zod regular expression a bit stricter:

  1. Add the i flag, which makes the regular expression shorter: a-zA-Z => a-z
  2. Add a limit on the TLD length: [a-z]{2,}$ => [a-z]{2,63}$, which is standard
  3. Limit the total string length with (?=.{1,253}$), which is standard

Seems pretty safe.

The final result:

/^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/i;
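The effect of the three changes can be checked with a short sketch against the final regex above. The sample inputs and variable names are assumptions for illustration.

```typescript
// Final regex from this comment: i flag, TLD limited to 2-63 letters,
// total length limited to 253 characters via the lookahead.
const finalDomainRegex =
  /^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/i;

const maxLabel = 'a'.repeat(63);

// Change 1: the i flag keeps uppercase input valid.
const upperCaseOk = finalDomainRegex.test('EXAMPLE.COM');

// Change 2: a 63-letter TLD passes, a 64-letter TLD does not.
const tld63Ok = finalDomainRegex.test(`example.${'a'.repeat(63)}`);
const tld64Ok = finalDomainRegex.test(`example.${'a'.repeat(64)}`);

// Change 3: exactly 253 characters passes, 254 does not.
// Three 63-character labels with dots take 192 characters, leaving 61 for the TLD.
const domain253 = `${maxLabel}.${maxLabel}.${maxLabel}.${'a'.repeat(61)}`;
const domain254 = `${maxLabel}.${maxLabel}.${maxLabel}.${'a'.repeat(62)}`;
const length253Ok = finalDomainRegex.test(domain253);
const length254Ok = finalDomainRegex.test(domain254);

// Unchanged by design: bare hostnames and Punycode TLDs are rejected.
const localhostOk = finalDomainRegex.test('localhost');
const punycodeTldStillRejected = !finalDomainRegex.test('example.xn--p1ai');
```

Here upperCaseOk, tld63Ok, length253Ok and punycodeTldStillRejected are true, while tld64Ok, length254Ok and localhostOk are false.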

After that, I will add new tests and the documentation description.

UPD: Done

yslpn avatar Aug 24 '25 19:08 yslpn