
TLDExtract not properly parsing hostname

Open · leem32 opened this issue 6 years ago · 1 comment

I'm running some domain names through TLDExtract and came across a domain not being properly parsed.

The URL is blogspot.com.

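// Load Composer's autoloader, which registers tld_extract() (vendor path assumed)
require __DIR__ . '/vendor/autoload.php';

$url = 'blogspot.com';
$domain = tld_extract($url);
var_dump($domain);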

Returns: 
object(LayerShifter\TLDExtract\Result)[9]
  private 'subdomain' => null
  private 'hostname' => string 'blogspot.com' (length=12)
  private 'suffix' => null

Weirdly, the URL 'flogspot.com' works fine and returns:

object(LayerShifter\TLDExtract\Result)[9]
  private 'subdomain' => null
  private 'hostname' => string 'flogspot' (length=8)
  private 'suffix' => string 'com' (length=3)

The URL logspot.com also works and returns:

object(LayerShifter\TLDExtract\Result)[9]
  private 'subdomain' => null
  private 'hostname' => string 'logspot' (length=7)
  private 'suffix' => string 'com' (length=3)

Any idea why the TLD in 'blogspot.com' is not being added to the suffix? Is this a bug?

leem32 · Oct 03 '19 20:10

I see blogspot.com is in public_suffix_list.dat. What's going on here? Can't LayerShifter parse any of the URLs in that list? Are there any workarounds?

https://github.com/publicsuffix/list/blob/6f2b9e75eaf65bb75da83677655a59110088ebc5/public_suffix_list.dat#L5884

leem32 · Oct 03 '19 20:10
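The linked list entry is the likely key: blogspot.com is itself listed in the Public Suffix List (in its private-domains section), so a matcher that consults the whole list treats "blogspot.com" as a suffix in the same way it treats "com". That leaves no label over to act as the hostname. The PHP sketch below is a simplified, library-independent illustration of longest-match lookup, not LayerShifter's implementation: `split_host` and the two-rule `$rules` array are hypothetical, and wildcard (`*.`) and exception (`!`) rules are ignored. In the report above the library surfaced this case as hostname 'blogspot.com' with a null suffix, which differs from the all-suffix result the sketch returns.

```php
<?php
// Simplified Public Suffix List lookup (assumption: plain rules only; the
// real algorithm also handles "*." wildcard and "!" exception rules).

/**
 * Split $host into [subdomain, hostname, suffix] using the longest rule in
 * $rules that matches the end of $host. Hypothetical helper for illustration.
 */
function split_host(string $host, array $rules): array
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Try candidate suffixes from longest (whole host) to shortest (last
    // label), so the first hit is the longest matching rule.
    $match = null;
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $rules, true)) {
            $match = $i;
            break;
        }
    }

    // PSL default: if no rule matches, the implicit "*" rule makes the last
    // label the suffix.
    if ($match === null) {
        $match = $count - 1;
    }

    // The whole host is a public suffix: nothing is left to be the hostname.
    if ($match === 0) {
        return [null, null, implode('.', $labels)];
    }

    $suffix = implode('.', array_slice($labels, $match));
    $hostname = $labels[$match - 1];
    $subdomain = $match > 1 ? implode('.', array_slice($labels, 0, $match - 1)) : null;

    return [$subdomain, $hostname, $suffix];
}

// Hypothetical two-rule excerpt; the real list has thousands of entries.
$rules = ['com', 'blogspot.com'];

var_dump(split_host('flogspot.com', $rules));        // [null, 'flogspot', 'com']
var_dump(split_host('blogspot.com', $rules));        // [null, null, 'blogspot.com']
var_dump(split_host('myblog.blogspot.com', $rules)); // [null, 'myblog', 'blogspot.com']
```

Under this model, 'blogspot.com' entries are meant to be used with a name in front of them (for example 'myblog.blogspot.com'), which yields hostname 'myblog' and suffix 'blogspot.com'. An input consisting of the suffix alone has no registrable part, whichever way a given library chooses to report that.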