URL regex does not fully match every URL

Open ghost opened this issue 4 years ago • 5 comments

For example: [image]

I would like it to return both google.co and google.com. Sadly, that may be impossible given how regular expressions work. So it would be great to at least match the longest string (`pywhat google.com/help` should return google.com/help). This is crucial for implementing URL subcategories properly (#51). Btw, the URL regex is too long. I do not think that valid TLDs should be checked, so it could be shortened.
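To illustrate the "longest match" idea, here is a minimal sketch. The patterns and the `all_matches` / `longest_match` helpers are hypothetical and heavily simplified. They are not pyWhat's actual regex or API. A single pattern yields only one match per starting position, so the sketch runs several patterns and compares the results.

```python
import re

# Hypothetical, deliberately simplified patterns (NOT pyWhat's real URL regex).
CANDIDATE_PATTERNS = {
    "domain (.co)": r"(?:https?://)?(?:[a-z0-9-]+\.)+co",
    "domain (.com)": r"(?:https?://)?(?:[a-z0-9-]+\.)+com",
    "URL with path": r"(?:https?://)?(?:[a-z0-9-]+\.)+com(?:/\S*)?",
}


def all_matches(text):
    """Collect every distinct match from all patterns, longest first."""
    found = {
        m.group()
        for pattern in CANDIDATE_PATTERNS.values()
        for m in re.finditer(pattern, text, re.IGNORECASE)
    }
    return sorted(found, key=len, reverse=True)


def longest_match(text):
    """Return the longest matched string across all patterns, or None."""
    matches = all_matches(text)
    return matches[0] if matches else None


print(all_matches("google.com/help"))
print(longest_match("https://www.google.com"))
```

With these assumed patterns, `google.com/help` yields all three candidates (google.com/help, google.com, google.co), and the longest one can be reported first.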

ghost avatar Jul 16 '21 07:07 ghost

I do not think that valid TLDs should be checked

By the way, the reason we do this is to avoid false positives :)
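A rough sketch of what that trade-off looks like, assuming a tiny hypothetical TLD list (`KNOWN_TLDS`) and two simplified patterns that are not the ones pyWhat ships:

```python
import re

# Hypothetical short TLD list for illustration only; a real check would
# need the full list of valid TLDs.
KNOWN_TLDS = ("com", "org", "net", "io")

# Loose: accepts anything that looks like "name.suffix".
LOOSE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

# Strict: the suffix must be one of the known TLDs.
STRICT = re.compile(
    r"\b(?:[a-z0-9-]+\.)+(?:" + "|".join(KNOWN_TLDS) + r")\b",
    re.IGNORECASE,
)


def find_loose(text):
    """Return every domain-like string, without checking the TLD."""
    return LOOSE.findall(text)


def find_strict(text):
    """Return only domain-like strings ending in a known TLD."""
    return STRICT.findall(text)


text = "Open config.yaml and notes.txt, then visit google.com"
print(find_loose(text))   # file names are reported as domains too
print(find_strict(text))  # only google.com survives the TLD check
```

Without the TLD check, file names such as config.yaml and notes.txt would be reported as URLs, which is the kind of false positive the longer regex is meant to prevent.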

bee-san avatar Jul 16 '21 10:07 bee-san

sad

ghost avatar Jul 16 '21 13:07 ghost

Reopening this since pywhat does not match something like https://www.google.com fully.

ghost avatar Jul 21 '21 14:07 ghost

Hey, @amadejpapez, do you have any ideas about this one?

ghost avatar Jul 24 '21 07:07 ghost

Hey, @amadejpapez, do you have any ideas about this one?

Hm, will check this later today and see if I get any ideas.

amadejpapez avatar Jul 24 '21 08:07 amadejpapez