lockbox-extension
lockbox-extension copied to clipboard
Implement URL parser
- Based on https://gist.github.com/linuxwolf/5722b60907826b47a97b10cf36676ca2
- Use Mozilla public suffix list: https://publicsuffix.org/
This needs to come before 'fill'.
Structurally, my plan here is to implement a matchURL() function that takes a query and an entry, and returns a bool indicating whether it matched. Then we can apply that function over the list of items to determine the matched items. This way, a given query can be mapped to multiple alternatives (e.g. wwww.facebook.com and facebook.com) as well as doing "fancy" things with checking port numbers and such.
One possible enhancement would be for matchURL() to return an integer score ranking the match. That way, better matches are higher-ranked and thus listed earlier in the results. That's something I think we could delay for a while, since it should only be noticeable in certain edge cases.
@jimporter some questions/comments:
- What is the query? Is this the full URL as retrieved from the active tab (for instance), or is it something shorter?
- While a stretch goal, what are the edge cases you see would be mitigated by a scoring system?
- We need to adhere to origin as much as possible, which includes scheme and port. Not doing so can (and has) led to vulnerabilities in other systems.
@linuxwolf:
-
The query would be everything before the path in the URL (minus HTTP basic auth username/password, of course), so basically
URL.protocol + "//" + URL.host; that will give us protocol, hostname, and port (with port being implicitly-defined if it's the protocol's default). That should be sufficient for any searches we want to do, while also looking readable and being easy enough for a user to type on their own.- Our
matchURL()function would accept a wider variety of queries (i.e. user-provided ones), but would effectively transform them into a query of the above format before doing anything. For example, if a user enterswww.facebook.com, it will be as if they enteredhttps://www.facebook.com(which in turn is equivalent tohttps://www.facebook.com:443).
- Our
-
Scoring could help if you have accounts on subdomains that you want to be higher-priority than an account on the root domain. I can't think of many real-world sites that do this, but it may help for some intranet sites.
-
As mentioned a bit in (1), the goal is to provide an easy way to specify the origin we're looking for without losing any of the information we want. Then
matchURL()can strictly match the origin. In general, matching the URL is different than matching any of the other fields in an item, since we want to perform an exact match on the URL, whereas we'd be ok with something like substring-match for site name, username, etc.matchURL(), well, matches the interface for the function we already have to do this:filterItem(). In practice, I expect that we'd have some generic functions that let us do a match on a particular field of an item, and then we could just compose those together to get the finalfilterItem()function.
- As mentioned a bit in (1), the goal is to provide an easy way to specify the origin we're looking for without losing any of the information we want. Then matchURL() can strictly match the origin. In general, matching the URL is different than matching any of the other fields in an item, since we want to perform an exact match on the URL, whereas we'd be ok with something like substring-match for site name, username, etc.
Matching URLs is different, but more complex than an exact match. What wasn't specified in this gist is that both sides of the comparison are parsed and reassembled: the entry's origin(s) and the query argument. It does get complex, but I believe it does a better job of meeting user expectations given available information and providing reasonable guardrails (e.g., secure origins only).