
google.lv is not extracted while google.com is

Open kgusarov opened this issue 8 years ago • 1 comment

Here's my test case:

String text = "\nhttp://www.lursoft.lv/address/riga-terbatas-iela-73-lv-1001" +
            "\ngoogle.com" +
            "\ngoogle.lv" +
            "\nwhatever.lv" +
            "\nwhatever.lt" +
            "\n$also $some $cash";

...
assertThat(urls, containsInAnyOrder("http://www.lursoft.lv/address/riga-terbatas-iela-73-lv-1001",
                "google.lv", "google.com", "whatever.lv", "whatever.lt"));

kgusarov, Nov 25 '16 11:11

https://github.com/twitter/twitter-text/blob/cebd98612738011d8b65d4c22650d56a0bcda669/conformance/TldLists.java#L1420

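The difference is that .com is treated as a generic TLD (GTLD), whereas .lv is a country-code TLD (CTLD), and the two are subject to slightly different rules. Right now a bare .lv domain requires either a leading protocol or a trailing path to be recognised as a valid URL, which is why google.lv, whatever.lv and whatever.lt are skipped while google.com and the full http://www.lursoft.lv/... URL are extracted. I started working on this project recently and don't know the reasoning behind this rule. I'll try to get an answer or have it re-evaluated.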
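
To make the rule concrete, here is a minimal sketch of the behaviour as I understand it. It is not the library's actual code: the class `TldRuleSketch`, its methods, and the two tiny TLD sets are hypothetical stand-ins (the real lists live in the TldLists.java file linked above), and splitting on whitespace is a simplification of the real extractor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not twitter-text's implementation.
final class TldRuleSketch {
    // Assumed sample data: a few generic and country-code TLDs.
    private static final Set<String> GTLDS = new HashSet<>(Arrays.asList("com", "net", "org"));
    private static final Set<String> CTLDS = new HashSet<>(Arrays.asList("lv", "lt"));

    private TldRuleSketch() {}

    // Decides whether a single whitespace-delimited token counts as a URL.
    static boolean isExtracted(String candidate) {
        boolean hasProtocol = candidate.startsWith("http://") || candidate.startsWith("https://");
        String rest = hasProtocol ? candidate.substring(candidate.indexOf("://") + 3) : candidate;

        int slash = rest.indexOf('/');
        boolean hasPath = slash >= 0;
        String host = hasPath ? rest.substring(0, slash) : rest;

        int dot = host.lastIndexOf('.');
        if (dot <= 0 || dot == host.length() - 1) {
            return false; // no "name.tld" shape, e.g. "$cash"
        }
        String tld = host.substring(dot + 1).toLowerCase(Locale.ROOT);

        if (GTLDS.contains(tld)) {
            return true; // generic TLDs are accepted as bare domains
        }
        if (CTLDS.contains(tld)) {
            return hasProtocol || hasPath; // country-code TLDs need a protocol or a path
        }
        return false;
    }

    // Returns the tokens of the text that pass the rule, in order.
    static List<String> extract(String text) {
        List<String> urls = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty() && isExtracted(token)) {
                urls.add(token);
            }
        }
        return urls;
    }
}
```

Under these assumptions, calling `TldRuleSketch.extract(text)` on the string from the test case keeps only the lursoft.lv URL and google.com, which matches the reported behaviour.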

codemonkey3045, Dec 08 '16 06:12