Recognize original URLs from translation services
Flattr should recognize translation services, extract the original URL, and credit that URL. About 4% of the traffic to my own humble English-language corner of the web comes in through these translation proxy services.
Baidu:
- http://fanyi.baidu.com/transpage?query=https%3A%2F%2Fwww.example.com%2F&source=url&ie=utf8&from=it&to=zh&render=1
Bing Translator and Microsoft Translator:
- https://www.translatetheweb.com/?from=&to=en&a=https://www.example.com/
- https://www.microsofttranslator.com/bv.aspx?from=&to=en&a=https://www.example.com/
Google Translate:
- https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&ie=UTF-8&u=https%3A%2F%2Fwww.example.com&edit-text=&act=url
- https://translate.googleusercontent.com/translate_c?act=url&depth=1&hl=en&ie=UTF8&prev=_t&rurl=translate.google.com&sl=auto&sp=nmt4&tl=en&u=https://www.example.com/&xid={redacted}&usg={redacted}
Yandex Translate:
- https://translate.yandex.ru/translate?url=https%3A%2F%2Fwww.example.com%2F&lang=en-ru
- https://translate.yandex.com/translate?url=https%3A%2F%2Fwww.example.com%2F&lang=ru-en
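In every example above, the original URL is carried in a single query parameter, so extracting it could be sketched as a lookup keyed by host and path. This is a minimal Python sketch. It assumes the hosts, paths, and parameter names (`query`, `a`, `u`, `url`) seen in these examples are the only ones in use. `TRANSLATION_PROXIES` and `extract_original_url` are hypothetical names, not part of any Flattr extension or server API.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical table built only from the example URLs in this issue:
# (host, path) -> name of the query parameter carrying the original URL.
# Real translation services may use other hosts, paths, or parameters.
TRANSLATION_PROXIES = {
    ("fanyi.baidu.com", "/transpage"): "query",
    ("www.translatetheweb.com", "/"): "a",
    ("www.microsofttranslator.com", "/bv.aspx"): "a",
    ("translate.google.com", "/translate"): "u",
    ("translate.googleusercontent.com", "/translate_c"): "u",
    ("translate.yandex.ru", "/translate"): "url",
    ("translate.yandex.com", "/translate"): "url",
}


def extract_original_url(url: str) -> Optional[str]:
    """Return the page a known translation proxy is showing, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    param = TRANSLATION_PROXIES.get((host, parts.path or "/"))
    if param is None:
        return None  # not one of the known proxy URL patterns
    # parse_qs percent-decodes values, so "https%3A%2F%2F..." and an
    # unencoded "https://..." both yield the same original URL.
    values = parse_qs(parts.query).get(param)
    if not values:
        return None
    original = values[0]
    # Only accept absolute http(s) URLs so junk values are never credited.
    if urlsplit(original).scheme not in ("http", "https"):
        return None
    return original
```

For example, passing the Yandex URL above returns `https://www.example.com/`, while an ordinary page URL such as `https://www.example.com/` returns `None`. A hardcoded table like this has to be kept up to date by hand whenever a service changes its URL format.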
Not sure if this should be performed in the extension or server-side, though.
Hardcoding such services would likely not be a good approach, so I'll check whether those pages have metadata that points to the canonical page.
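One common form such metadata takes is a `<link rel="canonical" href="...">` element in the page head. The issue doesn't say which metadata was checked, so this is only a minimal Python sketch of that one check, using the standard library's `html.parser`. `CanonicalLinkFinder` and `find_canonical_url` are hypothetical names, and the sketch does not resolve relative `href` values.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names but not values.
        # Self-closing <link ... /> tags also end up here, because the
        # default handle_startendtag calls handle_starttag.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; value may be None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```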
In general, please be aware that the extension doesn't support web applications at this point, as they have a different usage pattern than news sites, blogs, videos, or other consumable content. This is therefore an interesting edge case: it's technically a web application, but its purpose is to show you consumable content.
Oh, these aren't web applications either. They're content-translating HTTP proxies.
I already looked for useful metadata, but there are literally just the URL patterns to work with here. All of these come from market-leading search engine providers that display links to translations (in the formats above) next to foreign-language search results.
It'd be interesting to know how search engines treat such translated pages. Since there's no metadata on those pages, search engines may just ignore them and leave them out of search results.
On another note, I quickly wanted to mention that this issue has been passed on to the server team, as they want to look into this use case on their end.
All of the example links are actually included on search result pages. You’ll see them as “Translate page” links next to the main foreign-language search result on Bing, Baidu, Yandex, and Google Search. Try searching for something in a foreign language on English-language Bing or Google, for example.
Thanks!