Tusky
HTML parsing bugs
Current problems with our HTML parsing/rendering:
- whitespace is not preserved #786
- links are not shortened #655 (the information needed to shorten links is in the HTML, we just don't parse it correctly)
- some elements render poorly #1271
- It is slow. We use Android's built-in HTML parser, but then modify the result in many ways. Most of these modifications happen when binding ViewHolders, which is inefficient because the work is repeated on every bind instead of being done once when we receive the server response.
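To illustrate the last point, here is a minimal sketch of doing the parsing once when a status arrives, so that binding only looks up the result. `ParsedContentCache`, `onStatusReceived`, and the tag-stripping stand-in parser are hypothetical names made up for this example; they are not existing Tusky code, and in the app the parsed value would be a `Spanned` rather than a `String`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: parses each status once and serves the cached result at bind time. */
final class ParsedContentCache<T> {
    private final Map<String, T> parsedById = new HashMap<>();
    private final Function<String, T> parser;
    private int parseCount = 0;

    ParsedContentCache(Function<String, T> parser) {
        this.parser = parser;
    }

    /** Call when the server response arrives: parse once and remember the result. */
    void onStatusReceived(String statusId, String html) {
        parseCount++;
        parsedById.put(statusId, parser.apply(html));
    }

    /** Call when binding a ViewHolder: a plain lookup, no parsing. Returns null if unknown. */
    T get(String statusId) {
        return parsedById.get(statusId);
    }

    /** Number of times the parser actually ran (only here to make the effect visible). */
    int parseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        // Stand-in parser (assumption): strips tags. A real one would build a Spanned.
        ParsedContentCache<String> cache =
                new ParsedContentCache<>(html -> html.replaceAll("<[^>]*>", ""));
        cache.onStatusReceived("1", "<p>hello</p>");
        for (int i = 0; i < 3; i++) {
            cache.get("1"); // simulates three bind calls
        }
        System.out.println(cache.get("1") + " parsed " + cache.parseCount() + " time(s)");
    }
}
```

With this shape, rebinding the same status any number of times costs a map lookup; the parser runs only when new data comes in.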
Already fixed HTML parsing/rendering issues that any new solution needs to preserve:
- remove trailing whitespace https://github.com/tuskyapp/Tusky/blob/master/app/src/main/java/com/keylesspalace/tusky/util/HtmlUtils.java#L40
- add a zero-width space after links at the end of a line to fix their oversized hitbox https://github.com/tuskyapp/Tusky/blob/master/app/src/main/java/com/keylesspalace/tusky/util/LinkHelper.java#L120
- links are not underlined
(there is probably more for both categories)
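For reference, the first two fixes in that list can be sketched in plain Java roughly as follows. This is an illustration of the idea only, not a copy of the linked `HtmlUtils` or `LinkHelper` code: `TextFixups` and both method names are hypothetical, and the sketch works on plain character sequences instead of Android spans.

```java
/** Hypothetical helpers illustrating two of the existing fixes on plain text. */
final class TextFixups {
    static final char ZERO_WIDTH_SPACE = '\u200B';

    /** Drops whitespace (spaces, newlines, tabs) from the end of the text. */
    static CharSequence trimTrailingWhitespace(CharSequence text) {
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.subSequence(0, end);
    }

    /**
     * Inserts a zero-width space right after a link that ends a line, so the
     * clickable area has something to stop at. linkEnd is the index just past
     * the link's last character. Text is returned unchanged if the link is
     * followed by other characters on the same line.
     */
    static String padLinkAtLineEnd(String text, int linkEnd) {
        boolean endsLine = linkEnd == text.length() || text.charAt(linkEnd) == '\n';
        if (!endsLine) {
            return text;
        }
        return text.substring(0, linkEnd) + ZERO_WIDTH_SPACE + text.substring(linkEnd);
    }

    public static void main(String[] args) {
        String trimmed = trimTrailingWhitespace("some toot text \n\n").toString();
        String padded = padLinkAtLineEnd("see https://a.example", "see https://a.example".length());
        System.out.println("[" + trimmed + "] length after padding: " + padded.length());
    }
}
```

A replacement parser would need to keep both behaviours, ideally applied once at parse time and not on every bind.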
My proposal would be to write a custom version of the Android HTML parser (input: a String with HTML markup, output: a Spanned that renders correctly in an Android TextView) to solve these issues. Looking at the source, this is a challenge, but it is definitely doable.
I investigated this a bit. It looks like it won't work as nicely as I imagined, but we still need to work around those bugs.