emoji-java
refactorings + backwards-compatible arbitrary emoji processing API
Hello!
Thanks for writing this library, it's definitely saved me a bit of manual labor! I've made some changes, and hopefully you and others might find them handy.
This PR:
- Typo fix: renames Emoji::getHtmlHexidecimal to Emoji::getHtmlHexadecimal, but leaves in the old one with a deprecation warning. Hexadecimal is used everywhere else in the code.
- Creates interfaces UnicodeProcessor and AliasProcessor, which are used by EmojiParser::processUnicode and EmojiParser::processAliases. These two new methods are generalizations of EmojiParser::parseTo***.
- Converts the existing EmojiParser::parseTo*** to use the new EmojiParser::process*** API. These conversions are placed in UnicodeParsers and AliasParsers, both singleton classes.
- Extracts AliasCandidate and UnicodeCandidate to public classes, instead of inner protected classes of EmojiParser, as the processor types depend on these.
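To illustrate the shape of the generalization described in the list above, here is a minimal, self-contained sketch. It is not the code in this PR: ProcessorSketch, AliasHandler, stripAliases and the two-entry alias table are hypothetical stand-ins, and the regex is a simplification. It only shows how parseTo-style methods can become thin wrappers around one process-style method that takes a callback, written with anonymous classes so it would also compile before Java 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the design; none of these names come from emoji-java.
final class ProcessorSketch {

    /** Callback invoked once per recognised alias (a rough analogue of a processor interface). */
    interface AliasHandler {
        String apply(String alias, String unicode);
    }

    // Toy alias table standing in for a real emoji database (assumption).
    private static final Map<String, String> ALIASES = new HashMap<String, String>();
    static {
        ALIASES.put("smile", "\uD83D\uDE04");
        ALIASES.put("heart", "\u2764");
    }

    // Simplified alias syntax: a name between two colons.
    private static final Pattern ALIAS = Pattern.compile(":([a-z0-9_+-]+):");

    /** General entry point: finds known aliases and lets the handler pick the replacement. */
    static String processAliases(String input, AliasHandler handler) {
        Matcher m = ALIAS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String unicode = ALIASES.get(m.group(1));
            // Unknown aliases are left exactly as they appeared in the input.
            String replacement = (unicode == null)
                    ? m.group()
                    : handler.apply(m.group(1), unicode);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Specialised wrapper: replace each known alias with its unicode form. */
    static String parseToUnicode(String input) {
        return processAliases(input, new AliasHandler() {
            @Override
            public String apply(String alias, String unicode) {
                return unicode;
            }
        });
    }

    /** Specialised wrapper: drop each known alias entirely. */
    static String stripAliases(String input) {
        return processAliases(input, new AliasHandler() {
            @Override
            public String apply(String alias, String unicode) {
                return "";
            }
        });
    }
}
```

A caller who wants some other output format (XML, say) passes its own AliasHandler to processAliases instead of waiting for a new parseTo method to be added.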
In the case of AliasParser, I avoided using Java 8 functional interfaces to allow full backwards-compatibility with Java <8.
Tests are green across the board, and these changes won't (or at least shouldn't) break any existing code that uses this library. If you'd like, I can also add a blurb to the README describing the new processing API.
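For the renamed getter, the backwards compatibility mentioned above can be kept with a deprecated delegate. The sketch below is a hypothetical miniature (EmojiSketch is not the library's Emoji class, and the entity format it produces is an assumption); it only shows the pattern of keeping the misspelled name as a forwarding method.

```java
// Hypothetical miniature of the rename; the real Emoji class has more state
// and is constructed differently.
final class EmojiSketch {
    private final String unicode;

    EmojiSketch(String unicode) {
        this.unicode = unicode;
    }

    /** Correctly spelled accessor: emits one hexadecimal HTML entity per code point. */
    String getHtmlHexadecimal() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < unicode.length(); ) {
            int codePoint = unicode.codePointAt(i);
            sb.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    /**
     * Misspelled name kept so existing callers still compile.
     *
     * @deprecated use {@link #getHtmlHexadecimal()} instead.
     */
    @Deprecated
    String getHtmlHexidecimal() {
        return getHtmlHexadecimal();
    }
}
```

Existing callers of the old name keep working and get a compiler deprecation warning, while new code uses the corrected spelling.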
Coverage decreased (-1.0%) to 93.312% when pulling ecc17f322b2658575e043d3fd97221a7fa0cd578 on iostat:master into c245c8dfb186ec34f373d1966d697a720151c8b2 on vdurmont:master.
Hey @iostat !
Thanks for this contribution! I'm sorry I took so long to answer. Could you give us some context on your goal?
So, ultimately the goal was to abstract out the parsing process in such a way that I can use emoji-java's parsing engine but control what kind of information I get back. As an example, here's some Scala code I use to convert emoji to XML for further downstream processing:
import com.vdurmont.emoji.EmojiParser
import com.vdurmont.emoji.EmojiParser.FitzpatrickAction
import com.vdurmont.emoji.parsers.{UnicodeCandidate, UnicodeProcessor}
import scala.collection.JavaConversions._

object DeEmojifier extends UnicodeProcessor {
  override def shouldRemoveFitzpatrick(fitzpatrickAction: FitzpatrickAction): Boolean = false

  override def apply(input: UnicodeCandidate, fitzpatrickAction: FitzpatrickAction): String =
    <emoji
      tags={input.emoji.getTags.mkString(",")}
      description={input.emoji.getDescription}
      modifiers={Option(input.fitzpatrick).map(_.toString.toLowerCase).getOrElse("")}
      entity={input.emoji.getHtmlHexadecimal.replaceAll("[^0-9A-Fa-fXx]", "")}
    />.toString.replaceAll("\\s{2,}", " ").trim // trim out the extra spaces as a result of line breaks in the code #justscalaxmlthingz
}

def deEmojify(corpus: String): String = {
  EmojiParser.processUnicode(corpus, DeEmojifier, FitzpatrickAction.PARSE)
}
Without these changes, I'd have to parse the string myself, find the emoji, do the substitutions, load in my own descriptions, etc., whereas with this new API I am able to just tell emoji-java, "hey, just find emoji and do this with them please". The rest of the changes are just converting the existing code to this new API for consistency. And of course, the one typo fix, probably because I'm too OCD for my own good :P