emoji-java

refactorings + backwards-compatible arbitrary emoji processing API

Open iostat opened this issue 9 years ago • 3 comments

Hello!

Thanks for writing this library, it's definitely saved me a bit of manual labor! I've made some changes, and hopefully you and others might find them handy.

This PR:

  • Fixes a typo: renames Emoji::getHtmlHexidecimal to Emoji::getHtmlHexadecimal, but keeps the old method with a deprecation warning. "Hexadecimal" is the spelling used everywhere else in the code.
  • Creates the interfaces UnicodeProcessor and AliasProcessor, which are used by EmojiParser::processUnicode and EmojiParser::processAliases. These two new methods are generalizations of EmojiParser::parseTo***.
  • Converts the existing EmojiParser::parseTo*** methods to use the new EmojiParser::process*** API. These conversions are placed in UnicodeParsers and AliasParsers, both singleton classes.
  • Extracts AliasCandidate and UnicodeCandidate into public classes instead of protected inner classes of EmojiParser, since the processor types depend on them.
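
As a rough illustration of the first bullet (not the actual diff), keeping an old method name around usually means having it delegate to the correctly spelled one and marking it deprecated. EmojiSketch below is a hypothetical stand-in for the library's Emoji class, reduced to the one accessor being renamed, and the entity string is only sample data:

```java
// Hypothetical stand-in for the library's Emoji class; only the renamed accessor is modeled.
final class EmojiSketch {
    private final String htmlHexadecimal;

    EmojiSketch(String htmlHexadecimal) {
        this.htmlHexadecimal = htmlHexadecimal;
    }

    /** Correctly spelled accessor; new code should call this one. */
    String getHtmlHexadecimal() {
        return htmlHexadecimal;
    }

    /**
     * Misspelled accessor, kept so that existing callers still compile.
     *
     * @deprecated use {@link #getHtmlHexadecimal()} instead.
     */
    @Deprecated
    String getHtmlHexidecimal() {
        // Delegate so both names always return the same value.
        return getHtmlHexadecimal();
    }

    public static void main(String[] args) {
        EmojiSketch sample = new EmojiSketch("&#x1f604;"); // sample entity string
        System.out.println(sample.getHtmlHexadecimal());
        System.out.println(sample.getHtmlHexidecimal());
    }
}
```

Callers of the old name get a compiler deprecation warning but keep working, which is what makes the rename backwards-compatible.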

In the case of AliasParser, I avoided using Java 8 functional interfaces to allow full backwards-compatibility with Java <8.
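
A minimal sketch of that idea, assuming nothing about the real classes: a plain single-method interface plus an anonymous inner class does the job of a lambda and still compiles on Java 6/7. Every name here (AliasProcessingSketch, Candidate, Processor, process) and the alias regex are hypothetical and only illustrate the callback shape, not the library's implementation:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, self-contained sketch; none of these names are the library's real types.
final class AliasProcessingSketch {

    /** Hypothetical candidate: the alias text found between two colons. */
    static final class Candidate {
        final String alias;

        Candidate(String alias) {
            this.alias = alias;
        }
    }

    /** Plain single-method interface, so no java.util.function (Java 8) is needed. */
    interface Processor {
        String apply(Candidate candidate);
    }

    // Assumed alias syntax for this sketch: lowercase letters, digits, '_', '+', '-' between colons.
    private static final Pattern ALIAS = Pattern.compile(":([a-z0-9_+-]+):");

    /** Finds every :alias: token and replaces it with whatever the processor returns. */
    static String process(String input, Processor processor) {
        Matcher matcher = ALIAS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = processor.apply(new Candidate(matcher.group(1)));
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Anonymous inner class instead of a lambda keeps this valid before Java 8.
        Processor upper = new Processor() {
            @Override
            public String apply(Candidate candidate) {
                return "[" + candidate.alias.toUpperCase(Locale.ROOT) + "]";
            }
        };
        System.out.println(process("I :heart: emoji :+1:", upper)); // prints: I [HEART] emoji [+1]
    }
}
```

On Java 8 and later the same Processor can still be passed as a lambda, since it has exactly one abstract method, so nothing is lost by avoiding java.util.function.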

Tests are green across the board, and these changes won't (or at least shouldn't) break any existing code that uses this library. If you'd like, I can also add a blurb to the README describing the new processing API.

iostat avatar Mar 25 '16 20:03 iostat

Coverage decreased (-1.0%) to 93.312% when pulling ecc17f322b2658575e043d3fd97221a7fa0cd578 on iostat:master into c245c8dfb186ec34f373d1966d697a720151c8b2 on vdurmont:master.

coveralls avatar Mar 25 '16 20:03 coveralls

Hey @iostat !

Thanks for this contribution! I'm sorry I took so long to answer. Could you give us some context on your goal?

vdurmont avatar Apr 25 '16 01:04 vdurmont

So, ultimately the goal was to abstract out the parsing process in such a way that I can use emoji-java's parsing engine but control what kind of information I get back. As an example, here's some Scala code I use to convert emoji to XML for further downstream processing:

import com.vdurmont.emoji.EmojiParser
import com.vdurmont.emoji.EmojiParser.FitzpatrickAction
import com.vdurmont.emoji.parsers.{UnicodeCandidate, UnicodeProcessor}

import scala.collection.JavaConversions._

object DeEmojifier extends UnicodeProcessor {
  override def shouldRemoveFitzpatrick(fitzpatrickAction: FitzpatrickAction): Boolean = false

  override def apply(input: UnicodeCandidate, fitzpatrickAction: FitzpatrickAction): String =
    <emoji
      tags={input.emoji.getTags.mkString(",")}
      description={input.emoji.getDescription}
      modifiers={Option(input.fitzpatrick).map(_.toString.toLowerCase).getOrElse("")}
      entity={input.emoji.getHtmlHexadecimal.replaceAll("[^0-9A-Fa-fXx]", "")}
    />.toString.replaceAll("\\s{2,}", " ").trim // trim out the extra spaces as a result of line breaks in the code #justscalaxmlthingz
}

def deEmojify(corpus: String): String = {
  EmojiParser.processUnicode(corpus, DeEmojifier, FitzpatrickAction.PARSE)
}

Without these changes, I'd have to parse the string myself, find the emoji, do the substitutions, load in my own descriptions, etc., whereas with this new API I can just tell emoji-java, "hey, just find the emoji and do this with them, please". The rest of the changes just convert the existing code to this new API for consistency. And of course, there's the one typo fix, probably because I'm too OCD for my own good :P

iostat avatar Apr 29 '16 18:04 iostat