Making custom token for use in a custom plugin

Open nrakic90 opened this issue 9 years ago • 6 comments

Hello.

First, I want to say good job on this plugin. I am making a plugin that will detect a custom format, something along the lines of "keyword://test/test1/test2". I managed to make a plugin based on what I saw in hashtag.js and mention.js. I am having trouble making a token out of "keyword". Can you explain this process a bit? I've attached a "sketch" of my plugin; would you kindly tell me what I am doing wrong? I would be grateful. All the best!

nrakic90 · Sep 29 '16 15:09

Hey @nrakic90, the first thing I wanted to mention is that the plugin API is largely undocumented and is subject to change in the future. Given that, kudos to you for figuring this out.

The big roadblock you'll run into next is due to a fundamental problem with the plugin API: There's no easy way to integrate new text tokens with the rest of the link-parsing state machine. I'm going to try my best to help you out here, but this is going to get complicated.

The first thing you need is to generate intermediate CharacterStates for the keyword text token. This will involve a call to the stateify function after you've defined KEYWORD_TOKEN. That should look like this:

let intermediateKeywordStates = stateify('keyword', S_START, KEYWORD_TOKEN, linkify.scanner.tokens.DOMAIN);

Then you'll need a loop like this for the intermediate states, since those could have jumps to domains (e.g., key is an intermediate state that could itself be a domain, and keys is a domain that, although it starts with key, can never resolve to keyword). ALPHANUM should be defined like this.

See how the localhost text token is handled for a real example of this.
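As a rough, standalone illustration of the intermediate-state idea above, here is a toy character state machine. It is a sketch under stated assumptions, not linkify's code: ToyState, toyStateify and scanToken are hypothetical names, and the real stateify may behave differently. Every prefix of "keyword" is treated as a possible domain, and any alphanumeric character that does not continue the word drops back into a generic domain state.

```javascript
// Toy model only: NOT linkify's real scanner or stateify implementation.
class ToyState {
  constructor(token = null) {
    this.token = token;      // token reported if scanning stops in this state
    this.jumps = new Map();  // character -> next ToyState
  }
}

const ALPHANUM = 'abcdefghijklmnopqrstuvwxyz0123456789';

// Build one state per character of `word`. Each prefix state reports
// `fallbackToken`; only the state for the full word reports `finalToken`.
// Returns the prefix (intermediate) states.
function toyStateify(word, start, finalToken, fallbackToken) {
  const intermediate = [];
  let state = start;
  for (let i = 0; i < word.length; i++) {
    const isLast = i === word.length - 1;
    const next = new ToyState(isLast ? finalToken : fallbackToken);
    state.jumps.set(word[i], next);
    if (!isLast) intermediate.push(next);
    state = next;
  }
  return intermediate;
}

const start = new ToyState();
const domain = new ToyState('DOMAIN');
// A domain keeps consuming alphanumeric characters.
for (const ch of ALPHANUM) domain.jumps.set(ch, domain);

const intermediateStates = toyStateify('keyword', start, 'KEYWORD', 'DOMAIN');

// The loop over intermediate states: any alphanumeric character that does
// not continue "keyword" jumps to the generic domain state instead.
for (const state of intermediateStates) {
  for (const ch of ALPHANUM) {
    if (!state.jumps.has(ch)) state.jumps.set(ch, domain);
  }
}

// Walk the machine over `text` and return the token of the state reached,
// or null if some character has no transition.
function scanToken(text) {
  let state = start;
  for (const ch of text) {
    state = state.jumps.get(ch);
    if (!state) return null;
  }
  return state.token;
}
```

In this toy, scanToken('key') and scanToken('keys') both report DOMAIN, while only the full word reports KEYWORD. The KEYWORD state has no outgoing jumps here, so anything that follows the full word would need transitions defined by hand.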

In your example, seeing the text token keyword jumps you into the S_KEYWORD state from the S_START state. But what happens if instead of // you see .com? Then you'd expect keyword.com to be of type url. But text tokens currently are not polymorphic, so you'd have to manually define jumps to and from S_KEYWORD. Basically, you'll need to duplicate all lines in parser.js that contain S_DOMAIN and replace S_DOMAIN with S_KEYWORD.

TL;DR, this is doable but not pretty. There are definitely plans to improve this interface and abstract away all this complexity, but for now that's all the help I can offer.

nfrasser · Sep 30 '16 18:09

Thank you so much for the in-depth explanation, I really appreciate it! I was experimenting with stateify at one point but gave it up because I apparently didn't have all the pieces of the puzzle. Thanks again!

nrakic90 · Oct 03 '16 08:10

Has this gotten any easier? I would really like to use a custom token!

toger5 · Jan 19 '22 11:01

From the docs, it seems like it should be possible to call S_START.tt("a", acceptedState) to transition on an 'a'. See the documentation: https://github.com/Hypercontext/linkifyjs/blob/a38611393a35b922b34632a30a79fb709c745b2e/packages/linkifyjs/src/core/fsm.js#L52

This does not seem to work. What does the word "character" mean in the docs?

toger5 · Jan 19 '22 16:01

@toger5 I'm working on some additional examples/docs for this in an upcoming release. For now, check out the hashtag plugin for reference

Notes:

  • Linkify has two state machines for tokenizing strings, the scanner and parser
  • The scanner groups string characters into smaller, self-contained tokens such as NUM (a number) or TLD (any top-level domain name like "com")
    • The starting state (S_START) is scanner.start
  • The parser (used in the hashtag plugin example) groups text tokens from the scanner into "multi-tokens" such as URL, EmailAddress or Hashtag
    • The starting state is parser.start
  • Similarly to how adding the hashtag multi-token works in the example plugin, you can add a new scanner token. For example:
    const GreetingState = scanner.start
      .tt('h')
      .tt('e')
      .tt('l')
      .tt('l')
      .tt('o', 'GREETING') // create accepting state
    
    The scanner will recognize the word "hello" as a GREETING token. You can capture the states and branch off to recognize additional GREETINGs:
    const HState = scanner.start.tt('h')
    const GreetingState = HState
      .tt('i', GreetingState)  // don't create a new accepting state, use the existing one
    
    Now both "hi" and "hello" are recognized as GREETING tokens. You can similarly use the GREETING token with the parser:
    const GreetingMultiToken = utils.createTokenClass('greeting', {
      isLink: true,
      toHref() {
        return `javascript:alert("${this.toString()}!")`
      }
    })
    parser.start.tt('GREETING', GreetingMultiToken)
    
  • There is no way to create tokens from arbitrary regular expressions right now with the tt method
    • You can, however, emulate anything that's possible with a regular expression by capturing the states and transitioning between them multiple times (the second argument to tt is either an accepting token or any previously-captured state).
    • This may improve in a future release.
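The notes above can be tried out on a small standalone model. What follows is a minimal sketch, assuming a simplified tt whose second argument is either a token name or a previously captured state. ToyState and accepts are hypothetical names and this is not linkify's actual State class. It shows chaining, reusing an accepting state for a second word, and emulating regex-style repetition with a self-transition.

```javascript
// Toy model only: NOT linkify's real State class.
class ToyState {
  constructor(token = null) {
    this.token = token;      // token name if this state is accepting
    this.jumps = new Map();  // character -> next ToyState
  }

  // Take (or create) the transition for `input`.
  //   tt(input)            -> existing or new non-accepting state
  //   tt(input, 'TOKEN')   -> same, and mark that state as accepting TOKEN
  //   tt(input, someState) -> point the transition at an existing state
  tt(input, tokenOrState) {
    if (tokenOrState instanceof ToyState) {
      this.jumps.set(input, tokenOrState);
      return tokenOrState;
    }
    let next = this.jumps.get(input);
    if (!next) {
      next = new ToyState();
      this.jumps.set(input, next);
    }
    if (tokenOrState) next.token = tokenOrState;
    return next;
  }
}

// Walk `text` from `start`; return the token of the state reached, or null.
function accepts(start, text) {
  let state = start;
  for (const ch of text) {
    state = state.jumps.get(ch);
    if (!state) return null;
  }
  return state.token;
}

const start = new ToyState();

// "hello": capture the accepting state so it can be reused later.
const HState = start.tt('h');
const GreetingState = HState.tt('e').tt('l').tt('l').tt('o', 'GREETING');

// "hi": reuse the existing accepting state instead of creating a new one.
HState.tt('i', GreetingState);

// Regex-like repetition, roughly /hey+/: a captured state jumps to itself.
const HeyState = HState.tt('e').tt('y', 'GREETING');
HeyState.tt('y', HeyState);
```

With this model, accepts(start, 'hello'), accepts(start, 'hi') and accepts(start, 'heyyy') all yield GREETING, while a bare prefix such as 'hell' yields null. Note that the accepting state is declared once and then passed to tt without being redeclared.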

nfrasser · Jan 21 '22 16:01

This is super helpful, thank you very much for the detailed comment! I was trying something like this:

const acceptingState = createTokenClass("something")
scanner.start
  .tt('h')
  .tt('e')
  .tt('l')
  .tt('l')
  .tt('o', acceptingState)

but that did not seem to work. As I understood it, PARAM1 in const PARAM1 = state.tt('TOKEN') and PARAM2 in state.tt('TOKEN', PARAM2) were basically the same thing, except that in the second case PARAM2 has to be created beforehand. In your example they seem to differ, since PARAM2 can also be used to add a new token called GREETING. That suggests there is another difference between PARAM1 and PARAM2:

// (A)
const GreetingState = HState
  .tt('i', GreetingState)  // don't create a new accepting state, use the existing one
// VS
// (B)
const GreetingState = HState
  .tt('i')

What I tried is (A) but that does not seem to work. (B) however does. What exactly is the difference between those two?

toger5 · Jan 21 '22 21:01
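One plain-JavaScript reason snippet (A) can fail, independent of linkify: a const binding cannot be read inside its own initializer, and it cannot be redeclared in a scope where the same name already exists. The sketch below demonstrates the first point with a hypothetical stand-in function tt, which is not linkify's API. Whether this is the only difference between (A) and (B) inside linkify itself is not confirmed by the thread.

```javascript
// Hypothetical stand-in for a state's tt method; NOT linkify's API.
// Returns the state passed in, or a fresh placeholder object if none is given.
function tt(input, existingState) {
  return existingState ?? { input };
}

let caught = null;
try {
  // Same shape as (A): the initializer reads the binding it is declaring,
  // which is still in its temporal dead zone at that point.
  const GreetingState = tt('i', GreetingState);
} catch (err) {
  caught = err; // a ReferenceError
}

// Declaring the accepting state once and then reusing it, without a second
// `const` of the same name, works as intended.
const Accepting = tt('o');
const viaI = tt('i', Accepting);
```

Snippet (B) avoids the problem because its initializer never refers to GreetingState. To reuse an existing accepting state, the call can be made without assigning its result to a new const of the same name.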