emt icon indicating copy to clipboard operation
emt copied to clipboard

Emacs macOS Tokenizer, tokenizing CJK words with macOS's built-in NLP tokenizer.

  • emt.el

** Introduction

EMT stands for =Emacs MacOS Tokenizer=.

This package use macOS's built-in NLP tokenizer to tokenize and operate on CJK words in Emacs.

** Installation

*** Requirements

  • macOS 10.15 or later
  • Emacs 26.1 or later, built with dynamic module support (use =--with-modules= during compilation)

*** Build dynamic module

**** Pre-built (recommendation)

If you enable =emt-mode= and the module cannot be found, it will prompt whether to automatically download it from GitHub. Or you can manually retrieve the pre-built module from the [[https://github.com/roife/emacs-macos-tokenizer/releases][releases]] section and place the =dylib= file in the =emacs-macos-tokenizer-lib-path= (by default, it is located at =modules/libEMT.dylib= within your personal configuration folder, normally =~/.emacs.d/modules/libEMT.dylib=).

Current version of the dynamic module is v2.0.0, make sure you have updated to latest module.

**** Manually build

  • Install Xcode.
  • Build the module using =emt-compile-module=, which compiles and copies the module to =emt-lib-path=.

If you enconter the folloing error:

#+begin_quote No such module "PackageDescription" #+end_quote

run the following command and try again:

#+begin_src bash sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer #+end_src

*** Install package

Install with =straight= and =use-package=:

#+begin_src emacs-lisp (use-package emt :straight (:host github :repo "roife/emt" :files (".el" "module/" "module")) :hook (after-init . emt-mode)) #+end_src

** Customization

*** =emt-use-cache=

Caches for results of tokenization if non-nil. Default is =t=.

*** =emt-cache-lru-size=

The size of LRU cache. Default is =50=.

*** =emt-lib-path=

The path to the directory of dynamic library for emt. Default is =~/.emacs.d/modules/libEMT.dylib=.

** Usage

*** keymap: =emt-mode-map=

It remaps =forward-word=, =backward-word=, =kill-word= and =backward-kill-word= to use emt's version.

*** Minor mode

It calls =emt-ensure=, which load dynamic modeuls and set =emt-mode-map=.

*** Functions

**** =emt-word-at-point-or-forward=

Return the word at point. If current point is at bound of a word, return the one forward.

**** =emt-word-at-point-or-backward=

Return the word at point. If current point is at bound of a word, return the one backward.

**** =emt-compile-module=

Compile and copy the module to =emt-lib-path=.

It takes an optional argument =path=, which is the path to the directory of dynamic library. By default, =path= is set to =emt-lib-path=.

**** =emt-download-module=

Download dynamic module from =https://github.com/roife/emt/releases/download/<VERSION>/libEMT.dylib=.

If PATH is non-nil, download the module to PATH.

**** =emt-ensure=

Load dynamic module.

**** =emt-split=

Split string into a list of words.

Return a list of word bounds (a cons of the beginning position and the ending position of a word)

**** =emt-forward-word=

CJK compatible version of =forward-word=.

**** =emt-backward-word=

CJK compatible version of =backward-word=.

**** =emt-kill-word=

CJK compatible version of =kill-word=.

**** =emt-backward-kill-word=

CJK compatible version of =backward-kill-word=.

**** =emt-mark-word=

CJK compatible version of =mark-word=.

** Acknowledgements

This package is inspired by [[https://github.com/cireu/jieba.el/][jieba.el]] which is a Chinese tokenizer for Emacs using =jieba=.

The dynamic module uses [[https://github.com/SavchenkoValeriy/emacs-swift-module.git][emacs-swift-module]], which provides an interface for writing Emacs dynamic modules in Swift.