emt
emt copied to clipboard
Emacs macOS Tokenizer, tokenizing CJK words with macOS's built-in NLP tokenizer.
- emt.el
** Introduction
EMT stands for =Emacs MacOS Tokenizer=.
This package use macOS's built-in NLP tokenizer to tokenize and operate on CJK words in Emacs.
** Installation
*** Requirements
- macOS 10.15 or later
- Emacs 26.1 or later, built with dynamic module support (use =--with-modules= during compilation)
*** Build dynamic module
**** Pre-built (recommendation)
If you enable =emt-mode= and the module cannot be found, it will prompt whether to automatically download it from GitHub. Or you can manually retrieve the pre-built module from the [[https://github.com/roife/emacs-macos-tokenizer/releases][releases]] section and place the =dylib= file in the =emacs-macos-tokenizer-lib-path= (by default, it is located at =modules/libEMT.dylib= within your personal configuration folder, normally =~/.emacs.d/modules/libEMT.dylib=).
Current version of the dynamic module is v2.0.0, make sure you have updated to latest module.
**** Manually build
- Install Xcode.
- Build the module using =emt-compile-module=, which compiles and copies the module to =emt-lib-path=.
If you enconter the folloing error:
#+begin_quote No such module "PackageDescription" #+end_quote
run the following command and try again:
#+begin_src bash sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer #+end_src
*** Install package
Install with =straight= and =use-package=:
#+begin_src emacs-lisp (use-package emt :straight (:host github :repo "roife/emt" :files (".el" "module/" "module")) :hook (after-init . emt-mode)) #+end_src
** Customization
*** =emt-use-cache=
Caches for results of tokenization if non-nil. Default is =t=.
*** =emt-cache-lru-size=
The size of LRU cache. Default is =50=.
*** =emt-lib-path=
The path to the directory of dynamic library for emt. Default is =~/.emacs.d/modules/libEMT.dylib=.
** Usage
*** keymap: =emt-mode-map=
It remaps =forward-word=, =backward-word=, =kill-word= and =backward-kill-word= to use emt's version.
*** Minor mode
It calls =emt-ensure=, which load dynamic modeuls and set =emt-mode-map=.
*** Functions
**** =emt-word-at-point-or-forward=
Return the word at point. If current point is at bound of a word, return the one forward.
**** =emt-word-at-point-or-backward=
Return the word at point. If current point is at bound of a word, return the one backward.
**** =emt-compile-module=
Compile and copy the module to =emt-lib-path=.
It takes an optional argument =path=, which is the path to the directory of dynamic library. By default, =path= is set to =emt-lib-path=.
**** =emt-download-module=
Download dynamic module from =https://github.com/roife/emt/releases/download/<VERSION>/libEMT.dylib=.
If PATH is non-nil, download the module to PATH.
**** =emt-ensure=
Load dynamic module.
**** =emt-split=
Split string into a list of words.
Return a list of word bounds (a cons of the beginning position and the ending position of a word)
**** =emt-forward-word=
CJK compatible version of =forward-word=.
**** =emt-backward-word=
CJK compatible version of =backward-word=.
**** =emt-kill-word=
CJK compatible version of =kill-word=.
**** =emt-backward-kill-word=
CJK compatible version of =backward-kill-word=.
**** =emt-mark-word=
CJK compatible version of =mark-word=.
** Acknowledgements
This package is inspired by [[https://github.com/cireu/jieba.el/][jieba.el]] which is a Chinese tokenizer for Emacs using =jieba=.
The dynamic module uses [[https://github.com/SavchenkoValeriy/emacs-swift-module.git][emacs-swift-module]], which provides an interface for writing Emacs dynamic modules in Swift.