
Non-Latin support

Status: Open. noma4i opened this issue 7 years ago • 10 comments

Slate could support TOC linking and Lunr search for non-Latin languages.

Redcarpet fix: https://github.com/vmg/redcarpet/pull/539
Lunr multilang: https://github.com/MihaiValentin/lunr-languages#indexing-multi-language-content

noma4i, Aug 27 '16 14:08

It appears Lunr would need a new locale file for each language we want to support. Is that correct? I just checked their documentation but could use some guidance to get started.

NaanProphet, Aug 30 '16 11:08

@NaanProphet You will need to include the locale files from Lunr multilang, such as lunr.ru.js.

An example setup: add these files from the Lunr Multilang project to your Slate project:

  • lunr.stemmer.support.js
  • lunr.multi.js
  • lunr.ru.js (this adds Russian support; add the locale files for the languages you need)
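One way to load those files is through Sprockets require directives at the top of source/javascripts/app/_search.js, next to the existing lunr require. This is a sketch under assumptions: the files are copied into source/javascripts/lib/ with Slate's leading-underscore naming, and the exact paths below are hypothetical. As far as we know, the lunr-languages locale files refuse to load unless lunr.stemmer.support.js has already been loaded, so the order shown matters.

```javascript
//= require ../lib/_lunr
//= require ../lib/_lunr.stemmer.support.js
//= require ../lib/_lunr.ru.js
//= require ../lib/_lunr.multi.js
// Assumed paths. Load order: lunr itself, then stemmer support, then each
// locale file; lunr.multi.js only has to be loaded before multiLanguage() runs.
```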

Then change this code in the Slate file source/javascripts/app/_search.js:

  var index = new lunr.Index();

  index.ref('id');
  index.field('title', { boost: 10 });
  index.field('body');
  index.pipeline.add(lunr.trimmer, lunr.stopWordFilter);

to

  var index = lunr(function () {
    // Locales to index; each non-English locale needs its lunr.<locale>.js loaded
    this.use(lunr.multiLanguage('ru', 'en'));
    this.ref('id');
    this.field('title', { boost: 10 });
    this.field('body');
  });

Here, the arguments to multiLanguage('ru', 'en') list the locales you need.
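Why the original pipeline (lunr.trimmer plus lunr.stopWordFilter) is a problem for non-Latin text can be sketched in a few lines. Assumption: the stock trimmer in lunr 0.x/1.x strips leading and trailing \W characters. In JavaScript, \W matches every character outside [A-Za-z0-9_], so an all-Cyrillic or all-CJK token is trimmed to an empty string. asciiTrimmer below mirrors that behaviour; unicodeTrimmer is a hypothetical Unicode-aware variant for contrast, not part of lunr or lunr-languages.

```javascript
// asciiTrimmer: mirrors the \W-based trimmer assumed to ship with older lunr.
// \W treats every non-ASCII letter as "punctuation", so it gets stripped.
function asciiTrimmer(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// unicodeTrimmer (hypothetical): keep letters and digits from any script.
// \p{L} and \p{N} require the u flag.
function unicodeTrimmer(token) {
  return token
    .replace(/^[^\p{L}\p{N}_]+/u, '')
    .replace(/[^\p{L}\p{N}_]+$/u, '');
}

const samples = ['search,', '«поиск»', '検索。'];
for (const token of samples) {
  console.log(
    JSON.stringify(token), '->',
    JSON.stringify(asciiTrimmer(token)), 'vs',
    JSON.stringify(unicodeTrimmer(token))
  );
}
```

lunr-languages takes a different route: as far as we can tell, each locale file builds its own trimmer from an explicit list of word characters for that script, which is one reason the locale files are needed in addition to lunr.multi.js.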

For complete non-Latin support you will also need a patched version of the redcarpet gem. I've created one for myself; feel free to use it in your Gemfile:

  gem 'redcarpet', '~> 3.3', github: 'noma4i/redcarpet'

My Gemfile:

  source 'https://rubygems.org'

  # Middleman
  gem 'middleman', '~>4.0.0'
  gem 'middleman-gh-pages', '~> 0.0.3'
  gem 'middleman-syntax', '~> 2.1.0'
  gem 'middleman-autoprefixer', '~> 2.7.0'
  gem "middleman-sprockets", "~> 4.0.0.rc"
  gem 'rouge', '~> 1.10.1'
  gem 'redcarpet', '~> 3.3', github: "noma4i/redcarpet"
  gem 'middleman-deploy', github: 'middleman-contrib/middleman-deploy', branch: 'master'
  gem 'haml'
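Per this thread, TOC linking depends on the heading ids that redcarpet generates, which is why the patched gem is needed. The exact change in redcarpet PR 539 is not reproduced here; the sketch below only illustrates the kind of Unicode-aware heading slug such a fix needs, contrasted with an ASCII-only rule. Both unicodeSlug and asciiSlug are hypothetical helpers, not redcarpet or Slate functions.

```javascript
// asciiSlug (hypothetical): an ASCII-only rule, for contrast.
function asciiSlug(headingText) {
  return headingText
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // every non-ASCII run becomes one hyphen
    .replace(/^-+|-+$/g, '');    // trim hyphens at both ends
}

// unicodeSlug (hypothetical): keep letters and digits from any script.
function unicodeSlug(headingText) {
  return headingText
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, '-') // collapse runs of other characters
    .replace(/^-+|-+$/g, '');         // trim hyphens at both ends
}

for (const heading of ['Поиск и индекс', '検索 API', 'Hello, World!']) {
  console.log(JSON.stringify(asciiSlug(heading)), JSON.stringify(unicodeSlug(heading)));
}
```

With the ASCII-only rule, every all-Cyrillic heading collapses to an empty string, so such headings cannot get distinct ids. Whatever rule is used, the TOC links and the heading ids must be produced the same way.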

In the end you will have a fully working Slate with non-Latin support for TOC links, and Lunr search working with English and your non-Latin language ;)

noma4i, Aug 30 '16 12:08

This is really helpful, thanks!

In my case, the language I'm trying to index (Hindi) doesn't have a stemmer yet, so I guess I'll have to start by looking into that first...

NaanProphet, Aug 31 '16 02:08

@lord @noma4i @NaanProphet can we figure out a way to get this change landed on master? I'd love to have non-Latin support work out of the box.

hyugit, Oct 20 '16 22:10

Hi @trave7er, as far as I can tell, each non-Latin language needs a well-written stemmer. Some languages already have one, and some (like Hindi and Telugu) do not.

Which language were you looking at in particular?

NaanProphet, Oct 24 '16 00:10

Hi @NaanProphet, I'm looking for Japanese and Chinese stemmers. Thank you.

hyugit, Oct 24 '16 00:10

I think we could build a rake task that accepts the language as a parameter and generates the assets. Stick with my tutorial for now.

@trave7er I'm not sure a Chinese stemmer is available at the moment.

noma4i, Oct 24 '16 00:10

OK, gotcha. Thanks, guys.

hyugit, Oct 24 '16 00:10

Has this been resolved? I have problems with Korean too.

stzminjae, Sep 20 '17 03:09

I can't seem to get @noma4i's implementation working. I'm trying to get Lunr to support Japanese.

The Lunr changes are simple enough, but I'm not sure what changes to redcarpet are required. I can't simply edit my Gemfile to point to the patched redcarpet code; it might have to do with the patched version using redcarpet 3.3 while Slate now uses 3.4. Any thoughts or other workarounds to enable this?

stownsend2121, Oct 03 '18 19:10