Include Chinese/Japanese/Korean/more fonts in headless Chrome binary

Open adieuadieu opened this issue 7 years ago • 35 comments

From downstream issue in https://github.com/graphcool/chromeless/issues/43

Should include the relevant fonts for all scripts, not just kanji.

Resources:

  • Use fontconfig? https://groups.google.com/a/chromium.org/forum/#!searchin/headless-dev/font/headless-dev/Pmnxb1lyDBg/DyNZYxMeBgAJ
  • more hints in here? Browserless dockerfile

adieuadieu avatar Jul 28 '17 08:07 adieuadieu

@adieuadieu we solved this for a subset of languages (CJK) in our lambda function that uses phantomjs today. We were able to do it by:

  • packaging a .fonts directory containing NotoSansCJK-Regular.ttc and including that in our function upload zip.
  • adding an environment variable in our lambda function console (on AWS) called HOME set to /var/task. This allows Qt to pick up the included font.

The liability here is that, due to the 50MB size limit on Lambda function packages, you have to choose carefully which fonts to include. In our case the Noto Sans font solved our issues, but I'm sure other fonts will be needed for other purposes.
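As a rough sketch of the two steps above (not the exact commands we ran), staging the fonts could look like the following. The helper names stage_fonts and package_size_kb, the build directory, and function.zip are all hypothetical; HOME=/var/task still has to be set separately in the Lambda function's environment.

```shell
#!/bin/sh
# Sketch only: helper names and directory layout are assumptions.

# Copy one or more font files into <package-dir>/.fonts
stage_fonts() {
  pkg_dir=$1
  shift
  mkdir -p "$pkg_dir/.fonts"
  for font in "$@"; do
    cp "$font" "$pkg_dir/.fonts/"
  done
}

# Print the unpacked size of the package directory in kilobytes
package_size_kb() {
  du -sk "$1" | cut -f1
}

# Example usage (paths are placeholders):
#   stage_fonts build NotoSansCJK-Regular.ttc
#   package_size_kb build
#   (cd build && zip -r ../function.zip .)   # zipping "." keeps the hidden .fonts directory
```

The size printed is the unpacked size, so the zipped package will typically come out smaller than that figure.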

I'm relaying most of this second-hand because one of my colleagues did most of the work earlier today. I'm going to try digging in further on it to make sure I got that right but that might be a possible solution for this project as well. I can try taking a crack at it next week if that would help, but I better get acclimated with this project more before doing any work. Nice work on this BTW!

toddwprice avatar Jul 29 '17 00:07 toddwprice

@toddwprice oh that's great news! I hope Chrome looks for fonts in the same place.

About the 50MB limit: if you deploy your Lambda function with the deployment package in S3, the package’s size limit increases dramatically, technically to 250MB (realistically more around 100MB when packaging less compressible data like executable binaries). Forgive me for linking to myself: I recently wrote more about it in this article.

adieuadieu avatar Jul 29 '17 20:07 adieuadieu

@adieuadieu wow we were early adopters of Lambda but never questioned the 5B limit. Great article! I will see if I can include Noto Sans for starters and if that works then we could add other fonts to plug other common holes.

toddwprice avatar Jul 29 '17 23:07 toddwprice

*50MB

toddwprice avatar Jul 29 '17 23:07 toddwprice

@adieuadieu I'm trying to get going on the project but getting errors with some missing dependencies and files when running npm test. Let me know if you want me to post my errors here or ping you somewhere else. I'm using the develop branch by the way. Thanks.

toddwprice avatar Aug 04 '17 13:08 toddwprice

@toddwprice Skip npm test. Which folder are you working in? packages/lambda may be the best place to play around in. There's a pseudo-integration test for Serverless there which you can use to deploy to Lambda. Run npm run build in packages/lambda, then create a symlink in packages/lambda/integration-test for a dist folder which points to the parent directory's dist folder (packages/lambda/dist).

My local setup (it's not so pretty..):

marco:integration-test marco$ pwd
/Users/marco/src/github/serverless-chrome/packages/lambda/integration-test
marco:integration-test marco$ ls -lhtra
total 260800
-rwxr-xr-x   1 marco  502   127M May  9 07:14 headless_shell
lrwxr-xr-x   1 marco  502     8B Jun 18 23:21 dist -> ../dist/
-rw-r--r--   1 marco  502   463B Jul 10 18:48 serverless.yml
-rw-r--r--   1 marco  502   789B Jul 10 18:48 handler.js
drwxr-xr-x  13 marco  502   442B Jul 10 18:48 ..
drwxr-xr-x   7 marco  502   238B Jul 10 18:48 .

adieuadieu avatar Aug 04 '17 14:08 adieuadieu

@toddwprice Sure, would 18:30 CEST work? Could you DM me on Twitter or Gitter (@adieuadieu), or email (on my GitHub profile) so we can settle on a tool/service/share usernames to screen share?

adieuadieu avatar Aug 04 '17 14:08 adieuadieu

Perfect. Sent you a message in gitter.

toddwprice avatar Aug 04 '17 15:08 toddwprice

@adieuadieu I'm struggling to get a good test running in Lambda without adding too many other dependencies. My current approach is to use chrome-remote-interface directly inside the test handler. See this file: handler.js.zip.

Two problems so far:

  1. Chrome spins up fine the first time, but fails afterwards. The logic around recognizing a running instance when a container is re-used is either not working or I've configured it wrong.

  2. Screenshots are returning a blank page. I saw this behavior in the past when testing chrome --headless with chrome-remote-interface so it's likely something I'm doing wrong there.

Any pointers you could give me to get me on track with a valid test would be appreciated.

toddwprice avatar Aug 10 '17 14:08 toddwprice

Hi @toddwprice thanks for the update. I would not worry too much about the first problem or adding too many dependencies. I would focus on just getting fonts working correctly with blatant disregard for anything else. Once fonts work, we'd have a proof-of-concept that it's possible. We can iterate from there to make it cleaner/easier.

With that in mind: the example handler in this repository should work for capturing a screenshot, at least on Lambda. You might need to wait for the page to load before taking the screenshot. For a simple, mostly static page without any ajax-y behavior which occurs after the DOMContentLoaded event fires, you can wait on CDP's Page.loadEventFired() Promise to resolve before doing Page.captureScreenshot().

adieuadieu avatar Aug 10 '17 14:08 adieuadieu

Well this is probably no help since I don't even use AWS, but I'll share anyway. I run chromeless in a docker-container side by side with knqz/chrome-headless and to that I add:

ADD https://noto-website.storage.googleapis.com/pkgs/NotoSansCJKjp-hinted.zip /tmp
RUN unzip /tmp/NotoSansCJKjp-hinted.zip && \
    mkdir -p /usr/share/fonts/noto && \
    cp *.otf /usr/share/fonts/noto && \
    chmod 644 -R /usr/share/fonts/noto/ && \
    fc-cache -fv

(All of Noto is 120MB, while the regular weight alone is 15MB.) After that, at least Japanese works fine. You probably have all that figured out already, so sorry for being noisy!

http://qiita.com/dd511805/items/dfe03c5486bf1421875a

kumorig avatar Aug 20 '17 11:08 kumorig

Thank you for the tip, @kumorig!

adieuadieu avatar Aug 23 '17 14:08 adieuadieu

Please use it as a reference http://fd0.hatenablog.jp/entry/2017/09/10/223042 (sorry, written in Japanese)

  • use custom fontconfig
  • use small size font
  • strip chrome binary

fd00 avatar Sep 11 '17 15:09 fd00

Thanks @fd00, I followed the guide in your blog exactly but it does not seem to work for me; I still get a lot of tofu after deploying to Lambda. Not sure if I missed anything :(. Edited: it works for the IPAexfont font, but not for Noto fonts. I managed to include the Noto fonts in the package, but they do not seem to work properly with fontconfig. (Uploading via S3 allows you to deploy up to 250MB.)

nmqanh avatar Sep 24 '17 00:09 nmqanh

Updated: OTF fonts from google do not work for me, but TTC from google does work well with fontconfig following the guide of @fd00 thanks a lot, mate :). TTC fonts can be downloaded via https://github.com/googlei18n/noto-cjk

nmqanh avatar Sep 25 '17 02:09 nmqanh

another update: since [email protected] with headless_shell changing to headless-chromium, the method of @fd00 stopped working and the tofus are now coming back :(

nmqanh avatar Dec 04 '17 06:12 nmqanh

@nmqanh you might just need to change the name/paths in a few steps from headless_shell to headless-chrome. For example, in the article, in the Deploy section, there is reference to CHROME_PATH pointing at headless_shell. Change this to headless-chrome.

I have an implementation of font support in progress that I'll finish sometime over the next week or two which will close this Issue.

adieuadieu avatar Dec 04 '17 09:12 adieuadieu

Just a quick update: I tried updating the CHROME_PATH and also re-built the font cache from step 0 as guided in the article and it does not work, tofus are still coming back with [email protected] and later. Thanks for the good news that new release gonna support CJK fonts by default in 1-2 weeks :). Would love to try it soon. Please let me know if there is anything I can help.

nmqanh avatar Dec 06 '17 04:12 nmqanh

@adieuadieu We have tried several times to add support for this, but have hit a dead end. Have you managed to figure it out? Can we assist with something? Thank you for your great work.

eggnita avatar Dec 13 '17 09:12 eggnita

I tried many ways to bring CJK fonts back to headless but I could not :(. Was anyone here able to do that? please help me, I appreciate a lot, thanks. This only started breaking since [email protected], it works fine with [email protected] and lower versions.

Thanks all.

nmqanh avatar Jan 09 '18 23:01 nmqanh

I got it to work for my own setup. I documented the process I used with a little more detail than the other blog post here: https://gist.github.com/nat-n/c3429d29f2478ccb3de243810bb12956

nat-n avatar Jan 12 '18 13:01 nat-n

Thanks @nat-n, it works like a charm. The main reason was that from version 1.0.0-6 the symlink failed to run; it used to work with 1.0.0-5 and lower versions.

nmqanh avatar Jan 22 '18 01:01 nmqanh

@nat-n I've been able to include the ipaexg font into the chromeless using this method.

Docker container created, once done rsync'd it into the chromeless path so it looks like the following...

chromeless/serverless/node_modules/@serverless-chrome/lambda/dist/fontconfig/etc/fonts

and within my serverless.yml

provider:
  name: aws
  runtime: nodejs6.10
  stage: ${self:custom.stage}
  region: eu-west-1
  environment:
    DEBUG: ${self:custom.debug}
    AWS_IOT_HOST: ${self:custom.awsIotHost}
    FONTCONFIG_PATH: /var/task/node_modules/@serverless-chrome/lambda/dist/fontconfig/etc/fonts
    LD_LIBRARY_PATH: /var/task/node_modules/@serverless-chrome/lambda/dist/fontconfig/usr/lib

LeeGardiner avatar Jan 31 '18 17:01 LeeGardiner

@nat-n I tried to follow your note, but I'm missing knowledge from "Configuring fontconfig" to the end. Could you please detail more on how to do it or give links to learn what I'm missing. Thanks

luminous8 avatar Feb 04 '18 22:02 luminous8

@luminous8 I can try to help, but I'm not sure what you're missing. The general idea is that the fontconfig built inside the container also exists under /tmp outside the container, so you can make some required changes to it there before running some commands from inside the container to complete the setup. I've just fixed a formatting issue that might have made a part of it less clear, but I'm afraid I can't make the instructions too concrete without making them too specific to a particular setup (which may be different from your own).

nat-n avatar Feb 05 '18 16:02 nat-n

For anyone stumbling at this at some point in the future, I just wanted to mention that what @toddwprice did:

  • Upload fonts in a .fonts directory.
  • Set $HOME env var to /var/task.

Worked just fine without the need to build fontconfig or the other extra steps.

arikfr avatar Mar 29 '18 19:03 arikfr

Agree with @arikfr. Shipped the following with our λ.

$ tree -la
.
├── chromium
└── .fonts
    ├── NotoColorEmoji.ttf
    ├── NotoEmoji-Regular.ttf
    ├── NotoSansArabic-Bold.ttf
    ├── NotoSansArabic-Regular.ttf
    ├── NotoSansCJKjp-Bold.otf
    ├── NotoSansCJKjp-Regular.otf
    ├── NotoSansCJKkr-Bold.otf
    ├── NotoSansCJKkr-Regular.otf
    ├── NotoSansCJKsc-Bold.otf
    ├── NotoSansCJKsc-Regular.otf
    ├── NotoSansCJKtc-Bold.otf
    ├── NotoSansCJKtc-Regular.otf
    ├── NotoSansHebrew-Bold.ttf
    ├── NotoSansHebrew-Regular.ttf
    ├── NotoSansMongolian-Regular.ttf
    ├── NotoSansThai-Bold.ttf
    └── NotoSansThai-Regular.ttf

Unpacked as is to /var/task. After setting $HOME to /var/task we were able to confirm CJK characters rendered.
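Since the archive is unpacked to /var/task, .fonts has to sit at the root of the archive for this to work. One way to double-check that before deploying is sketched below. The helper name has_root_fonts and the file name function.zip are placeholders, and the usage line assumes Info-ZIP's unzip.

```shell
#!/bin/sh
# Hypothetical helper: reads archive entry paths on stdin (one per line)
# and succeeds only if at least one file sits under a top-level .fonts/.
has_root_fonts() {
  grep -q '^\.fonts/.'
}

# Example (assumes Info-ZIP unzip; -Z1 lists entry names only):
#   unzip -Z1 function.zip | has_root_fonts && echo "fonts at package root"
```

An entry such as dist/.fonts/Some.ttf would not pass this check, because the directory would then unpack to /var/task/dist/.fonts instead.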

abargnesi avatar Apr 03 '18 19:04 abargnesi

I followed @toddwprice's suggestion and it works locally but not in Lambda. Then I tried an *.otf file instead of a *.ttc file, as @abargnesi suggests, and it works both locally and in Lambda. The font I used is NotoSansCJKtc-Black.otf

So you may try both font files and see if any one of them works.

liwaiwai avatar May 17 '18 07:05 liwaiwai

In case this saves someone else some work.

I got fonts working by putting them in .fonts and setting HOME=/var/task, however it didn't work for me until I made the font files have permission 644 (-rw-r--r--).
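A minimal sketch of that permission fix, assuming the fonts are staged in a local .fonts directory before zipping (the helper name fix_font_perms is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: set every regular file under the given fonts
# directory to mode 644 (-rw-r--r--).
fix_font_perms() {
  fonts_dir=$1
  find "$fonts_dir" -type f -exec chmod 644 {} +
}

# Example:
#   fix_font_perms .fonts
# Afterwards this should print nothing:
#   find .fonts -type f ! -perm 644
```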

NickBlow avatar Oct 11 '18 20:10 NickBlow

@NickBlow, @abargnesi

I'm still unable to get these characters to render. Can somebody clarify this for me please - where does this .fonts directory go?

I added the same fonts as listed in @abargnesi 's comment, changed their perms to 644, and tried:

  1. adding this directory to the chrome/chrome-headless-lambda-linux-x64.tar.gz archive
  2. adding this directory at top level of serverless-chrome git repo root
  3. adding it to .serverless/serverless-chrome.zip archive

and deploying, but my characters are rendered as empty space (not even squares).

Any ideas?

Thank you .

kirilledelman avatar Dec 03 '18 17:12 kirilledelman