UnicodeEncodeError when converting notebook containing Chinese characters

Open xmfbit opened this issue 6 years ago • 4 comments

When I try to convert a Jupyter notebook that contains Chinese characters, a UnicodeEncodeError occurs (full traceback below). I have tried installing notedown using both methods described in the README: the pre-compiled package and the latest version from GitHub. I tested with Python 2.7 and 3.6, on Mac OS and Ubuntu 16.04.

/Users/me/anaconda2/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
  "`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
Traceback (most recent call last):
  File "/Users/me/anaconda2/bin/notedown", line 11, in <module>
    load_entry_point('notedown==1.5.1', 'console_scripts', 'notedown')()
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/main.py", line 312, in app
    main(args, help=parser.format_help())
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/main.py", line 301, in main
    writer.write(notebook, unicode_std_stream('stdout'))
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbformat/v4/rwbase.py", line 126, in write
    nbs = cast_unicode_py2(self.writes(nb, **kwargs))
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/notedown.py", line 434, in writes
    body, resources = self.exporter.from_notebook_node(notebook)
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/templateexporter.py", line 203, in from_notebook_node
    output = self.template.render(nb=nb_copy, resources=resources)
  File "/Users/me/anaconda2/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/Users/me/anaconda2/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/templates/markdown.tpl", line 1, in top-level template code
    {% extends 'display_priority.tpl' %}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/display_priority.tpl", line 1, in top-level template code
    {%- extends 'null.tpl' -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 26, in top-level template code
    {%- block body -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 28, in block "body"
    {%- block any_cell scoped -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 30, in block "any_cell"
    {%- block codecell scoped -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 31, in block "codecell"
    {%- block input_group -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 33, in block "input_group"
    {%- block input -%}{%- endblock input -%}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/templates/markdown.tpl", line 4, in block "input"
    {{ cell | create_input_codeblock }}
  File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/notedown.py", line 477, in create_input_codeblock
    return codeblock.format(attributes=attrs, fence='```', cell=cell)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-86: ordinal not in range(128)

xmfbit avatar Apr 12 '18 03:04 xmfbit
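
The last frame of the traceback above points at a `str.format` call, and the message names the `'ascii'` codec. On Python 2, formatting a byte-string template with unicode arguments triggers an implicit ASCII encode, which is one plausible reading of this failure. The sketch below is not notedown's code: `TEMPLATE`, `render_codeblock`, and `to_ascii_or_error` are hypothetical names, and it runs on Python 3, where the ASCII step has to be made explicit to reproduce the same class of error.

```python
# Illustrative only: these names are hypothetical and are not taken from notedown.
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# A text (unicode) template, so formatting never needs an implicit encode.
TEMPLATE = "{fence}{attributes}\n{source}\n{fence}"


def render_codeblock(source, attributes="python", fence=FENCE):
    """Wrap cell source in a fenced Markdown code block."""
    return TEMPLATE.format(fence=fence, attributes=attributes, source=source)


def to_ascii_or_error(text):
    """Encode text as ASCII; return the exception instead of raising it."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


block = render_codeblock("print('你好')")

# Forcing ASCII reproduces the same class of error as in the traceback.
err = to_ascii_or_error(block)

# Encoding as UTF-8 succeeds and round-trips without loss.
utf8_bytes = block.encode("utf-8")
```

If this reading is right, keeping every template and argument as unicode text until the final write, and encoding as UTF-8 only at that point, would avoid the implicit ASCII step. The issue also reports a failure on Python 3.6, for which no traceback is shown, so the cause there may differ.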

Just a comment on this. I believe notedown uses pandoc for conversion. I have used pandoc a bit for Markdown-to-PDF conversion, and there are some extra things you need to set for Chinese text; for one, I needed to use the LuaTeX engine to get it to work. I'm not sure whether notedown supports this. For what it's worth, here is my configuration for using pandoc with Chinese characters. Perhaps it will give some hints for addressing this issue:


---
title: Test
author:
documentclass:
  - ctexart
abstract: This is a pandoc test . . .
---

你好吗?

To process this with Pandoc on the command line simply type:

pandoc -s chinese-input.md --latex-engine=lualatex -o output-file.pdf

babycamel avatar Apr 12 '18 05:04 babycamel

@babycamel Thanks for your comment! However, I believe the LuaTeX engine is for md -> pdf conversion, not for notebook -> md conversion. Anyway, thanks!

xmfbit avatar Apr 12 '18 06:04 xmfbit

Correct, but notedown uses pandoc and knitr for notebook -> md, and it sets the encoding to UTF-8, which I think you need. knitr handles Chinese without difficulty, so I suspect the problem is with the pandoc attributes. I could be wrong though.

babycamel avatar Apr 12 '18 06:04 babycamel
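
The point about encoding can be illustrated independently of notedown, pandoc, or knitr. This is a minimal sketch, assuming only the Python 3 standard library; `write_text` is a hypothetical helper, and the in-memory buffer stands in for a real output stream such as stdout. The same text fails or succeeds depending on the encoding the output stream is opened with.

```python
import io


def write_text(text, encoding):
    """Write text through a text stream with the given encoding; return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text)   # raises UnicodeEncodeError if the codec cannot represent text
    stream.flush()
    data = raw.getvalue()
    stream.detach()      # release the buffer without closing it
    return data


sample = "你好吗?"

# A UTF-8 stream accepts the Chinese characters.
utf8_data = write_text(sample, "utf-8")

# An ASCII stream rejects them with a UnicodeEncodeError.
try:
    write_text(sample, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```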

Thanks! I currently don't have the motivation to solve this problem. Maybe I will try the solution later. 🤕

xmfbit avatar Apr 12 '18 06:04 xmfbit