notedown
UnicodeEncodeError when converting notebook containing Chinese characters
I installed notedown using both methods described in the README (the pre-compiled package and the latest version from GitHub) and tested with Python 2.7 and 3.6 on macOS and Ubuntu 16.04. When I try to convert a Jupyter notebook that contains Chinese characters, a UnicodeEncodeError occurs, like this:
/Users/me/anaconda2/lib/python2.7/site-packages/IPython/html.py:14: ShimWarning: The `IPython.html` package has been deprecated. You should import from `notebook` instead. `IPython.html.widgets` has moved to `ipywidgets`.
"`IPython.html.widgets` has moved to `ipywidgets`.", ShimWarning)
Traceback (most recent call last):
File "/Users/me/anaconda2/bin/notedown", line 11, in <module>
load_entry_point('notedown==1.5.1', 'console_scripts', 'notedown')()
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/main.py", line 312, in app
main(args, help=parser.format_help())
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/main.py", line 301, in main
writer.write(notebook, unicode_std_stream('stdout'))
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbformat/v4/rwbase.py", line 126, in write
nbs = cast_unicode_py2(self.writes(nb, **kwargs))
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/notedown.py", line 434, in writes
body, resources = self.exporter.from_notebook_node(notebook)
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/templateexporter.py", line 203, in from_notebook_node
output = self.template.render(nb=nb_copy, resources=resources)
File "/Users/me/anaconda2/lib/python2.7/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/Users/me/anaconda2/lib/python2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/templates/markdown.tpl", line 1, in top-level template code
{% extends 'display_priority.tpl' %}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/display_priority.tpl", line 1, in top-level template code
{%- extends 'null.tpl' -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 26, in top-level template code
{%- block body -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 28, in block "body"
{%- block any_cell scoped -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 30, in block "any_cell"
{%- block codecell scoped -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 31, in block "codecell"
{%- block input_group -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/nbconvert/exporters/../templates/skeleton/null.tpl", line 33, in block "input_group"
{%- block input -%}{%- endblock input -%}
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/templates/markdown.tpl", line 4, in block "input"
{{ cell | create_input_codeblock }}
File "/Users/me/anaconda2/lib/python2.7/site-packages/notedown/notedown.py", line 477, in create_input_codeblock
return codeblock.format(attributes=attrs, fence='```', cell=cell)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 85-86: ordinal not in range(128)
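For reference, the last frame of the traceback is the `codeblock.format(...)` call in notedown.py. A plausible reading, assuming `codeblock` is a byte string under Python 2 (the notedown source is not shown here, so this is a guess), is that `str.format` implicitly encodes the unicode cell text with the ASCII codec. The sketch below is written for Python 3 and only illustrates that mechanism: `cell_source`, `ascii_encode_fails`, and the `template` layout are hypothetical and not taken from notedown.

```python
def ascii_encode_fails(text):
    """Return True if `text` cannot be encoded with the ASCII codec."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


# Hypothetical cell content containing Chinese characters.
cell_source = u'print("你好")'

# Under Python 2, formatting a byte-string template with a unicode argument
# performs this ASCII encode implicitly and raises UnicodeEncodeError.
needs_unicode_template = ascii_encode_fails(cell_source)

# Using a unicode template avoids the implicit encode on Python 2,
# and is the default behaviour on Python 3.
fence = u'`' * 3
template = u'{fence}{attributes}\n{cell}\n{fence}'
block = template.format(fence=fence, attributes=u'python', cell=cell_source)
```

If this reading is right, making the template a unicode string (or adding `from __future__ import unicode_literals` to the module) would be one way to avoid the error on Python 2.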
Just a comment on this. I believe notedown uses pandoc for conversion. I have used pandoc a bit for Markdown-to-PDF conversion, and there are some extra things you need to set to make it work with Chinese characters. For one, I needed to use the LuaTeX engine; I'm not sure whether notedown supports this. For what it's worth, this is my configuration for using pandoc with Chinese characters. Perhaps it will give some hints for addressing this issue:
---
title: Test
author:
documentclass:
- ctexart
abstract: This is a pandoc test . . .
---
你好吗?
To process this with Pandoc on the command line simply type:
pandoc -s chinese-input.md --latex-engine=lualatex -o output-file.pdf
@babycamel Thanks for your comment! However, I believe the LuaTeX engine is for md -> pdf conversion, not for notebook -> md conversion. Anyway, thanks!
Correct, but notedown uses pandoc and knitr for notebook -> md. It sets the encoding to UTF-8, which I think you need. knitr handles Chinese without difficulty, so I suspect the problem is with the pandoc attributes. I could be wrong, though.
Thanks! I don't currently have the motivation to solve this problem. Maybe I will try the solution later. 🤕