nbmerge
Output not saved as UTF-8: UnicodeDecodeError('utf-8',
If a notebook contains characters outside ASCII, the merged output is not saved as UTF-8.
For example, with Chinese text in a *.ipynb file, the output is actually saved as GBK, which causes UnicodeDecodeError('utf-8',
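Here is a minimal Python sketch of the failure described above. It assumes the notebook text ends up encoded as GBK (Python's `gbk` codec, which Windows calls code page 936) and is later decoded as UTF-8. Jupyter expects notebooks to be UTF-8 JSON.

```python
# Sketch: non-ASCII text stored as GBK (code page 936) cannot be decoded as UTF-8.
text = "中文"  # stands in for any Chinese text in a notebook cell

gbk_bytes = text.encode("gbk")  # roughly what ends up on disk via a cp936 pipe

try:
    gbk_bytes.decode("utf-8")  # what a UTF-8 reader such as Jupyter attempts
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("UnicodeDecodeError:", exc.reason)
```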
Hello, @Yensan. Can you post a gist with an example notebook, along with the command you used to reproduce the error?
Thanks!
@jbn
I used the command from the README:
nbmerge file_1.ipynb file_2.ipynb file_3.ipynb > merged.ipynb
but the files I edited contain some non-ASCII characters.
It's very simple to reproduce: create a new *.ipynb, paste in some Chinese text, then run nbmerge file_1.ipynb file_2.ipynb file_3.ipynb > merged.ipynb
When I use VS Code to change the file's encoding, everything is OK.
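The VS Code workaround presumably re-saves the merged file as UTF-8. Below is a rough Python equivalent. It assumes the file really is GBK-encoded, as the first comment suggests. The helper name `reencode_to_utf8` is hypothetical and not part of nbmerge.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(path, source_encoding="gbk"):
    """Hypothetical helper: rewrite a text file from source_encoding to UTF-8 in place."""
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)  # raises if the guessed encoding is wrong
    p.write_bytes(text.encode("utf-8"))
    return text

# Demo on a throwaway file that mimics a GBK-encoded merged notebook.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "merged.ipynb"
    demo.write_bytes('{"source": "中文"}'.encode("gbk"))
    demo_text = reencode_to_utf8(demo)
    demo_bytes = demo.read_bytes()
```

Decoding strictly, rather than with `errors="replace"`, means a wrong guess about the source encoding fails loudly instead of silently corrupting the notebook.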
Sorry for the delay, @Yensan!
I was unable to replicate this. Are you on Windows? I think the default command-line encoding on Windows isn't Unicode, so piping the output causes problems. Try running
nbmerge file_1.ipynb file_2.ipynb file_3.ipynb -o _merged.ipynb
instead, which skips the pipe. If that doesn't help, let me know and I'll go back to debugging.
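Without `-o`, Python encodes everything written to `sys.stdout` with `sys.stdout.encoding`. On a Chinese-locale Windows machine that is typically cp936. The sketch below shows one way a tool could emit UTF-8 either way: write to a file with an explicit encoding, or write raw bytes to `sys.stdout.buffer` instead of text to `sys.stdout`. `emit_notebook` is a hypothetical helper, not nbmerge's actual code.

```python
import io
import sys

def emit_notebook(text, path=None, stream=None):
    """Hypothetical helper: write notebook JSON as UTF-8, whatever the console encoding.

    With a path, open the file with an explicit encoding (the analogue of using -o).
    Otherwise write UTF-8 bytes to a binary stream, by default the byte buffer
    underneath sys.stdout, so sys.stdout.encoding is never consulted.
    """
    if path is not None:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return
    if stream is None:
        sys.stdout.flush()  # avoid interleaving with text already buffered in sys.stdout
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8"))
    stream.flush()

# Demo with an in-memory byte stream standing in for a pipe.
pipe = io.BytesIO()
emit_notebook('{"source": "中文"}', stream=pipe)
captured = pipe.getvalue()
```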
@jbn No need to apologize. Thank you for this tool and for replying. Yes, you're right: I was using a company computer running Windows 7. I use macOS myself, and I left that job a week ago, so it will take me a while to replicate.
Hi @Yensan.
I read up a bit on the problem and would like to fix it. Any chance I could get you to run this script:
https://gist.github.com/jbn/6b87f180cff5dae4b6554ef58ba26c6f
in the directory with your notebooks, replacing "./YOUR_NOTEBOOK_FILE.ipynb" with your notebook name. If you copy and paste the output, it should be a relatively easy fix.
Thanks if you can :)
(⊙o⊙) Oh! Sorry, I can't open https://gist.github.com/ from my network because of the "Great Wall" 😄 Could you just paste it here? I'm at a new company now, so this isn't the same environment, but I'll test it with Chinese or other non-ASCII text. Lately I've been stuck on a ctypes problem; if you know how to solve it, please post an answer: https://stackoverflow.com/questions/49913956/ctypes-use-pointer-and-cfunctype
import sys, locale
exprs = """
locale.getpreferredencoding()
type(fp)
fp.encoding
sys.stdout.isatty()
sys.stdout.encoding
sys.stdin.isatty()
sys.stdin.encoding
sys.stderr.isatty()
sys.stderr.encoding
sys.getdefaultencoding()
sys.getfilesystemencoding()
"""
with open("./YOUR_NOTEBOOK_FILE.ipynb", "r") as fp:
    for expr in exprs.strip().split():
        print(expr.rjust(30), eval(expr))
Can't help with the ctypes issue; I've never really used that module.
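The script prints `fp.encoding` because `open()` falls back to the locale's preferred encoding when it gets no `encoding` argument. As a result, the same notebook can decode differently on different machines. The sketch below shows a locale-independent reader. It assumes notebooks are UTF-8 JSON, which is the nbformat convention. `load_notebook` is a hypothetical name, not nbmerge's API.

```python
import json
import tempfile
from pathlib import Path

def load_notebook(path):
    """Hypothetical helper: parse a notebook as UTF-8 JSON, ignoring the locale default."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)

# Demo: write a small UTF-8 notebook containing Chinese text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "YOUR_NOTEBOOK_FILE.ipynb"
    raw = json.dumps({"cells": [{"source": "中文"}]}, ensure_ascii=False)
    nb_path.write_bytes(raw.encode("utf-8"))
    nb = load_notebook(nb_path)
```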
Sorry for replying so late; my career has taken a lot of twists and turns. (I'd be grateful for any remote job leads.)
This .ipynb file was edited on both Windows and Mac. I then ran your script on Windows 10 Pro (Simplified Chinese). The Windows 10 install is a virtual machine, but that shouldn't matter; the result is the same.
Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Windows:
C:\Users\aC>systeminfo
Host Name: C53
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.17763 N/A Build 17763
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Original Install Date: 2019/1/6, 14:03:29
System Boot Time: 2019/1/11, 0:28:07
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 61 Stepping 4 GenuineIntel ~1600 Mhz
BIOS Version: Parallels Software International Inc. 14.0.1 (45154), 2018/9/7
System Locale: zh-cn;Chinese (China)
Input Locale: en-us;English (United States)
Your script output:
locale.getpreferredencoding() cp936
type(fp) <class '_io.TextIOWrapper'>
fp.encoding cp936
sys.stdout.isatty() True
sys.stdout.encoding cp936
sys.stdin.isatty() True
sys.stdin.encoding cp936
sys.stderr.isatty() True
sys.stderr.encoding cp936
sys.getdefaultencoding() utf-8
sys.getfilesystemencoding() mbcs
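In this output, stdin, stdout, stderr and the default `open()` encoding are all cp936. Any non-ASCII character written to stdout is therefore encoded as GBK. The sketch below shows one way to sidestep that. It assumes the merged notebook is serialized with Python's `json` module and uses `ensure_ascii=True`. With that setting, every non-ASCII character becomes a `\uXXXX` escape, so the output is pure ASCII. ASCII bytes are identical in cp936 and UTF-8. The helper `merge_cells` is hypothetical and much simpler than what nbmerge does.

```python
import json

def merge_cells(*notebooks):
    """Hypothetical, simplified merge: concatenate cells, keep the first notebook's metadata."""
    merged = dict(notebooks[0])  # shallow copy so the input is not mutated
    merged["cells"] = [cell for nb in notebooks for cell in nb["cells"]]
    return merged

nb1 = {"cells": [{"cell_type": "markdown", "metadata": {}, "source": "中文"}],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
nb2 = {"cells": [{"cell_type": "markdown", "metadata": {}, "source": "更多"}],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

# ensure_ascii=True escapes every non-ASCII character as \uXXXX, so the text
# survives any ASCII-compatible console encoding, including cp936.
out = json.dumps(merge_cells(nb1, nb2), ensure_ascii=True, indent=1)
restored = json.loads(out)  # a JSON reader turns the escapes back into Chinese text
```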
Hello @jbn, I'm also having this problem when merging three notebooks that contain Chinese characters. Here's the output of your script; I've also attached the three files I'm merging: Desktop.zip
!nbmerge 1.ipynb 2.ipynb 3.ipynb > merged.ipynb
Thanks a lot!
Best, PJ
locale.getpreferredencoding() cp936
type(fp) <class '_io.TextIOWrapper'>
fp.encoding cp936
sys.stdout.isatty() False
sys.stdout.encoding UTF-8
sys.stdin.isatty() False
sys.stdin.encoding cp936
sys.stderr.isatty() False
sys.stderr.encoding UTF-8
sys.getdefaultencoding() utf-8
sys.getfilesystemencoding() utf-8
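In this second environment, `sys.stdout.encoding` is already UTF-8, but `fp.encoding` is still cp936. That suggests the problem may be on the read side: a notebook saved as UTF-8 but opened with the locale default gets decoded as cp936. The sketch below illustrates that mismatch. The helper name `decode_with` is hypothetical. Strict cp936 decoding of these bytes may either raise or silently produce garbled text, depending on the exact bytes, so the sketch handles both outcomes.

```python
def decode_with(data, encoding):
    """Hypothetical helper: decode bytes, returning an error marker instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "<UnicodeDecodeError: %s>" % exc.reason

original = '{"source": "中文笔记"}'
utf8_bytes = original.encode("utf-8")  # Jupyter writes notebooks as UTF-8

as_utf8 = decode_with(utf8_bytes, "utf-8")   # round-trips cleanly
as_cp936 = decode_with(utf8_bytes, "cp936")  # what open() without encoding= does here
```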
To clarify, is this issue only on Windows, and not Unix (Linux or Mac OS)?
EDIT: I just ran this on Ubuntu Bionic (copy-pasted Chinese characters into two notebooks), e.g.
nbmerge unicode1.ipynb unicode2.ipynb > new.ipynb
and ran into no issues whatsoever.
So I think it could be helpful to label this issue as Windows-specific (to avoid unnecessarily worrying or turning off people who aren't running this on Windows).
This is a great package by the way! Elegant solution to a recurring problem.