fix UnicodeDecodeError: 'utf-8' codec can't decode bytes in position …

Open chinakook opened this issue 5 years ago • 10 comments

…73-74: invalid continuation byte

fix bug on Windows

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • [ ] Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

chinakook avatar Mar 26 '20 23:03 chinakook

Hey @chinakook, thanks for submitting the PR. Once your PR is ready for CI checks, invoke the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [miscellaneous, website, centos-gpu, centos-cpu, edge, unix-gpu, clang, sanity, windows-cpu, unix-cpu, windows-gpu]


Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.

mxnet-bot avatar Mar 26 '20 23:03 mxnet-bot

@wkcn I copied it from https://github.com/apache/incubator-mxnet/blob/3f0b049a30fcacfbac50067d8d9552a68f1ff44a/python/mxnet/_ffi/base.py#L32

chinakook avatar Apr 06 '20 07:04 chinakook

@chinakook Instead of handling some peculiar Windows encodings in the Python frontend, would it make sense to change the C API to always use Unicode instead of some platform defaults? I doubt that your patch handles all edge cases for different platforms.
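
For readers following the trade-off raised here: a frontend-side workaround usually amounts to trying several encodings in turn. Below is a minimal sketch of that idea. It is not the patch from this PR, and `decode_c_string` and its fallback order are hypothetical choices made only for illustration.

```python
import locale


def decode_c_string(raw, fallbacks=None):
    """Decode bytes returned from a C API, trying UTF-8 first.

    Hypothetical helper for illustration; not part of MXNet.
    """
    if fallbacks is None:
        # Assumption: the platform's preferred encoding is a plausible
        # second guess (for example an ANSI code page on Windows).
        fallbacks = [locale.getpreferredencoding(False)]
    for encoding in ["utf-8"] + list(fallbacks):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise, replace undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# UTF-8 input decodes on the first attempt.
utf8_result = decode_c_string("中文".encode("utf-8"), fallbacks=["gbk"])
# GBK input fails strict UTF-8 decoding and falls through to the fallback.
gbk_result = decode_c_string("中文".encode("gbk"), fallbacks=["gbk"])
# The same GBK bytes also decode without error under the wrong code page.
wrong_guess = decode_c_string("中文".encode("gbk"), fallbacks=["cp1252"])
```

The last call shows why a fallback chain cannot cover every platform: a wrong guess can decode without raising and silently return different text, so the frontend has no reliable way to tell which encoding the C side actually used.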

leezu avatar Apr 06 '20 16:04 leezu

@hzfan I copied your code here. Could you fix it as @leezu suggested?

chinakook avatar Apr 06 '20 17:04 chinakook

@chinakook Could you elaborate on how to reproduce …73-74: invalid continuation byte?

hzfan avatar Apr 07 '20 09:04 hzfan

@hzfan I compiled MXNet myself on Windows and then ran import mxnet in Python. The result was the error …73-74: invalid continuation byte. I then searched the issues and found that your code looked very similar to the code that raised the error, so I copied it.
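
The thread does not establish where the offending bytes come from. As a sketch under the assumption that they are text in a non-UTF-8 code page, the same class of error can be produced without MXNet by decoding GBK-encoded bytes as UTF-8:

```python
# Assumption: the failing bytes are text in a non-UTF-8 code page.
# "中" encoded as GBK is the two bytes 0xD6 0xD0.
raw = "中".encode("gbk")

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# 0xD6 announces a two-byte UTF-8 sequence, but 0xD0 is not a
# continuation byte (those have the bit pattern 0b10xxxxxx), so
# strict decoding raises UnicodeDecodeError.
if error is not None:
    print(type(error).__name__, "-", error.reason)
```

The byte positions reported in the PR title depend on the actual string being decoded, so this sketch reproduces only the failure mode, not the exact message.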

chinakook avatar Apr 07 '20 11:04 chinakook

Could you provide your build script? CI has tested the build on Windows, so the error seems a bit strange to me. Here is a successful build for your reference: https://github.com/apache/incubator-mxnet/blob/master/ci/build_windows.py. cc @vexilligera

hzfan avatar Apr 08 '20 04:04 hzfan

cmake -G"Visual Studio 15 2017 Win64" -Thost=x64 -DUSE_OLDCMAKECUDA=0 -DUSE_SPLIT_ARCH_DLL=0 -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_NVRTC=1 -DUSE_MKL_IF_AVAILABLE=0 -DUSE_MKLDNN=0 -DUSE_F16C=0 -DBLAS="MKL" -DCMAKE_CXX_FLAGS="/FS /MD /O2 /Ob2" -DMKL_INCLUDE_DIR="C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/mkl/include" -DMKL_RT_LIBRARY="C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64/mkl_rt.lib" -DOpenCV_DIR="E:/proj/dev/mxcproj/deps/opencv4/build/x64/vc15/lib" -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe" -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1" -DCMAKE_PREFIX_PATH="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/lib/x64" -DCUDNN_INCLUDE="E:/proj/dev/cudnn/include" -DCUDNN_LIBRARY="E:/proj/dev/cudnn/lib/x64/cudnn.lib" -DMXNET_CUDA_ARCH="6.1 7.5" -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="install" ..

chinakook avatar Apr 13 '20 14:04 chinakook

@chinakook Is libmxnet.so or incubator-mxnet/ located in a path whose name contains non-ASCII characters, such as Chinese characters?

@leezu I agree that it would be better if the C API always used a fixed Unicode encoding, such as UTF-8. But as far as I know, a C string literal without a prefix such as u8 always has a platform-dependent encoding (referred to as the execution narrow encoding [1]). I am not an expert in this area, so correct me if I'm wrong.

[1] https://en.cppreference.com/w/cpp/language/string_literal
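
To illustrate what a platform-dependent narrow encoding means in practice, here is a small Python sketch. The Python codecs `cp1252` and `gbk` only stand in for two possible execution narrow encodings. Which encoding a given C++ compiler actually uses is an assumption outside this example.

```python
def narrow_bytes(text, encodings=("utf-8", "cp1252", "gbk")):
    """Return the byte representation of `text` under each encoding.

    Hypothetical helper for illustration. Encodings that cannot
    represent `text` are reported as None.
    """
    result = {}
    for name in encodings:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


# The same character becomes different bytes depending on the encoding.
e_acute = narrow_bytes("é", encodings=("utf-8", "cp1252"))
zhong = narrow_bytes("中", encodings=("utf-8", "cp1252", "gbk"))

for name, data in zhong.items():
    print(name, data)
```

A consumer that always decodes as UTF-8 therefore only works when the producer also emitted UTF-8, which is the guarantee an explicit `u8` literal is meant to provide.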

hzfan avatar Apr 14 '20 04:04 hzfan

@hzfan No, the path is pure English.

chinakook avatar Apr 14 '20 06:04 chinakook