mxnet
fix UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 73-74: invalid continuation byte
fix bug on Windows
Description
(Brief description on what this PR is about)
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
- Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
Hey @chinakook, thanks for submitting the PR. Once your PR is ready for CI checks, invoke the following commands:
- To trigger all jobs: @mxnet-bot run ci [all]
- To trigger specific jobs: @mxnet-bot run ci [job1, job2]
CI supported jobs: [miscellaneous, website, centos-gpu, centos-cpu, edge, unix-gpu, clang, sanity, windows-cpu, unix-cpu, windows-gpu]
Note: Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin. All CI tests must pass before the PR can be merged.
@wkcn I copied it from https://github.com/apache/incubator-mxnet/blob/3f0b049a30fcacfbac50067d8d9552a68f1ff44a/python/mxnet/_ffi/base.py#L32
@chinakook Instead of handling peculiar Windows encodings in the Python frontend, would it make sense to change the C API to always use Unicode rather than a platform default? I doubt that your patch handles all edge cases on different platforms.
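For readers following the thread, here is a minimal sketch of the kind of frontend fallback being discussed. It is not the PR's actual diff: the names `decode_c_string` and `fallback_encoding` are hypothetical, and the choice of `locale.getpreferredencoding(False)` as the fallback is an assumption about what "platform default" means here.

```python
import locale


def decode_c_string(raw, fallback_encoding=None):
    """Decode bytes returned by a C API, preferring UTF-8.

    Hypothetical helper for illustration only; not part of MXNet.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: if the bytes are not valid UTF-8, they were produced
        # in the platform's preferred (ANSI) encoding.
        encoding = fallback_encoding or locale.getpreferredencoding(False)
        # errors="replace" avoids a second exception on undecodable bytes.
        return raw.decode(encoding, errors="replace")
```

As the comment above points out, a fallback like this only guesses at the encoding; it cannot tell which encoding the native side really used, which is the argument for fixing the encoding in the C API instead.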
@hzfan I copied your code here. Can you fix it as @leezu suggested?
@chinakook Could you elaborate on how to reproduce …73-74: invalid continuation byte?
@hzfan I compiled MXNet myself on Windows and then ran import mxnet in Python. It failed with …73-74: invalid continuation byte.
Then I searched the issues and found that your code dealt with a very similar error, so I copied it.
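The class of error reported above can be illustrated without MXNet. The sketch below assumes the failing bytes came from a non-UTF-8 Windows code page; GBK is used purely as an example, and the string is a stand-in rather than the actual message from the build.

```python
# Stand-in text; the real bytes in the failing import are not known.
message = "中文"

# Bytes as a process using the GBK (cp936) code page would produce them.
raw = message.encode("gbk")

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # The lead byte announces a multi-byte UTF-8 sequence, but the next
    # byte is not a valid continuation byte.
    error = exc

print(error.reason if error else "decoded without error")
```

The exact byte positions in the message depend on the input, so this shows only the failure mode, not the specific positions from the report.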
Could you provide your build script? CI has tested the build on Windows, so the error seems a bit strange to me. Here is a successful build for your reference: https://github.com/apache/incubator-mxnet/blob/master/ci/build_windows.py. cc @vexilligera
cmake -G"Visual Studio 15 2017 Win64" -Thost=x64 -DUSE_OLDCMAKECUDA=0 -DUSE_SPLIT_ARCH_DLL=0 -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_NVRTC=1 -DUSE_MKL_IF_AVAILABLE=0 -DUSE_MKLDNN=0 -DUSE_F16C=0 -DBLAS="MKL" -DCMAKE_CXX_FLAGS="/FS /MD /O2 /Ob2" -DMKL_INCLUDE_DIR="C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/mkl/include" -DMKL_RT_LIBRARY="C:/Program Files (x86)/IntelSWTools/compilers_and_libraries/windows/mkl/lib/intel64/mkl_rt.lib" -DOpenCV_DIR="E:/proj/dev/mxcproj/deps/opencv4/build/x64/vc15/lib" -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe" -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1" -DCMAKE_PREFIX_PATH="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/lib/x64" -DCUDNN_INCLUDE="E:/proj/dev/cudnn/include" -DCUDNN_LIBRARY="E:/proj/dev/cudnn/lib/x64/cudnn.lib" -DMXNET_CUDA_ARCH="6.1 7.5" -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="install" ..
@chinakook Is libmxnet.so or incubator-mxnet/ placed in a path whose name contains Unicode characters, such as Chinese characters?
@leezu I agree that it would be better if the C API always used a fixed Unicode encoding, such as UTF-8. But as far as I know, a C string literal without a prefix such as u8 always has a platform-dependent encoding (referred to as the execution narrow encoding [1]). I am not an expert in this area, so correct me if I'm wrong.
[1] https://en.cppreference.com/w/cpp/language/string_literal
@hzfan No, the path is pure English.
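Whether a build or install path contains non-ASCII characters can also be checked mechanically. The helper below is a hypothetical illustration (the name `non_ascii_parts` is not from MXNet) and assumes Python 3.7 or later for `str.isascii`.

```python
def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    # Treat both Windows and POSIX separators as component boundaries.
    parts = path.replace("\\", "/").split("/")
    return [part for part in parts if not part.isascii()]
```

An empty result means every component of the path is plain ASCII, which matches the "pure English" answer given above.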