kernel_gateway

How to handle Chinese (UTF-8)

awenhu opened this issue 8 years ago • 7 comments

I'm using the kernel gateway with Jupyter Notebook. When I test the API, Chinese characters are not displayed correctly.

awenhu, Sep 13 '16

Can you provide an example notebook and state how you're running the kernel gateway so that we can use it to debug and create a test?

parente, Sep 13 '16

@awenhu I'm going to close this issue as inactive. If you can provide a sample notebook to help us debug, please feel free to reopen the issue.

parente, Jan 20 '17

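Sorry to reopen this, but I've encountered the same issue. Here is the content of my sample notebook:

```python
# GET /contacts
import json

# REQUEST is a JSON string that kernel gateway sets before running this cell.
req = json.loads(REQUEST)
print(req)
print("test")
print("测试")
```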

I used the command below to start the server:

```shell
jupyter-kernelgateway --KernelGatewayApp.api='kernel_gateway.notebook_http' --KernelGatewayApp.seed_uri='api_intro.ipynb'
```

When I tested the API with http://127.0.0.1:8889/contacts?name=1, the Chinese text in the response was garbled: [screenshot of the response]

PetraWang, Aug 09 '18

@PetraWang Could you please provide the notebook as a file attachment? If somebody else copies from your comment and saves it to a new notebook file, all kinds of character encoding conversions might happen along the way.

It could also help if you can determine the Unicode code points of the three characters that are actually displayed instead of the two characters in your notebook. Or maybe you can copy and paste them into a comment. We'd have a hard time figuring out the code points from a screenshot.
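If it helps, the code points can be read off with a few lines of Python. This is a minimal sketch; `code_points` is a hypothetical helper name, and the sample string is the text from the notebook, so paste the characters you actually see in its place.

```python
from typing import List


def code_points(text: str) -> List[str]:
    """Return the Unicode code point of each character, formatted as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Replace the string with the characters that are actually displayed.
print(code_points("测试"))
```

Comparing this list for the expected text and for the displayed text shows exactly which characters changed, without relying on fonts or screenshots.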

Where are you running your test? In an IDE, or a Linux shell, or someplace else? Can you tell us something about the character encoding used for printing the output?
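One way to answer the encoding question from inside Python is to ask the interpreter which encodings it is using. This is a hedged sketch; `report_encodings` is a hypothetical name, and the values depend on the terminal and OS settings it is run under (on Simplified Chinese Windows the locale encoding is often `cp936`, i.e. GBK).

```python
import locale
import sys


def report_encodings() -> dict:
    """Collect the encodings Python uses for console output and default file I/O."""
    return {
        # Encoding used when str objects are written to this process's stdout.
        "stdout": sys.stdout.encoding,
        # Default encoding that open() uses when no encoding= argument is given.
        "locale_preferred": locale.getpreferredencoding(False),
        # Encoding used for file names and other OS-level strings.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, enc in report_encodings().items():
    print(f"{name}: {enc}")
```

Running this both in the shell where the client runs and in the environment that starts the kernel gateway would show whether the two sides disagree.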

rolweber, Aug 09 '18

Thanks for the reply, @rolweber. Since GitHub does not support uploading .ipynb files, I uploaded the notebook as a zipped attachment: api_intro2.zip

The characters I wanted to print are '测试', which means 'test' in English, but the output was '娴嬭瘯', which means nothing to me.
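The two strings fit a common mojibake pattern: the six UTF-8 bytes of '测试' decoded as GBK (code page 936, the default ANSI code page on Simplified Chinese Windows) yield three characters. This is an observation from the strings quoted here, not a confirmed diagnosis of where the mis-decoding happens; the short check below reproduces it and shows how the garbled text could be reversed.

```python
expected = "测试"

# The bytes a UTF-8-encoding server would send for the expected text.
utf8_bytes = expected.encode("utf-8")
print(utf8_bytes.hex())

# Decoding those same bytes as GBK reproduces the garbled output from the report.
garbled = utf8_bytes.decode("gbk")
print(garbled)

# Undoing the mistake (re-encode as GBK, decode as UTF-8) recovers the original.
repaired = garbled.encode("gbk").decode("utf-8")
print(repaired)
```

Two CJK characters take six bytes in UTF-8 but only four in GBK, which is why three characters appear instead of two.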

I ran my test on Windows 10 with Python 3.6, and started jupyter-kernelgateway from Windows PowerShell. I'm not sure whether that matters.

Thanks

PetraWang, Aug 20 '18

Thanks @PetraWang, zipping the file was the best approach to attach it :-)

Character encodings are a tricky thing. Assuming the characters are encoded correctly in your ipynb file (which I didn't check yet, for lack of time):

  1. The ipynb file is loaded by kernel gateway into RAM.
  2. The characters are sent in the response to your request. (server side)
  3. The characters are received with the response to your request. (client side)
  4. The characters are printed in your shell.

Each of these steps might apply an incorrect encoding, leading to garbled characters.
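The four steps can be modeled as a chain of encode/decode hand-offs. This is a toy sketch, not kernel gateway's actual code: `run_pipeline` and the step configuration are hypothetical, and the single mismatched step (a client decoding as GBK) is an assumption chosen for illustration.

```python
def run_pipeline(text, steps):
    """Pass text through a chain of hand-offs and record the result of each.

    Each step is (name, write_encoding, read_encoding): one side encodes the
    text to bytes and the other side decodes it, as happens whenever text
    crosses a file, socket, or terminal boundary.
    """
    trace = []
    for name, write_enc, read_enc in steps:
        data = text.encode(write_enc)
        # errors="replace" keeps the model running if the bytes are invalid for read_enc.
        text = data.decode(read_enc, errors="replace")
        trace.append((name, text))
    return trace


# Hypothetical configuration: every step agrees on UTF-8 except step 3.
steps = [
    ("1. load notebook", "utf-8", "utf-8"),
    ("2. send response", "utf-8", "utf-8"),
    ("3. receive response", "utf-8", "gbk"),
    ("4. print in shell", "utf-8", "utf-8"),
]

for name, result in run_pipeline("测试", steps):
    print(f"{name}: {result}")
```

Moving the mismatched encoding pair to any other step yields the same final output, because the correctly configured steps pass the garbled text through unchanged. That is why inspecting the raw bytes at individual boundaries is needed to locate the fault.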

Could you do some more debugging on the client side? It would be interesting to know:

  • The HTTP headers returned in the response to your request.
    In particular the Content-Type and Content-Length headers, if they are present.
  • A hexdump of the response body.
    This eliminates step 4 and provides an exact representation of the binary response data.
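A minimal client-side sketch for collecting both items, using only the Python standard library. `hexdump` and `inspect_response` are hypothetical helper names; the URL is the one from the report and assumes the gateway is listening on port 8889.

```python
import urllib.request


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)


def inspect_response(url: str) -> None:
    """Print the encoding-related headers and a hexdump of the raw response body."""
    with urllib.request.urlopen(url) as resp:
        print("Content-Type:", resp.headers.get("Content-Type"))
        print("Content-Length:", resp.headers.get("Content-Length"))
        body = resp.read()  # raw bytes; no character decoding is applied
    print(hexdump(body))


# Usage, with the gateway running:
# inspect_response("http://127.0.0.1:8889/contacts?name=1")
```

If the body contains `e6 b5 8b e8 af 95`, the server sent '测试' as correct UTF-8 and the garbling happens on the client side; any other byte sequence points at the server side.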

rolweber, Aug 20 '18

It turns out that we had built the Docker image without setting the right locale…

LukeWang163, Mar 21 '20

Looks like this was ultimately resolved (a user configuration issue). Closing.

kevin-bates, Feb 07 '23