
Please support Chinese encoding

Open lvzhenbo opened this issue 2 years ago • 11 comments

Chrome output: [image]

static-marks output: [image]

lvzhenbo avatar May 10 '23 17:05 lvzhenbo

After installing with npm install -g static-marks in a Chinese-language environment, I ran into the same error. Surprisingly, after cloning the project and building it locally, bookmarks containing Chinese characters are handled correctly.

image

fnwind avatar May 24 '23 11:05 fnwind

Unfortunately, I can't reproduce it, either in the local project or when using static-marks as a CLI tool. I've tested a bookmark with the name 花火 and it works fine.

The error from the screenshot states that "null byte is not allowed in input". I can also see that there are some invisible characters in the YAML file for each line, including the first Imported: line. Can you please remove those characters and try again?

To be able to further help out I would require a reproducible example.

darekkay avatar Jun 24 '23 11:06 darekkay

The problem is in the YML that static-marks generates from the bookmark HTML exported by Chrome. Example files: https://github.com/lvzhenbo/bookmarks

lvzhenbo avatar Jun 27 '23 09:06 lvzhenbo

Thanks for providing the repository. I did the following:

npx static-marks import bookmarks_2023_6_27.html -o bookmarks.yml
npx static-marks build bookmarks.yml -o bookmarks.html

This worked for me without any problems. The interesting part is the difference between my generated bookmarks.yml (please check my gist) and your bookmarks_2023_6_27.yml. I think there is an issue with the encoding:

  • mine:
Imported:
  - 办公:
      - 我的仪表盘 - TAPD平台: https://www.tapd.cn/my_dashboard/index
      - Projects · Dashboard · GitLab: http://git.kongque510.com/
  • yours:
Imported:
  - 鍔炲叕:
      - 鎴戠殑浠〃鐩?- TAPD骞冲彴: https://www.tapd.cn/my_dashboard/index
      - Projects 路 Dashboard 路 GitLab: http://git.kongque510.com/

If you compare your YML file with the source, you can spot the difference in the characters.
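
The pattern in the two excerpts is consistent with UTF-8 bytes being decoded as GBK (Windows code page 936). That is an assumption, not something confirmed in this thread. A minimal Python sketch of that hypothesis, with the hypothetical helper names garble and repair:

```python
# Assumption: the garbled YML is UTF-8 text whose bytes were decoded as GBK.
# garble() and repair() are illustrative helpers, not part of static-marks.

def garble(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as GBK."""
    return text.encode("utf-8").decode("gbk")


def repair(garbled: str) -> str:
    """Undo garble(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode("gbk").decode("utf-8")


print(garble("办公"))                   # folder name from the "mine" excerpt
print(garble("Projects · Dashboard"))  # the middle dot is a 2-byte UTF-8 sequence
print(repair(garble("办公")))           # round trip back to the original
```

The repair step only works when the garbled text contains every original byte. Strings with an odd number of non-ASCII bytes lose information (note the ? in the second "yours" entry), so regenerating the file is safer than repairing it.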

I can think of two places that might be causing the difference:

  1. I am assuming a UTF-8 encoding of the source HTML file. Can you please check whether the encoding matches on your system?
  2. I am using bookmarks-parser to convert the HTML file into YML. It's possible that the library has encoding problems.
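
One way to check the first point is to test whether the exported file's bytes decode cleanly as UTF-8. The sketch below uses only the Python standard library. The function name looks_like_utf8 is hypothetical, and the file name in the usage note is the one from this thread.

```python
# Illustrative check, not part of static-marks: does this byte string
# decode as UTF-8 without errors?

def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Self-contained demonstration with in-memory bytes.
print(looks_like_utf8("办公".encode("utf-8")))  # bytes written as UTF-8
print(looks_like_utf8("办公".encode("gbk")))    # bytes written as GBK
```

For a real file, read it in binary mode first, for example looks_like_utf8(open("bookmarks_2023_6_27.html", "rb").read()). A passing check does not prove the file was meant as UTF-8, but a failing one rules it out.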

darekkay avatar Jun 27 '23 11:06 darekkay

It's the command. I'm using static-marks import .\bookmarks_2023_6_27.html > .\bookmarks_2023_6_27.yml, that is, not -o but the > redirect shown in the readme.

lvzhenbo avatar Jun 28 '23 01:06 lvzhenbo

When I use > instead of -o, everything still works fine for me, both with the Windows command line and with Git Bash on Windows. I assume the result might depend on the system language / encoding. Unfortunately, I have to rely on some help (debugging or a pull request) from someone who can reproduce the problem.

darekkay avatar Jun 28 '23 07:06 darekkay

My system environment defaults to a Chinese encoding, but it also supports UTF-8, and most files are UTF-8. I am not sure what the difference is between the > redirect and -o, but if the two do the same thing, then it is fine to use -o directly.

lvzhenbo avatar Jun 28 '23 07:06 lvzhenbo

Also, according to @ha-na-bi, the cloned repository runs without problems; it is the installed npm package that is problematic.

lvzhenbo avatar Jun 28 '23 07:06 lvzhenbo

Just to clarify: does using -o work for you without issues? If yes, then I would only adjust the documentation and mark -o as the "correct" way.

darekkay avatar Jun 28 '23 07:06 darekkay

Yes, I ran it again and confirmed that -o has no encoding problems.

lvzhenbo avatar Jun 28 '23 07:06 lvzhenbo

I think the issue comes from the >! You are using PowerShell, where > behaves like PowerShell's Out-File. Depending on the PowerShell version, it does not always default to UTF-8, so the resulting file can end up with the wrong encoding.

This wouldn't happen in Bash on Linux, since > works differently there, nor should it be an issue on modern versions of PowerShell (like v7.x).

But on older Windows versions, the default PowerShell (aka Windows PowerShell) is v5, which works differently and is not really oriented towards "UTF-8 by default".

So perhaps the recommendation should be "don't use PowerShell, or only use PowerShell v7.x+" (or use the -o option, so that the shell is bypassed entirely).
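
The effect can be simulated in Python. This sketch assumes that Windows PowerShell 5.1 writes redirected output as UTF-16LE with a byte order mark. The sample YAML and variable names are made up for illustration. A reader expecting UTF-8 then sees a null character after every ASCII character, which would be consistent with the "null byte is not allowed in input" error mentioned earlier in the thread.

```python
# Hypothetical simulation of `tool > file` under Windows PowerShell 5.1.
# Assumption: the redirect stores the text as UTF-16LE with a BOM.
import codecs

yaml_text = "Imported:\n  - docs: https://example.com/\n"  # made-up sample

# Bytes the redirect is assumed to write to disk.
file_bytes = codecs.BOM_UTF16_LE + yaml_text.encode("utf-16-le")

# What a consumer sees if it reads those bytes as UTF-8.
as_utf8 = file_bytes.decode("utf-8", errors="replace")
null_count = as_utf8.count("\x00")
print(f"null characters seen by a UTF-8 reader: {null_count}")

# Reading with the matching encoding recovers the text unchanged.
recovered = file_bytes.decode("utf-16")
print(recovered == yaml_text)
```

Writing the file from inside the tool with -o avoids this step entirely, because the shell never re-encodes the output.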

Jiehong avatar Mar 08 '24 10:03 Jiehong

I've updated the documentation and added a reference back to this issue.

darekkay avatar May 23 '24 16:05 darekkay