node-gyp
node-gyp copied to clipboard
Add support for Unicode characters in paths
Checklist
- [x]
npm install && npm testpasses - [x] tests are included
- [x] commit message follows commit guidelines
Description of change
On Windows 10 i discovered an issue that if there are Unicode letters or symbols in path for Python, find-python.js script can't find it. This bug is caused by an encoding issue which (i hope) I have fixed. At least this bug fix works for me.
Have changed:
- modified: gyp/pylib/gyp/easy_xml.py
- Python 2.7 handle xml_string in the wrong way because of encoding (line 128)
- modified: gyp/pylib/gyp/input.py
- if "encoding:utf8" on line 240 is not marked it causes an error
- new file: lib/find-python-script.py
- I have created this file for convenience.
- Script which can handle Unicode characters letters and symbols can't fit on one line because it is necessary to specify instructions for several versions of python
- modified: lib/find-python.js
- to make js understand Python (line 249)
Please do not request reviews until all tests are green.
Please do not request reviews until all tests are green. @cclauss
At first, can i edit tests?
At second, in my opinion, they are looking not good becouse find-python.js script is not looking good (may be i dont know something).
and in therd, i just dont know how to test it (as previos author). In such way i am fixing complex bug, complex becouse it is formed inside python it self... may be in python in symbiosis with windows. in this case we cant use unit tests. we should use something bigger or more complex...
the bug is in that python when called from, another process (in our case node.js) doesn't specifing default encodng for stds streams to utf8. then we just reconfiguring these streams to right encoding. and in case of python 2.7 it even return sys.executable in wrong encoding.
in general, i dont know how to test it and what to do next.
Now work in progress. I will publish commits soon.
@cclauss what about code review and merging?
I'd like to discuss about:
1. garbage in terminal
the problem is that when on Windows cmd user's locale codepage is setted, windows provides error messages in locale language and in locale encoding (e.g. code page 866). Node js cant cover this behavior by default
I've suggested some solutions in comments in find-python. Here they are:
-
leave it as is and just warn the user that it should use utf8 (already done in this.catchError's info statement)
-
somehow determine the user's terminal encoding and use utils.TextDecoder with the raw buffer from execFile. Requires to correct error.message because garbage persists there
-
Force the user's terminal to use utf8 encoding via e.g. run "chcp 65001". May break user's programs
-
use "cmd" command with flag "/U" and "/C" (for more informatiom run "help cmd") which "Causes the output of internal commands ... to be Unicode" (utf16le) note: find-python-script.py already send output in utf8 then may become necessary to reencode string with Buffer.from(stderr).toString() or something similar (if needed) for this solution all execFile call should look like execFile("cmd", ["/U", "/C", command to run, arg1, arg2, ...])
I realy don't know which one we should use 🙂
2. Python version checking code
I stuck to philosophy that all error should be handled in one place in scope of class. And again i don't know is this good or not? In my opinion this solution provides more clean and uderstandable code. But I left the version check code in execFilePath with minimal changes, so the logging and error handlers associated with this method are preserved there.
@richardlau @cclauss may i request code review?
@cclauss, @richardlau please, review my code and close/merge this pull request 🙂
@DeeDeeG can u pls review my code related to find-python.js file) merge conflicts will be resolved soon)
This looks like a huge effort re-writing this entire file! I can confirm this does fix the issue for me on Windows with Unicode characters in the Python path.
[EDIT: I forgot to mention: Python 2 support has been removed from the latest node-gyp master branch, so any parts of this Pull request making things work with Python 2 can safely be removed, which I think would simplify things a bit.]
[Also forgot to mention: Any changes in gyp/ would need to be added in the gyp-next repository too, or else they will be overwritten the next time the code is synced from that repository. Or else this repository needs a way to maintain node-gyp specific patched for gyp. If changes in gyp/ can be avoided, it would make merging this PR somewhat easier I think.]
I have some more thoughts and opinions, although I am not an official part of this project, so I defer to the actual maintainers. Their opinions are what really matters.
My thoughts, but official maintainers might feel differently or have a different opinion, and I defer to their judgment...
I do personally wish the total re-write of lib/find-python.js was in a separate pull request from the bug fix changes. And likewise the tests being totally re-written makes it somewhat harder to review. Though last I heard there is an intent to modernize the code-base of this project in general.
I somewhat prefer the old verbose logging output text, considering that the new output is longer and more technical-looking. If the previous text could be used, I would prefer that somewhat. Although it is cool to have the output be colorized.
(On another note, though I suppose this isn't a hugely important thing: do we need TypeScript/JSDoc comments? I have never used these before, but I find them hard to ignore and a bit distracting when looking over the code outside of an IDE.)
Again, this looks like it took a lot of effort, and the bug is fixed on this branch, great job!
@DeeDeeG thanks for your feedback. I am really glad that fix is really working.
As u suggested a have removed python 2 support.
Unfortunatly changes in gyp/ can't be avoided becouse it is part of fix.
Also i can't make separate pull request for lib/find-python.js becouse of that:
fix is pretty simple but breaks all tests, which are required by pull request politics. My attempts to fix them have failed, because of weird errors and unreasonably complex architecture of find-python.js. lol if u want i can send u picture with my vision of old architecture (but it is pretty scary and may be dangerouse to see, haha). Literally any changes that i tried to make led to undefined properties. Thats why i had to rewrite whole find-pithon.js and test for it.
Log ouput
a matter of taste, but it would be really cool to have opportunity to choose log output formate beetwen my way and the old one. I decided to make more machine-like output, because it contains more useful information in fewer words My point of view is to make user choose between these ones.
JSDoc comments
They are really improve developer experience especialy with intelliSence and also containe docs. I think it is pretty important to preserve them even if they look ugly in non IDE environment.
is is acceptable to print path environment variable as array with caution:
!!!! ATTENTION !!!! THIS IS NOT A REAL ARRAY!!!
IS IS FORMATTED AS ARRAY TO BEAUTIFUL OUTPUT!!!
IN PRACTICE IT IS STRING OF VALUES SEPARATED BY SEMICOLONS
the result is:
"Path": [
"warn message from above here",
"var one",
"var two",
"etc...",
]
without this trick the one is displayed as very long string which is absolutely unreadable
Also it would be really nice to validate python version before try to execute command. If we find unsupported version we can tell about this to user, than the one will not see errors like:
ERR! find Python STDERR: Traceback (most recent call last):
ERR! find Python File "D:\path\lib\find-python-script.py", line 3, in <module>
ERR! find Python sys.stdout.reconfigure(encoding='utf-8')
ERR! find Python AttributeError: 'file' object has no attribute 'reconfigure'
End user may not know that such feature is not supported by python installed on user's machine. User would think: why programm is not working thogh i have python installed? the error message doesn't make sence.
I described myself, lol) it what i would say if see such message.
pre-task version check would supply user with meaningful informations e.g. that the one's python is not supported.
By the way, it would be cool to move whole node-gyp to typescript
By the way, it would be cool to move whole node-gyp to typescript
Yeah man, esp. the Python parts. ;-) 
By the way, it would be cool to move whole node-gyp to typescript
Yeah man, esp. the Python parts. ;-)
Lol 😃. I mean at least js part of node-gyp. Write in plane js is a bit painfully)

this is related to https://github.com/nodejs/node-gyp/pull/2254#issuecomment-818991881
if warn user with such message, would it be good enought, or i should change anything?
also in text form:
ERR! find Python --------------------------------------------
ERR! find Python checking if "python" can be used
ERR! find Python Prevalidate python version
ERR! find Python versionOutput: Python 2.7.15
ERR! find Python version is 2.7.15 - should be >=3.6.0
ERR! find Python
ERR! find Python THIS VERSION OF PYTHON IS NOT SUPPORTED
ERR! find Python
ERR! find Python Prevalidate fail
ERR! find Python Below can be weird error due to unsupported python version
ERR! find Python FAIL: python
ERR! find Python ERROR: Command failed: "python" "D:\xxx\node-gyp\lib\find-python-script.py"
ERR! find Python Traceback (most recent call last):
ERR! find Python File "D:\xxx\node-gyp\lib\find-python-script.py", line 3, in <module>
ERR! find Python sys.stdout.reconfigure(encoding='utf-8')
ERR! find Python AttributeError: 'file' object has no attribute 'reconfigure'
ERR! find Python STDERR: Traceback (most recent call last):
ERR! find Python File "D:\xxx\node-gyp\lib\find-python-script.py", line 3, in <module>
ERR! find Python sys.stdout.reconfigure(encoding='utf-8')
ERR! find Python AttributeError: 'file' object has no attribute 'reconfigure'
ERR! find Python --------------------------------------------
The above message seems clear enough to me.
@richardlau, @DeeDeeG, @cclauss can some one pls review my code or at least approve to run workflows?
can some one pls review my code or at least approve to run workflows?
thanks)
@owl-from-hogvarts I'm not sure why the tests/CI is not running here. (Reminder: I am not part of the node-gyp team, so I personally can't fix CI here.) But you can run the tests at your fork if you want. Just enable the CI actions here: https://github.com/owl-from-hogvarts/node-gyp/settings/actions (and you might need to push a commit after the workflows are enabled, to trigger a new CI run.)
Edit: Okay, someone appears to have triggered the CI manually, after being notified of these comments.
@DeeDeeG thank you for smart idea)
@rvagg hi) can you pls review this PR and if possible, merge it)
@owl-from-hogvarts hmm, there is a merge conflict from my PRs. Sorry about that.
I believe this pull request has changed lib/find-python.js to have Windows line endings (CRLF). And this PR changed/added some other files with mixed or Windows (CRLF) line endings. Can you please change them all back to Unix line endings (LF), for consistency with the rest of the repository? Or I can help with that, by opening a PR at your fork to change them back.
Using Unix line endings (LF) solves the merge conflict.
@DeeDeeG now i can't see any merge conflicts. In lib i have changed endings to unix type
@owl-from-hogvarts this branch has overwritten/undone the change I made in https://github.com/nodejs/node-gyp/pull/2375.
I suppose you can re-apply it by doing git cherry-pick -x fca4795512c67dc8420aaa0d913b5b89a4b147f3.
Note about merge conflicts (click to expand if you want)
(I guess I should clarify briefly: All of this is not really relevant to this PR right now. The git cherry-pick command above would solve things for this PR at the moment.)
Maybe this doesn't need to be said, and maybe it was just an oversight. Anyhow, I say this to give a friendly hint:
When encountering merge conflicts during a git merge or a git rebase, it is good to carefully examine the conflicting code from both branches, and try to include the logical changes from both branches into the final result that you resolve. (And once you are done resolving the conflict, git add/git commit the file with the resolved changes.) (A generic guide, regardless of your text editor or IDE: https://www.git-tower.com/learn/git/faq/solve-merge-conflicts/)
I personally like to resolve complex merge conflicts with the Atom text editor: https://flight-manual.atom.io/using-atom/sections/github-package/#resolve-conflicts, but I have found that the interface there shows what's going on so clearly, over time I have become familiar enough with the process that I have become comfortable doing this with a basic text editor like Notepad or nano if I need to. I think Atom's merge conflict interface is a good learning tool, not just a quick solution for the problem.
It is probably too late to resolve the merge conflict the usual way on this branch, as the "resolution" of the conflicts is baked into the rebased commits now. It would require some complex interactive rebasing (git rebase -i origin/master), but that is a bit of an advanced topic... https://about.gitlab.com/blog/2020/11/23/keep-git-history-clean-with-interactive-rebase/ and not recommended to try unless you are bored or curious, and interested in learning more command-line git usage. And if you do try it, please keep a backup of your current branch in a good state: git branch pr_2254_backup master. (See: another resource: http://think-like-a-git.net/sections/experimenting-with-git/branches-as-savepoints.html).
Sorry this got really long. I dunno, I am a nerd about git.
@DeeDeeG i have done this!!! your changes are picked up. No merge conflicts
https://github.com/nodejs/node-gyp/pull/2254/checks?check_run_id=2634323722#step:9:26 https://github.com/nodejs/node-gyp/pull/2254/checks?check_run_id=2634323375#step:9:44
@DeeDeeG @cclauss as i understand, tests are not passing due to timeout. So does this mean that node-gyp works too slow? or it may be an error internally with some kind of infinite loop or never ending awaiting or so?
Thank you very much)
In fact in js part only one line change is strictly necessary, but that one line breaks all old tests. I think i can use new tests but with old source of find-python.js. But i recommend to preserve all changes because in my opinion (which is most likely biased) it contain important improvments on extensability and readability.
My strong recommendation would be to move all improvments on extensability and readability to a separate PR called improvments on extensability and readability so that we can get this PR reviewed and landed without a lot of unrelated changes.
@cclauss i have folowed your advice: now node-gyp use old find-python.js but new tests because old are broken for this changes
@rvagg @DeeDeeG @joaocgreis PLEASE review this PR!