node-packer icon indicating copy to clipboard operation
node-packer copied to clipboard

Support arbitrary node.js runtime versions

Open lexor90 opened this issue 7 years ago • 18 comments

Hello, as discussed already in other issues I think nodec should be able to compile using a user defined node version among whose supported by the applied patches.

I didn't have any chance to test the patches with the v6.x, v7.x and v8.x but since those were versions supported by nodec itself I guess we should have some sort of support for them (by the way I'd really like to add some tests against regressions).

Patches have evolved while also upgrading the supported node version, so I think we should still be able to verify the patches apply properly and that node.js retains its functionalities after those patches are applied.

My purpose is to:

  • add a --node-version=7.0.0 flag to the nodec binary (or --runtime, but I think node-version is more user-friendly) and keep the latest nodejs version (or maybe the LTS) as the default
  • alter the ruby script (compiler) to
    • download source from the original nodejs.org website (https://nodejs.org/download/release/)
    • checksum the downloaded tar file to guard against tampering or network errors since the build will be used in production
    • extract the source code
    • apply the patches to node core (by using the system diff or some ruby gem)
    • ideally, check there was no regression after patching (e.g. run the nodejs test suite)
    • compile the node project

I'd really like to avoid what other similar tools do that is to download the detected (current-system) node version: it's ugly, annoying for nvm users, error prone and furthermore in a CI/CD environment one may need to skip this (and using nvm to workaround this might get complicated as nvm is hard to be used in some shell scripts). Also I don't think there's really some logical reason one should assume that I want my local node version on my servers.

I already started to work and implemented the flag and the download part, but I'm not sure how would you like to distribute the patch: do you want to place a .patch file inside this repo? I'd avoid to keep the expanded source code under node/ because it could lead to unwanted patching (e.g. files modified from one version to another would be modified, and we would need to declare what files to pick explicitly, which may still get unwanted modifications.

I'd like to know your thoughts about it @pmq20

Thank you.

lexor90 avatar Jun 06 '17 09:06 lexor90

@lexor90 Thank you very much for the detailed proposal. This is indeed a feature that has been raised a lot of times, the most recent one being https://github.com/pmq20/node-compiler/issues/27#issuecomment-306368154

Assuming local node version as the default does have some advantage though, when it comes to compiling native modules. Since C++ native modules are built in the local environment, the ABI versions should really match what Node.js version is being use at compile time and that is going to be used after compiled. In the future 8.x releases N-API might help mitigate the issue but for Node v6 and v7 it is quite relevant.

I really appreciate your work into implementing it. If you need any help please do not hesitate to ask me. I have added this to the 1.x roadmap here: https://github.com/pmq20/node-compiler/blob/master/ROADMAP.md

pmq20 avatar Jun 07 '17 04:06 pmq20

@lexor90 I have generated the current patch file at https://github.com/pmq20/node-compiler/blob/master/diff.erb

Note though that this diff file will go out of date as we commit new changes. We could write an automated script to generate that file to keep it in sync.

Regarding the erb extension, I am thinking that different Node.js versions might require slightly different diff, which could be dynamically changed using erb file format to adjust to those individual Node.js versions.

Moving ahead I think we could remove the node/ directory from the repository and only commit that diff file.

pmq20 avatar Jun 12 '17 03:06 pmq20

@pmq20 Thank you for the reply and sorry for the delay. Have had some work to get done.

We need to be very carefully at generating diffs, because should we remove the node/ folder we also need some way to get a consensus about which version to develop on. This is needed because if we do develop new features on a v8.x release but wrongly try to generate a diff on a v6.x we also risk to move non-nodec related lines of code between node versions which is dangerous (admitted it would even compile properly).

This means we cannot directly produce a patch compating the node/ folder and any other node version (e.g. user defined one), but we need to extract it against the same version.

With that in mind I think we need to develop a resilient way to avoid human error and keep the diff up to date on a per-commit release. Thinking about a git pre-commit hook if it's fine for you.

I'm just going to produce an example.

I like the idea of the erb solution to keep a single diff, if it's applicable.

By the way I think we need to state what to do when this implementation will go live: do we want to keep adding new features (and bugfixes) on all node versions supported by this work (porting them backward)? Or we do want to commit new changes only against the current and future ones? what do you think about it @pmq20 ?

Because we're facing the "issue" that while this project improves also node.js itself does.

Thanks.

lexor90 avatar Jun 13 '17 14:06 lexor90

The original implementation I'm testing out makes use of the diff and patch commands but it requires

  1. to have them available (which may be not in an environment != *nix-like, neither on a restricted/sandboxed environment, such as a CI), and
  2. to download the same version from the node.js servers (a clean node source code which has not been edited)

Pros:

  • easily generate full diff

Cons:

  • probably not working on Win platforms (due to the requirements)
  • inefficient for both networking (by using caching it's viable though, unless CI removes temp dirs. This assumes we don't update daily our node version - since it's not released daily, btw. So this would only occur once from time to time)
  • and time-inefficient (running the node.js testing suite is slow, but we can workaround this by choosing what to run @pmq20 you can surely know what we need to run as "impacted tests")

Considerations: I think that to make it easier to understand we need to make a distinction between patch and diff phases and when we need to run them.

We need to generate the diff on each commit that impacts the node/ folder, but having to wait for the testing suite to be done is time consuming (really, node tests are thousands – fortunately – and it's time consuming to run them).

Rather than running the testing suite on each commit, we could instead opt to only generate the diff and then verify the correctness of the patch using the CI/CD you have setup already (after all, users are expected to run only a release version). (why we need to run the tests after generating a patch? to verify the correctness of the patch itself. We generate a patch between the modified and original node – same versions – and we then could apply the patch to the new version and run the test suite. If it passes, it's all good, and we can trust the produced patch).

Also, probably nobody of us will develop nodec itself on Windows, so the cons n.1 may be a non-impacting issue, but we definitely need to make sure the patch part works there, because it's what will be required by users running nodec.

The patch part is required to apply the patch to the picked node version, after patch we need to run the testing suite again (we could also do this ahead of time, on our CI environment after a push, but we need to test a lot of MAJOR.MINOR.PATCH which may not be viable using an opensource CI). The patch part will also need to download sources from network (but we cannot do any other way in this case). (why do we run tests again since we run them before? because we are now applying the patch to another version, defined by the user, and we need to make sure it's compatibile and does work for this arbitrary node version. Maybe it compiles but we're breaking something, and you don't want to deploy it without verifying, do you?)

ALTERNATIVE:

I also worked to get this done without having to download a clean node version for the diff part, by using git format-patch. By using it we can generate the patch using the git history (that is: locally) so we can reference a commit (the latest one) where we pulled in a plain node version, and then generate the diff against HEAD.

But I could verify this is not as consistent as the first solution because git format-patch f924921eefa02e711bd6087697ca22b486c86169 --stdout node/ does not produce the same as diff.erb (probably when updating to node 8.1.0 you didn't start again from scratch, but you rather kept some modifications). @pmq20 can you confirm this assumption?

This is faster at generating a diff but definitely more error prone as described above. It fixes the networking issue above for the diff part, but not the tests one (because of course we need to run tests as described above: on CI, on Commit, on the final-user machine, as per our decision). This definitely requires node/ folder to be checked in.

@pmq20 I hope I have been able to be clear about it, btw what do you think about it? What solution does it appeal more to you? Put it like you want, we need to checkout the same version, free from any edit, and compare it against the changes.

Then we can try to apply the patch to the LTS or to the v7.x (i'd skip prev versions) and run tests. The more we want to cover, the more work it has to be done to verify the patch applies without issues.

lexor90 avatar Jun 13 '17 15:06 lexor90

@lexor90 Thanks for the feedback. After reading your comment, I am beginning to hate the idea of using a single unified diff with a lot of erb magic in it, since we have to maintain a lot of Node.js versions.

I just reviewed the upstream active branches ( https://github.com/nodejs/node/branches/active ) and listed the Node.js versions that are currently maintained:

  • v8.x
  • v7.x
  • v6.x
  • v4.x

Therefore, I think a good way to move forward is to keep exactly 4 diff files, and only test against the latest release of each release line, i.e. (for the moment)

  • v8.1.1
  • v7.10.0
  • v6.11.0
  • v4.8.3

For older versions in each release line, we generate a warning at compile time to warn the user that it is not guaranteed to work but the compiler will try its best (at the user's own risk), and recommend upgrading to the latest version of that particular release line (e.g. from v6.10.0 to v6.11.0).

For the ease of our development, I'll create 4 separate branches, where node/ folder will reside with its full content (comparing with the master branch, which contains no node/ folder and only possess 4 diff files. I'll write scripts to generate diff files automatically from those branches. ), i.e.

  • dev-v8.x
  • dev-v7.x
  • dev-v6.x
  • dev-v4.x

pmq20 avatar Jun 14 '17 03:06 pmq20

@lexor90 After reading your second comment, I feel it's becoming overwhelmingly complicated and seems we should simplify our original goal a little bit. I think we need to ask ourselves the following 2 questions,

  • Do we need support all legacy versions?
  • Do we need to support, say, v6.5.0 when v6.x line already has v6.11.0 released?

I observed that (based on the active branch list of my last comment), even the upstream Node.js team does not have that kind of granularity of back-supporting that many versions. They seem to be only supporting one major version and forgot about minor versions.

If our answer to the above questions is "no" and we are only ready to support the latest release of each line, then the problem could be greatly simplified. None of source-downloading, patch-generating, diff-applying are needed. We just need to keep more node/ folders in our master branch, i.e.

  • node4/
  • node6/
  • node7/
  • node8/

Then add a --node-version option that only accepts 4 or 6 or 7 or 8. And we release a new version of nodec once a minor version of those lines has been released.

@lexor90 Do you think we could make that simplification?

pmq20 avatar Jun 14 '17 03:06 pmq20

Look how cool that is 😆

image

pmq20 avatar Jun 14 '17 07:06 pmq20

Maybe we should follow nodejs lts https://github.com/nodejs/lts

fengmk2 avatar Jun 14 '17 07:06 fengmk2

@fengmk2 Good point. I think we should drop 7 according to this. Just Argon, Boron and Carbon 🎉

..., yet we have to be prepared for node 9 though, which will come into being on October this year.

pmq20 avatar Jun 14 '17 07:06 pmq20

Appveyor supports matrix CI, too. ~~However that might exceed the one-hour build time limit since there is no concurrency. I'm still waiting for the result. I'll split projects if it does.~~ Turns out there's no problem at all.

image

pmq20 avatar Jun 14 '17 07:06 pmq20

I've applied the current nodec patch to each release line, let's watch how CI goes for them,

  • [master 1ca4abf9] apply nodec patch to node4
  • [master 0e4524a] apply nodec patch to node6
  • [master 05c96b0] apply nodec patch to node

Note that, node/ folder contains a copy of the bleeding-edge node.js code from upstream. That is not part of the CI ( or CI with failures allowed ), though.

Moving forward my thinking is that we still do development mainly on the node/ folder, while adding a pre-commit git hook so that every change to node/ will be propagated to node8/, node6/ and node4/ automatically.

pmq20 avatar Jun 14 '17 08:06 pmq20

Ehy there, nice to read you've already setup the CI.

I don't know how should we evaluate your purpose to pick only the latest minor/patch release for each release line (e.g. v6.x, v7.x, etc). Theoretically it should be safe for small developers, because it also enforces to pick the latest release for each major version (which is good because you get all the optimizations and security patches), but I'm not sure about big projects/corporate.

By the way this can be a safe compromise, since I don't expect any "major" change between node v6.11.0 and 6.10.0 in their source code (maybe some refactoring, some patch, or some function more, but not something completely different). We need to test it out but I definitely think we might have good chances of having a patch that's applying on v6.11.0 to apply properly also on previous versions in the v6 line.

The LTS is fine, but some companies do prefer to use the latest version to start a new project so we cannot just sit on the LTS IMO (as newer versions also come with newer npm and v8 features, which is what most users want).

I guess that having support for the active releases is fine. Don't know how much code has changed in node since v4, but should it be too much different we can safely skip it I think. Nobody rather than legacy applications should be using it, and I doubt anybody will pick a legacy app and compile it right now.

lexor90 avatar Jun 14 '17 09:06 lexor90

@pmq20 well, if we put all node versions in our repo we fix the networking issue at runtime, but I don't think I'd put it inside our repo. Because then we have to update and maintain 4 different node source code.

I made some thoughts about the networking issue: almost any node project uses npm packages (yarn/npm, no matter) but CI servers for a node project hardly can be offline. I'd work on the latest version, check manually (the first time) that the patch applies to previous versions, and try to run tests manually (we still need to discover what's the minimum test suite to run to cover our changes) and if everything goes fine, we can automatize the process on our CI. The specific node version can be downloaded on compilation time, this way we can always pick the latest one for each major version and we don't have to think about updating this repo for each new node release (it might be distracting from other things we need to get done). We can keep the development node version checked in, so we avoid having different node versions or to develop tools that ensure we're all on the same version.

lexor90 avatar Jun 14 '17 09:06 lexor90

@pmq20 I'm going to create a branch to make this changes, so that you'll be able to test them out:

  1. remove node versions rather than the current one
  2. compiler will take an additional argument --node-version which will verify the major version against the node.js website so we're sure we can proceed
  3. download the current version (vanilla) to extract a patch at runtime
  4. (possibly concurrently to n.3) download the user selected node version
  5. patch the user selected node version (with path obtained in n.3)
  6. run tests (maybe I'll skip this for now)
  7. compile

So we have a starting point to work on. Then I'm sure I'll be working on optimizing the n.3 so we cut off some network bytes to download the project twice. In future we might also develop the patch on push/commit, this way it has not to be done at compile-time (at the end of the day, the most important part is the compilation-part, not our CI/development time, which surely must be optimized, but I think it's much more important to be efficient on compilation for end-users). Don't you think?

@pmq20 can you check here what should we test in your opinion? This way the n.6 will be efficient.

I've skipped the test part after for the n.3 (to test the patch extracted applies to the same node version) because it would be duplicated work, since we already have a CI that verifies our commits do not break anything.

lexor90 avatar Jun 14 '17 09:06 lexor90

@lexor90 Interesting. I like the idea on minimizing our amount of work on maintenance. The current way of having so many node sources in one repo does not seem last for a long period of time since it requires too much maintenance.

I agree with you on the idea of downloading source and applying patches. However we are also faced with the issue of difference between numerous Node.js versions.

I think we could try applying those patches dynamically at compile time by the Ruby scripts. That way we can cater to specific needs of particular Node.js versions by writing some if, and also solving the missing patch tool dependency issue that you previously mentioned. After all the patch overall is not that big so writing a Ruby script to cover them shouldn't be big problem. By applying patches to v4 and v6 today I have already observed some differences between those major releases, and they are not too big.

Overall I like your original idea of dynamically downloading source and dynamically applying patches, rather than statically maintaining source code in our repo which gives us too much burden. I'll implement it tomorrow. If you have time you could try it as well, feel free to delete my code :)

pmq20 avatar Jun 14 '17 09:06 pmq20

@lexor90 Wow just saw your third comment after replying my previous one 😄 It offers a better idea than mine. Generating the patch dynamically at runtime sounds very cool, that further minimizes our burden by eliminating the work of maintaining the patch itself.

My previous idea of applying patch via Ruby script also have another problem of not being idempotent and requires a hell lot of code to write. So I'll drop it and prefer your idea.

I think running tests should be made optional with another option like --run-tests, some of those tests require additional tools to run correctly (like requiring grep to be found on Windows, which must be pre-installed as another step), so it should be of low-priority on user's side.

Also in order to reduce the time spent on the first run we could deliver the source code like we previously did. That way only one new download is needed.

Please implement it as the way you see fit and I'll delete the node6/ node4/ node8/ folders that I introduced today soon, since they are not maintain-able. Thank you for your work! 👍

pmq20 avatar Jun 14 '17 10:06 pmq20

Eager for this capability to land! :+1:

JamesMGreene avatar Jun 15 '17 15:06 JamesMGreene

Hello there, just an update. I'm working on this right now, as soon as I get it to work I'll push so you can test it out. I'm going a bit slower than I wanted these days since I have some work to get done.

lexor90 avatar Jun 19 '17 13:06 lexor90