megalinter icon indicating copy to clipboard operation
megalinter copied to clipboard

format-tables call hanging when running build.sh

Open omusavi opened this issue 1 year ago • 18 comments

Describe the bug Attempting to setup a development environment to contribute, when running bash build.sh to validate the install, the call to format-tables.sh hangs every time. This occurs in a devcontainer, directly on WSL2, as well as in Github Codespaces

To Reproduce Steps to reproduce the behavior:

  1. Clone main (See Additional Context for potential branch to clone)
  2. Run bash build.sh

Expected behavior Successfully exits

Actual Behavior Bash script hangs with no output. I added a 30s timeout to the subprocess.run call and it times out every time, however I do not know why. I tried setting the CI variable to see if that would be a difference and it does not help.

Additional context The current devcontainer files in the repo use the main dockerfile image to try and use that as a development environment, but since the image is so huge this takes forever to load and frequently is rate limited from github. I was actually in the process of updating the devcontainer files to use a python image with just the required tooling to make it easier to developer features quickly using Github Codespaces. You can clone the repo and repro the error from there, but just as a note that this issue does not seem to be unique to the devcontainers as it also occurred locally without the container.

Repo link: https://github.com/omusavi/megalinter/tree/devcontainer

Alternative repro steps:

  1. Go to https://github.com/omusavi/megalinter
  2. Click on Code button, select Codespaces and create a codespace on the devcontainer branch
  3. When devcontainer is up, run bash build.sh

omusavi avatar Sep 21 '22 18:09 omusavi

Having a ready to use dev environment on CodeSpaces would increase contributions , that would be excellent !

Is it possible to run CodeSpaces on an already built docker image ? oxsecurity/megalinteer:beta would be a good candidate (2.2GB, is realistic to want to use it ?)

About the format table, you're not the first to have the issue with Linux... and I only code on a windows env so I can't reproduce, except maybe on CodeSpaces , i'll let you know :)

nvuillam avatar Sep 21 '22 18:09 nvuillam

image

I can reproduce, that's a start...

And same if i install markdown-table-formatter beore

nvuillam avatar Sep 21 '22 19:09 nvuillam

This problem is getting me crazy....

Basically, we just need to call markdown-table-formatter@latest --verbose './**/*.md' at the end of the build , and that must work on any config (I used npx to avoid to force to install markdown-table-formatter, but I'm open to any other way)

If you find a way you're a hero ^^

nvuillam avatar Sep 21 '22 19:09 nvuillam

I see that node.js in on v14 , >= v16 would be recommended and the --yes may be understood :)

nvuillam avatar Sep 21 '22 19:09 nvuillam

Oh I thought I installed v16 on that container, can look into upgrading it. Will play around a bit more today as well to see if the megalinter:beta image works well as a base image

omusavi avatar Sep 21 '22 19:09 omusavi

FWIW using the megalinter:beta image as the base image did not fix the issue. It still did take a long time to download the image, so I personally prefer the leaner python images, but I can potentially make that change if we decide to go in that direction. For reference, the updated Dockerfile and scripts necessary to get it working can be found in this branch: https://github.com/omusavi/megalinter/tree/devcontainer-megalinter-base

omusavi avatar Sep 21 '22 20:09 omusavi

I encountered the same problem last time I had the chance to work on #1553. That weekend I ended up using GitPod in order to have a real Linux environnement (and to use PyCharm to debug, I like the Python debugger way more).

I investigated and found that it was the call itself that was failing, it wasn't hanging. I had to play with the stderr and stdout to be able to see it if I remember well, or else it was by inspecting the process object in the debugger that I found the clue to this.

For the solution (see the commit linked below), I think it is the smallest and safest solution for this. I know it worked locally (on a GitPod instance), and the CI didn't complain, I hope it ran that section. (The CI would have run it the original way, with the alpine container using root and /bin/bash.) I remember searching the repo's history and issues to understand the special treatment for Windows, but nothing was really mentioned and was along other changes. I tried to have a working setup on Windows to test before and after the change, (and without the current added workaround), but a Windows environnement that runs a Python script that calls a Linux bash file (or Linux tools) is quite unexpected, and I can't find another bash than git bash that can run as Windows (WSL would run as Linux). I couldn't get the git bash to run correctly (tool-wise)

https://github.com/echoix/megalinter/commit/e0dcd492c48fe8052ffbd34416a1d7032bec8b42

This is in my branch for the ARM64 builds, (I still have some unpushed commits locally), It wasn't my main goal at that time, so I didn't divert too much and didn't make a PR yet, in case it was just my setup that was unusual, or that it wasn't needed at the end. If it would still be needed at the end, I would have made a separate PR and rebased.

I think a cherry pick of that commit, keeping the explanation and author would be the best operation to integrate the fix in the main branch. It will be a conflict to fix in my branch later on with any way that I can think of.

echoix avatar Sep 21 '22 21:09 echoix

@echoix that seems great n thanks for the tip :)

Unfortunately I tried it on windows and it did not work.... so I made a slight update to your commit, and now I'm crossing fingers... :)

https://github.com/oxsecurity/megalinter/pull/1894/commits/5543bc33c1f5e293118d76a49f3ea32b9d91b4ef

nvuillam avatar Sep 21 '22 21:09 nvuillam

@echoix that seems great n thanks for the tip :)

Unfortunately I tried it on windows and it did not work.... so I made a slight update to your commit, and now I'm crossing fingers... :)

https://github.com/oxsecurity/megalinter/pull/1894/commits/5543bc33c1f5e293118d76a49f3ea32b9d91b4ef

In that case, try using my original commit, but with the which("bash") for windows and Linux, thus eliminating the inlined if. That was my original goal, but since I couldn't test the Windows path, I left it as is. In theory it should work.

echoix avatar Sep 21 '22 21:09 echoix

Before I saw these comments I actually found another solution as well. I don't know why this works, I mostly was doing this change to get better output and hopefully figure out what was happening, and it seemed to fix the issue. Rather than using subprocess.run I just called Popen directly and logged directly to stdout. For whatever reason this seemed to work: https://github.com/omusavi/megalinter/commit/26c536b47bd15c05071bce0d55a8035414e6b7da

I am not a python guy so I am not sure if this fix is good or not, up to you @nvuillam on which direction you want to go. FWIW, @echoix fix worked fine for me in devcontainer!

omusavi avatar Sep 21 '22 21:09 omusavi

@omusavi > Believe it of not, I'm also not a python guy, I never coded anything else than MegaLinter in this language :D

If the mix between @echoix 's update and mine (that i tested on windows) works , it will be in the next beta release in one hour,then u'll be able to merge in your branch and see if it works better ^^

nvuillam avatar Sep 21 '22 21:09 nvuillam

@omusavi please let me know when you have an even half-usable version with CodeSpaces, i'm pretty sure it could help me with this pull request : https://github.com/oxsecurity/megalinter/pull/1877

This will be a great enhancement for MegaLinter to use relative files paths instead of absolute ones... but the CI crashes because of memory on linux and I have no idea where to issue comes from... being able to run individual python test cases on linux will probably help me solve the issue !

nvuillam avatar Sep 21 '22 21:09 nvuillam

I have 3 branches right now, two of which are working with codespaces so you should be able to test using them: devcontainer-popen - the change i mentioned above, but most likely will not PR this devcontainer-no-bash - cherry-picked echoix's commit above, seems to work fine without the windows fix devcontainer - The branch that only contains the devcontainer changes, this is what I plan on PRing once the beta release is done and tested, but right now this branch hangs on the format-tables script!

Feel free to use any of those branches as your base and you should be able to launch codespaces with no issues. I will file the PR for the devcontainer branch now as well, we can always merge it, it wont be any worse than what we have currently!

omusavi avatar Sep 21 '22 22:09 omusavi

Before I saw these comments I actually found another solution as well. I don't know why this works, I mostly was doing this change to get better output and hopefully figure out what was happening, and it seemed to fix the issue. Rather than using subprocess.run I just called Popen directly and logged directly to stdout. For whatever reason this seemed to work: https://github.com/omusavi/megalinter/commit/26c536b47bd15c05071bce0d55a8035414e6b7da

But that doesn't really attack the root cause, present in the current state (and not really fixed in the low-level Popen fix attempt), which is not a Python-specific problem, it's rather a Linux-conceptual one. Hardcoding the bash executable will break some Linux users quite easily, since it is not a assured that the interpreter accessible to an user is placed there. I can imagine the setup of my school's (university) Linux computers where this would have been a problem. Second conceptual problem, calling bash format-tables.sh and also specifying the interpreter to use as /bin/bash means to call bash, in that bash, call bash again, asking to run this script. When calling the second bash, it will create a new interpreter, flushing the configured environment (eg environnement variables), but for users using virtual environments (any of them), they work by setting variables, redirects etc, but this breaks out of it. Calling another interpreter will also rerun the .bashrc and the user's .profile script for example. In GitPod, or any other managed environment where multiple versions of tools could be selected for a user, it might be a problem there.

Sorry, I don't think I'm clear enough in my explanations in this message.

echoix avatar Sep 21 '22 22:09 echoix

@echoix that makes sense, I agree that your fix is probably the more appropriate one. In fact, thats probably why I when I ran the build.sh script just in WSL it kept complaining that the packages didnt exist...because they were in the venv

omusavi avatar Sep 21 '22 22:09 omusavi

@echoix yes it is :) But now with the code in main (since my last PR merge minutes ago) , bash is no longer in the command, so it should be ok ?

https://github.com/oxsecurity/megalinter/blob/c7f75956eabd157d8e2702e5b1b8ac69c9bc1c06/.automation/build.py#L2768

nvuillam avatar Sep 21 '22 22:09 nvuillam

@echoix that makes sense, I agree that your fix is probably the more appropriate one. In fact, thats probably why I when I ran the build.sh script just in WSL it kept complaining that the packages didnt exist...because they were in the venv

Haha, I went through these steps and these errors by myself 19 or 20 days ago...

echoix avatar Sep 21 '22 22:09 echoix

But now with the code in main (since my last PR merge minutes ago) , bash is no longer in the command, so it should be ok ?

https://github.com/oxsecurity/megalinter/blob/c7f75956eabd157d8e2702e5b1b8ac69c9bc1c06/.automation/build.py#L2768

Not really, yes the Linux pathway is a good one, but what I wanted you to try is to not have an platform specific in the command to run, nor in the executable=.

Also, I'd appreciate if at least a co-authored-with was added to the fix I spent hours finding, and maybe that the original commit message (or details) was kept, which is at least more helpful when investigating the changes of the code than "quick Linux fix" jammed inside a generic automatic PR that adds other changes. It's not a good practice to add multiple different fixes inside a single PR (without explanations). The commit could also reference this issue for traceability :)

echoix avatar Sep 21 '22 22:09 echoix

Sorry for chiming in, I believe Ill get excused if I say that I'm not a python guy either :)

Just wanted to share my assumptions about the root cause of the initial "hanging" behavior with following week-old code on Linux:

def reformat_markdown_tables():
    logging.info("Formatting markdown tables...")
    # Call markdown-table-formatter with the list of files
    format_md_tables_command = ["bash", "format-tables.sh"]
    cwd = os.getcwd() + "/.automation"
    logging.info("Running command: " + str(format_md_tables_command) + f" in cwd {cwd}")
    process = subprocess.run(
        format_md_tables_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        cwd=cwd,
        shell=True,
        executable=None if sys.platform == "win32" else "/bin/bash",
    )
    stdout = utils.decode_utf8(process.stdout)
    logging.info(f"Format table results: ({process.returncode})\n" + stdout)

@echoix was close when said

Second conceptual problem, calling bash format-tables.sh and also specifying the interpreter to use as /bin/bash means to call bash, in that bash, call bash again, asking to run this script.

except the fact that with shell=True and args format_md_tables_command specified as a sequence, format-tables.sh would be threatened as an argument for the executable shell but not the one specified along with it (according to python docs). So during executation of the mentioned code, following effective command would be executed:

/bin/bash -c bash format-tables.sh

where bash is a command string to execute and format-tables.sh argument to the /bin/bash shell but not bash (instead of intended command /bin/bash -c "bash format-tables.sh"). This effective command spawns interactive bash shell and blocks any further execution waiting for the input as there is no such argument format-tables.sh for /bin/bash exist. Using process tree this could be described like this:

/init
\_ bash ./build.sh
    \_ python3 ./.automation/build.py
       \_ bash

or better using bourne shell without executable override

/init
\_ bash ./build.sh
    \_ python3 ./.automation/build.py
       \_ /bin/sh -c bash format-tables.sh
            \_ bash

t3mi avatar Sep 27 '22 13:09 t3mi

Thanks @t3mi, you better explained what I struggled to describe. I didn't master the subject before, but just enough to understand what was going on and I went to find a solution to call the script rather than bash.

There were two things I didn't understand enough to remove completely the explicit shell selection (instead of default): why was it needed in the first place, and what are the particularities of the options in the beginning of that shell script (meaning: what would it do if it wasn't bash but another shell that executed it?)?

echoix avatar Sep 27 '22 17:09 echoix

But now with the code in main (since my last PR merge minutes ago) , bash is no longer in the command, so it should be ok ? https://github.com/oxsecurity/megalinter/blob/c7f75956eabd157d8e2702e5b1b8ac69c9bc1c06/.automation/build.py#L2768

Not really, yes the Linux pathway is a good one, but what I wanted you to try is to not have an platform specific in the command to run, nor in the executable=.

Also, I'd appreciate if at least a co-authored-with was added to the fix I spent hours finding, and maybe that the original commit message (or details) was kept, which is at least more helpful when investigating the changes of the code than "quick Linux fix" jammed inside a generic automatic PR that adds other changes. It's not a good practice to add multiple different fixes inside a single PR (without explanations). The commit could also reference this issue for traceability :)

There are rules, and there are some shortcuts about rules... as I'm the only one to (rarely) take them ,and as MegaLinter CI is as long as my available time is short, I allow myself to take these shortcuts from time to time ^^

You are already identified as a top contributor of MegaLinter, I think that's much more important than being credited in a commit message that nobody will ever read ^^

But if you like I can make a special line about you in the Special thanks section of the documentation :p

nvuillam avatar Sep 27 '22 18:09 nvuillam

@t3mi many thanks for the explanation :)

nvuillam avatar Sep 27 '22 18:09 nvuillam

@echoix look what I've hidden in another not related PR 👼 https://github.com/oxsecurity/megalinter/pull/1911/files#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4edR17

nvuillam avatar Sep 27 '22 19:09 nvuillam

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

github-actions[bot] avatar Oct 28 '22 01:10 github-actions[bot]