airflow-site icon indicating copy to clipboard operation
airflow-site copied to clipboard

Remove Bash from the process of building the site

Open Bowrna opened this issue 2 years ago • 24 comments

closes: #338

Bowrna avatar Jul 04 '22 09:07 Bowrna

https://github.com/apache/airflow-site/blob/988ae480223f723efb0b2f97992d30f42128a0ef/site.sh#L77-L114

Can someone explain to me what's happening in this code part? i find it difficult to understand what's happening here.

@mik-laj could you help me?

Bowrna avatar Jul 21 '22 08:07 Bowrna

@Bowrna see: https://stackoverflow.com/a/12498485 This generates a relative path for use in a container.

mik-laj avatar Jul 21 '22 08:07 mik-laj

@Bowrna see: https://stackoverflow.com/a/12498485 This generates a relative path for use in a container.

Could you tell me why we need to convert the absolute path into relative path to run the script? Wouldn't it run fine with the absolute path? @mik-laj

Bowrna avatar Jul 22 '22 05:07 Bowrna

Because in container it will be in a different location ("/opt/airlfow/.....") rather than <YOUR LOCAL AIRLFOW SOURCES>/....

potiuk avatar Jul 22 '22 14:07 potiuk

Ah I see @potiuk. As the sh file is replaced to run without docker dependency, i forgot about this point. Thanks

Bowrna avatar Jul 22 '22 15:07 Bowrna

https://github.com/apache/airflow-site/blob/fd7af05318e34f4efd6b8446ba8729112f30ee08/site.sh#L199

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk Could anyone help me to understand?

I have this doubt because we are redirecting to the path "/docs/${package_name}/stable/index.html". So I wanted to ensure if this is still valid even after removing the docker part for building site. Thanks

Bowrna avatar Jul 24 '22 04:07 Bowrna

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk

Nope. Not needed. I think it was not even needed before for a long time already. This is likely a remnant of a very old approach - long time defunct.

I think what is important is that after buildng the site locallly, you can start the server and locally browse the site, also a check that after deploying to airflow site it still works should happen, but this is something we can likely do only after we merge it.

potiuk avatar Jul 25 '22 15:07 potiuk

@potiuk I get the following error when building the site and I am not sure how I can solve this. This occurs when hugo command runs under the hood. This happens when running the existing site.sh script too

Start building sites …
hugo v0.101.0+extended darwin/arm64 BuildDate=unknown
INFO 2022/07/26 19:15:23 syncing static files to /
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for layout "search" for kind "page": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "section": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "section": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "taxonomy": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "taxonomy": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
Error: Error building site: failed to render pages: render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
Total in 825 ms
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Bowrna avatar Jul 26 '22 13:07 Bowrna

Error: Error building site: failed to render pages: render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found

Hmmmm I may be wrong here but if I remember correctly this is one of the errors that occurs when the submodules aren't cloned.

Have you run git submodules update --init?

rossturk avatar Jul 26 '22 15:07 rossturk

Have you run git submodules update --init?

Yes after running this the build-site command works. Thanks, @rossturk. I have error occurring in check-site-links command and it fails with following error message.

sathishkannan@Sathishs-MacBook-Air airflow-site % python site_tool.py check-site-links
Checking if node module exists
Check if landing-pages/dist/index.html file exists
/opt/homebrew/bin/lynx
Working directory: /Users/sathishkannan/code/airflow-site/landing-pages
Dist directory: ./dist/

./check-links.sh: line 61: readarray: command not found
sathishkannan@Sathishs-MacBook-Air airflow-site % ./site.sh check-site-links
2022-07-27 11:32:33:INFO: Checking if node module exists
2022-07-27 11:32:33:INFO: Check if /Users/sathishkannan/code/airflow-site/landing-pages/dist/index.html file exists
2022-07-27 11:32:33:INFO: Changing to /Users/sathishkannan/code/airflow-site/landing-pages/ and running: ./check-links.sh@
/opt/homebrew/bin/lynx
Working directory: /Users/sathishkannan/code/airflow-site/landing-pages
Dist directory: ./dist/

./check-links.sh: line 61: readarray: command not found
sathishkannan@Sathishs-MacBook-Air airflow-site %

Am I missing downloading anything here? I checked in CONTRIBUTORS.md guide but didn't find anything relevant to this. I am working on Mac M1

Bowrna avatar Jul 27 '22 06:07 Bowrna

./check-links.sh: line 61: readarray: command not found

@Bowrna I think this has to do with the version of bash. If I try this on my M1 Mac, I get an error:

bash-3.2$ bash -version
bash -version
GNU bash, version 3.2.57(1)-release (arm64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.

bash-3.2$ readarray foo < <(ls -al)
readarray foo < <(ls -al)
ls -al
bash: readarray: command not found

But if I try it on my Linux machine, I get this:

rturk@caymus:~$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

rturk@caymus:~$ readarray foo < <(ls -al)

rturk@caymus:~$ echo ${foo[8]}
drwxr-xr-x 2 rturk caymus 4096 Oct 6 2021 .dbt

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

% brew info bash
bash: stable 5.1.16 (bottled), HEAD
...

rossturk avatar Jul 27 '22 06:07 rossturk

I think it has never been supposed to be run on Mac :).

And one of the reasons we want to rewrite everything in Python, is that bash is no longer "common" accross multiple platform. The MacOS Bash is Bash 3 (for licencing reason - Apple refused to use Bash4 due to GPLv3 (https://news.ycombinator.com/item?id=20102597) and a lot of tools there are not really POSIX compliant, so there are plenty of hacks around that (readarray is one of the features of Bash4).

The ultiimate goal we have is that Breeze and all our tools can be used WITHOUT having or using bash in the Host. Fully, Once we get there, the next step will be to make sure that it works on Windows.

And that should be our goal here as well (even though site.sh is not really used by many people). So - let's just not mention in CONTRIBUTING.md that you need to install bash from brew. I deliberately avoided that necessity, because that it yet another prerequisite that adds friction to the first-time experience.

Ideally just Python3 + Docker + Docker-Compose should be enough to do everything developer needs to do in Airlfow.

potiuk avatar Jul 27 '22 13:07 potiuk

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

This has profound consequences. Many tools which "assume" bash is v3 on Mac will stop working if you make Bash 4 default. Even if you try to install it via brew it will tell you that it is cask-only, which means that it is not becoming a default bash - you need to perform an extra step to make it default precisely because it might break a lot of things on Mac.

Going bash-less is the only good route we can take - now when we have Python3.7+ everywhere (previously various options of having Python2 + Python 3 (3.5 as well) created similar set of problems, but since everyone is now on Python 3.7+ (3.6 EOL reached) basing "lowest common denominator" on Python 3.7 is great for such an environment (that's why we have the whole "Rewrite Breeze to Python" project now and not 2 years ago..

potiuk avatar Jul 27 '22 13:07 potiuk

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

This has profound consequences. Many tools which "assume" bash is v3 on Mac will stop working if you make Bash 4 default. Even if you try to install it via brew it will tell you that it is cask-only, which means that it is not becoming a default bash - you need to perform an extra step to make it default precisely because it might break a lot of things on Mac.

I'm sorry, what are you suggesting as an alternative?

rossturk avatar Jul 27 '22 14:07 rossturk

I'm sorry, what are you suggesting as an alternative?

Remove Bash. Completely.

I think this whole task is about removing bash and replacing it with Python :). I think the problem @Bowrna has (correct me @Bowrna if I am wrong) that you cannot really run the OLD script corrrectly. This can be fixed locally by updating bash with brew and setting temporary PATH to get it first on path.

Once the script is replaced with Bash, there is no need to mention it in CONTRIBUTING - because ..... there will be no bash to run any more.

Or am I missing something :)?

potiuk avatar Jul 27 '22 14:07 potiuk

I'm sorry, what are you suggesting as an alternative?

Remove Bash. Completely. I think this whole task is about removing bash and replacing it with Python :). I think the problem @Bowrna has (correct me @Bowrna if I am wrong) that you cannot really run the OLD script corrrectly. This can be fixed locally by updating bash with brew and setting temporary PATH to get it first on path. Once the script is replaced with Bash, there is no need to mention it in CONTRIBUTING - because ..... there will be no bash to run any more.

Or am I missing something :)?

Yes, you are missing that this PR title is "replace site.sh" and the script we are talking about is landing-pages/check-links.sh. Are you suggesting that the scope of this PR is expanded to include all shell scripts?

rossturk avatar Jul 27 '22 14:07 rossturk

Yes, you are missing that this PR title is "replace site.sh" and the script we are talking about is landing-pages/check-links.sh. Are you suggesting that the scope of this PR is expanded to include all shell scripts?

Ahhhh. I see now :facepalm: .

Precisely. This was the whole purpose of the task - to replace all the bash usage. It makes no sense only to replace site.sh. The reason why we are doing it is not because "we do not like bash", but because "we want to get rid of bash from the host entirely" - to be able to run it all on MacOS, Linux, Windows or anything else that has "native" Python (and potentially has no bash).

So the TITLE of the task is wrong. Should be "Remove Bash from the process of building the site".

potiuk avatar Jul 27 '22 14:07 potiuk

And the root cause of the confusion was that when the original issue was created, there was indeed only one site.sh :

See the description of the issue by @mik-laj - so that's why the title was wrong

https://github.com/apache/airflow-site/issues/338

commented on 9 Dec 2020

In this project, we have one bash script - ./site.sh that starts to grow more and more, so its maintenance becomes more problematic. I think it's a good idea to rewrite it into Python.

potiuk avatar Jul 27 '22 14:07 potiuk

Ah, I see the whole point now. Currently to see how this check-site-links command, I have to install bash via brew and see what is happening under the hood. I will try converting that part to python after understanding what is happening in that part of code.

Bowrna avatar Jul 27 '22 16:07 Bowrna

Trying to understand the below bash command. Could anyone tell me what's happening in this command https://github.com/apache/airflow-site/blob/445586c1552918e9531cd13d9cd64c0e63f5da03/landing-pages/check-links.sh#L59-L63 When trying to execute the above script I just enabled print and saw it's logging a few html files. Screenshot 2022-07-28 at 11 48 17 PM

But when I took out that command and executed it via terminal it shows a big list of HTML files. I don't understand how this is different from executing in a script. When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh Screenshot 2022-07-29 at 12 04 18 AM

Bowrna avatar Jul 28 '22 18:07 Bowrna

It fills the array 'pages' with the list of HTML files found. The problem you see is likely the -print 0 flag- rather than producing files separated by "spaces" it separates them with \0 (null character) so that there is no confusion between space being separator between files and one that is part of the file name.

so if you have

  • a file.html
  • another file.html

This wil produce

"a file.html\0another file.html"

Then when you read it with -d '' you will get array:

['a file.html', 'another file.html']

Otherwise you'd get:

['a', 'file.html', 'another', 'file.html']

The problem with differences you see is that the "\0" separated files are not broken over multiple lines because they are really a very long line with no spaces to allow the break happen.

potiuk avatar Jul 28 '22 18:07 potiuk

When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh

Could you share your view for this point? This is also another difference in the output when executing it in different ways @potiuk .

Bowrna avatar Jul 29 '22 11:07 Bowrna

When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh

Could you share your view for this point? This is also another difference in the output when executing it in different ways @potiuk .

I think it's just the matter of when it is called and whether the files are there when it is executed. Some earlier steps likely prepare the files in dist folder and maybe that step is missing when you run it manually ? Can't think of any other reason.

potiuk avatar Jul 29 '22 11:07 potiuk

I have tried executing both of ways consecutively without running any other step in between. Let me debug this again and come with more information @potiuk

Bowrna avatar Jul 29 '22 12:07 Bowrna

Hey @Bowrna are you going to continue with that one :) ? Not the highest prirority, so if we do not plan it, maybe better to close that one?

potiuk avatar Nov 08 '22 11:11 potiuk

I will work on this part this weekend @potiuk. As I got stuck on resolving the issue, I started fixing other issues and slowly I became dormant in working on this issue.

Bowrna avatar Nov 10 '22 03:11 Bowrna

https://github.com/apache/airflow-site/blob/fd7af05318e34f4efd6b8446ba8729112f30ee08/site.sh#L199

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk Could anyone help me to understand?

It is common to delete part of a URL to go to the parent location. For example, when a user visits a URL:

https://airflow.apache.org/docs/apache-airflow-providers-alibaba/2.1.0/index.html

then the user can delete part of the URL and go to:

https://airflow.apache.org/docs/apache-airflow-providers-alibaba/

When we do not have this redirect, the user will see a 404 error.

In other words, all directories the user has access to should have an index.html file to improve UX.

mik-laj avatar Nov 11 '22 13:11 mik-laj

Approved to see.

potiuk avatar Nov 25 '22 22:11 potiuk

@Bowrna are you still working on this PR?

eladkal avatar Dec 20 '22 13:12 eladkal

@eladkal I didn't work on this for a long time as I was working on other PR. I will start working on this PR and one another in this weekend as I have some time due to the upcoming holiday week.

Bowrna avatar Dec 21 '22 13:12 Bowrna