airflow icon indicating copy to clipboard operation
airflow copied to clipboard

breeze test parallel

Open Bowrna opened this issue 3 years ago • 11 comments

relates to: #23538 closes: #23538


^ Add meaningful description above

Read the Pull Request Guidelines for more information. In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

Bowrna avatar May 15 '22 09:05 Bowrna

I think it needs quite a bit more changes :).

More details in https://github.com/apache/airflow/issues/23538#issuecomment-1130737710

potiuk avatar May 18 '22 23:05 potiuk

Hey @Bowrna the #24236 is merged - feel free to rebase and work on "parallelising" :)

potiuk avatar Jun 14 '22 20:06 potiuk

@Bowrna there are conflicts to resolve

eladkal avatar Jul 20 '22 12:07 eladkal

@Bowrna there are conflicts to resolve

I will resolve this conflicts @eladkal

Bowrna avatar Jul 20 '22 13:07 Bowrna

@potiuk I have one doubt in the following code and it would be helpful if you can clarify it to me https://github.com/apache/airflow/blob/29c33165a06b7a6233af3657ace4f2bdb4ec27e4/dev/breeze/src/airflow_breeze/utils/common_options.py#L388-L395 Here why do you set the default to 4 if the env output_file_for_recording is available? If I am right in thinking, that env points to the output_command.svg file (print help in command as svg using rich) How does it influence the number of cpu count to use?

Bowrna avatar Aug 25 '22 18:08 Bowrna

Here why do you set the default to 4 if the env output_file_for_recording is available? If I am right in thinking, that env points to the output_command.svg file (print help in command as svg using rich) How does it influence the number of cpu count to use?

:smile: Very good question :)

The reason is very simple. The recording case is there to produce breeze documentation images, and those images are generated when set of commands or arguments change. This means that we have to make sure that no matter on what machine we run it - when recording images are prepared we always get THE SAME set o commands and flags.

Today - when you run this command on 4 CPU machine, the command will have 8 as upper range, when you run it on 8 CPU machine, the upper range is 16 and default is 4 and 8 respectively. So the set of arguments is different, depending on which machine you run it - and generated "documentation" will slightly differ and we need to make sure that no matter if you or me generate the docs, it will be the same. That's why when we are in "Recording" mode, some variable parts of commands and args are fixed to be independent of the environment.

potiuk avatar Aug 25 '22 18:08 potiuk

Hey @Bowrna - see https://github.com/apache/airflow/actions/runs/2983790915 :

When you have a bug in the workflow - you will see exclamation mark and detailed explanation :

Invalid workflow file: .github/workflows/ci.yml#L1007 The workflow is not valid. .github/workflows/ci.yml (Line: 1007, Col: 9): 'run' is already defined .github/workflows/ci.yml (Line: 1008, Col: 9): 'env' is already defined

potiuk avatar Sep 03 '22 10:09 potiuk

@potiuk I am facing issues in CI workflow in breeze parallel tests https://github.com/apache/airflow/runs/8204814169?check_suite_focus=true

But I am not sure why it's failing and can't figure out much from the error logs. Can you help me how I could understand this part better?

I tried running it on my machine and it failed in different steps with error below

Bind for 0.0.0.0:28080 failed: port is already allocated

Bowrna avatar Sep 06 '22 15:09 Bowrna

@potiuk I am facing issues in CI workflow in breeze parallel tests https://github.com/apache/airflow/runs/8204814169?check_suite_focus=true

But I am not sure why it's failing and can't figure out much from the error logs. Can you help me how I could understand this part better?

I tried running it on my machine and it failed in different steps with error below

Bind for 0.0.0.0:28080 failed: port is already allocated

@potiuk when you get time, please give some directions in resolving this issue. Thanks

Bowrna avatar Sep 10 '22 03:09 Bowrna

@potiuk when you get time, please give some directions in resolving this issue. Thanks

The thing here is that you should override (like it is in the original code) the port number for each "tests" command. Each of the tests run a command which is run here needs a different port number when run in paralllel and in the original code I am dynamically generating the port number so that they are different. This is a bit "arcane" so if you feel like this might be too difficult, I can take over that one?

potiuk avatar Sep 18 '22 20:09 potiuk

@potiuk when you get time, please give some directions in resolving this issue. Thanks

The thing here is that you should override (like it is in the original code) the port number for each "tests" command. Each of the tests run a command which is run here needs a different port number when run in paralllel and in the original code I am dynamically generating the port number so that they are different. This is a bit "arcane" so if you feel like this might be too difficult, I can take over that one?

Yes @potiuk I think you can take over that part and I will watch how you are doing the dynamic allocation of port number. So that I can take over something like that next time and do it

Bowrna avatar Sep 19 '22 10:09 Bowrna

I think https://github.com/apache/airflow/pull/26612 will likely get it obsolete :). I've done much more there - like debugging and tracking what's going on + some optimisations. It's close to merge, just testing last case. But it turned out to be quite a bit more complex eventually

potiuk avatar Sep 28 '22 10:09 potiuk

@potiuk Oh, I see! I didn't notice the changes you are making in that PR. Let me check it now.

Bowrna avatar Sep 28 '22 10:09 Bowrna

@Bowrna do we still need this PR?

eladkal avatar Nov 11 '22 15:11 eladkal

Ah no. It's implemented already :)

potiuk avatar Nov 11 '22 15:11 potiuk