community.zabbix icon indicating copy to clipboard operation
community.zabbix copied to clipboard

Refactor zabbix server

Open eb4x opened this issue 1 year ago • 11 comments

SUMMARY

This PR consists of a lot of iterative commits, they should all be (somewhat) self-explanatory, doing a "single self-contained" change and a rationale for it. If one of the changes are not to anyones liking, comment on it and we can either improve upon it or take it out.

This is a considerably big refactor of the zabbix_server role. I'm not expecting it to break for existing installations and whatever overrides have been provided. It's not intended to "fix" any issues, just clean up the tasks and make the experience somewhat more consistent. (Though it does bring back support for deploying Zabbix 5.0 LTS.)

The most "controversial" change might be stripping out the zabbix_valid_server_versions check.

Other decisions include moving package installation out of the os_family tasks, tasked with setting up the repo. But they did much more beyond that. So the installation happens in main.yml, after repo setup, and the dependencies for db-variants are installed in the mysql/postgresql task files, (instead of checking which dependencies are needed for which db-variant for which os, during repo setup)

Arguably the repo-setup should be it's own role, as it's repeated out for every single role.

ISSUE TYPE
  • Refactor Pull Request
COMPONENT NAME

zabbix_server

ADDITIONAL INFORMATION

I'd like for this refactor to retain all its commits and messages, and not become a squash merge. One thing I sorely missed during this refactor was the ability to find a log as to why things were done in a certain way.

I couldn't quite get molecule to work by itself, and to iterate and test multiple platforms I modified it a bit to have things happen in parallel. I also got a dedicated machine with multiple gh-runners set-up, but can't really explain why wasn't a great success with the existing workflow.

Also, I couldn't get the geerlingguy images to work, but I think the thing we want are simple systemd based images (alma and rocky provide these already) with python3 installed, which turns out being very simple Dockerfiles to create.

FROM {{ item.base_image }}
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip systemd-sysv
CMD ["/sbin/init"]

So maybe rely on upstream images instead of geerlingguy in the future?

You can check out these kinds of changes here; https://github.com/eb4x/community.zabbix/blob/molecule5/molecule/zabbix_server/molecule.yml

eb4x avatar Mar 18 '24 21:03 eb4x

So without having seen if molecule pushes through all of this (just hit the go button) I will say this. First, you did a ton of work here, so thank you for that. It is a big refactor. I will say some of the reason why tasks are broken out the way they are (in Debian vs main for example) most likely goes back to how things were put together < 2.0 when we were trying to accommodate pretty much every version of Zabbix known to man. We cleaned up a lot of it, but kept the same basic form-factor of task distribution. I'm not opposed to doing that, but if we are, I would only support it if it was a common refactor across all of the roles.

I don't personally support bringing back 5.0. One if we're going to do it, we need to do it for all of the roles but more importantly our current stance is we only support fully supported versions of Zabbix. Yes it limits some people's use cases but the goal of the collection is not to provide 100% support for every use case, but to support the most common.

In one of your PRs you made a comment about the user should know the risks of trying to run against an unsupported version. I agree that they should but when you look at some of the tickets that get created, its obvious that they don't which is why we installed the sanity check. My own personal thinking when I introduced that was if they understood the risks, then they would also be able to comment that task out easy enough to do whatever they were wanting to do.

I'm curious to hear @D3DeFi and @BGmot thoughts.

pyrodie18 avatar Mar 21 '24 00:03 pyrodie18

It is a huge effort made by you @eb4x , but I totally support @pyrodie18 with Zabbix 5.0 LTS, it is end of support since last year may (https://www.zabbix.com/life_cycle_and_release_policy). We can not keeping old versions like the beginning of the role when it was under my name, it takes a lot of effort to maintain them.

dj-wasabi avatar Mar 21 '24 07:03 dj-wasabi

Thank you for taking a look :smiley:

We cleaned up a lot of it, but kept the same basic form-factor of task distribution. I'm not opposed to doing that, but if we are, I would only support it if it was a common refactor across all of the roles.

I just wanted to get this PR out the door to see how it would be received. If this is something we would go forward with, I'll do a refactor on the rest of the roles, and this PR can just sit and wait.

I don't personally support bringing back 5.0.

There was so much code in place to bring it back already, it was like 8 lines missing. With the amount of code-reduction in this refactor it was a fun and easy thing to add.

we need to do it for all of the roles but ...

Can do :smile_cat: !

but when you look at some of the tickets that get created, its obvious that they don't which is why we installed the sanity check.

Fair point :+1:

they would also be able to comment that task out easy enough to do whatever they were wanting to do.

Perhaps supply a flag to a when condition of such a task, so we can bypass it. -e skip_version_check=true? I'll refactor in that change.

Zabbix 7 is coming soon. I've already done some work to the plugins to support the renamed mappings, but only for the ones I needed. That codebase could use a good refactor aswell, but I'd need some guidance and hear some opinions or preferences for the code.

eb4x avatar Mar 21 '24 08:03 eb4x

I just wanted to get this PR out the door to see how it would be received. If this is something we would go forward with, I'll do a refactor on the rest of the roles, and this PR can just sit and wait.

Let me download this PR and look at it without the comparisons (its really hard to see simplification with all of the red lines but I'm sure its there and give you a better opinion later on today or this weekend.

There was so much code in place to bring it back already, it was like 8 lines missing. With the amount of code-reduction in this refactor it was a fun and easy thing to add.

I agree but then it becomes us going down this road that doesn't end. If I remember correctly there we comments made about us dropping support for 4.0 even though it was dead dead by that point and I'm sure the same argument could be made about only 9 lines (no idea but its a guess).

Perhaps supply a flag to a when condition of such a task, so we can bypass it. -e skip_version_check=true? I'll refactor in that change.

Ya I could support that

pyrodie18 avatar Mar 21 '24 13:03 pyrodie18

One other comment I will make about bringing back 5.0 is it grows the testing matrix substantially to add another version. It already takes something like 2 hours to run through the full test matrix when we affect every role and plugin. Also we removed all of the 5.0 support from the modules which is definitely more than 9 lines.

pyrodie18 avatar Mar 21 '24 13:03 pyrodie18

One other comment I will make about bringing back 5.0 is it grows the testing matrix substantially to add another version. It already takes something like 2 hours to run through the full test matrix when we affect every role and plugin.

Well, atleast we can strip out 6.2 now right? But yes, no-one in their right mind should be deploying 5.0, so I'm fully behind not testing for it. Since the scripts had a check for whether to use create.sql or server.sql, I opted to complete that functionaliy instead of taking it away. There is very little else required, just some config-options. (How about we not tell anyone the roles can also deploy 5.0 again?)

Also we removed all of the 5.0 support from the modules which is definitely more than 9 lines.

Ah, didn't think about those.

eb4x avatar Mar 21 '24 14:03 eb4x

Ok, so I took a 32nd look at this code, and there is a flaw here. It was always there, but the ansible python database dependencies are installed on the zabbix_server even though the database creation tasks could be delegated to an external database, in which case the dependencies could be missing. This was a problem before the refactor, it remains a problem after.

I'll drum up a fix for that.

eb4x avatar Mar 21 '24 14:03 eb4x

Well, atleast we can strip out 6.2 now right?

Removing support for a already working feature/version is a breaking change. When we release 3.0 (which we plan on doing with the release of Zabbix 7) we'll drop support for 6.2 and 6.4 as well as a few operating systems.

pyrodie18 avatar Mar 21 '24 14:03 pyrodie18

During the refactoring of zabbix_proxy I noticed more variables (zabbix_server_install_database_client) that we could take advantage of. So I've brought those features over here by updating the commits that affect those. Also renaming some tasks to better reflect what action is being taken, and trying to make the roles look as similar as possible.

eb4x avatar Mar 25 '24 14:03 eb4x

Apologies for the hold up, been busy trying to get my own actions-runner working so I can get these results myself, as currently I'm currently just testing against a few select platforms and using images generated from Dockerfiles.

I'm currently experimenting with running a kind kubernetes cluster, and the actions-runner-controller, and had some marginal success getting it to run atleast :sweat_smile:

I'll get back to this once I have gotten repeatable tests.

eb4x avatar Apr 13 '24 07:04 eb4x

Apologies for the hold-up. To borrow a sentence from @fasterthanlime

GitHub actions-runners do not spark joy.

So there are the important changes since last;

  • using pymysql for all platforms (swapping out mysqldb on Debian systems, reasoning is explained in 02102f3afa37fc265159d6eff97cf7bd00b335fc)
  • bringing back lsb to identify raspbian (details in 5c6337a6cf12bf69a1e39da1d3c3cc2b951e8716)
  • dropping code that made zabbix 5.0 installations work (2fa5fc6ad38d019d9c7b30d4c8864b1c5e961860)

(I'll create a separate PR to address some issues I had with scaling actions-runner)

eb4x avatar May 02 '24 08:05 eb4x

Just adding a note here as I took a look at cleaning up the selinux.yml tasks.

I think the inclusion of selinux.yml needs to move into main.yml after the zabbix packages are installed for some of the sebooleans to exist. I'll create a Vagrantfile and do some testing on vm's as the molecule containers won't test the selinux bits.

eb4x avatar May 10 '24 22:05 eb4x

I think the inclusion of selinux.yml needs to move into main.yml after the zabbix packages are installed for some of the sebooleans to exist.

I was completely wrong about this, the booleans are present on systems that have no zabbix software.

eb4x avatar May 12 '24 08:05 eb4x