
chore(docker): move mysql os-level deps (GPL) to dev image only

Open mistercrunch opened this issue 1 year ago • 2 comments

Since the MySQL client libs are GPL-licensed, we don't want to package them in our default images. Prior to this PR, the `default-libmysqlclient-dev` apt package was installed in the lean Docker image, even though the PyPI `mysqlclient` package that depends on this OS-level dependency was only installed in the dev layer.

After this PR, all GPL/MySQL-related packages are excluded from the lean image, but remain in the dev image for convenience and for the CI scripts that test against MySQL.
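A minimal sketch of the split described above (stage names, base image, and file paths are illustrative assumptions, not Superset's actual Dockerfile):

```dockerfile
# Lean stage: no GPL-licensed MySQL client libraries at the OS level
FROM python:3.10-slim AS lean
WORKDIR /app
COPY requirements/base.txt requirements/base.txt
RUN pip install --no-cache-dir -r requirements/base.txt

# Dev stage: adds the GPL OS-level dep plus the PyPI package that links against it
FROM lean AS dev
RUN apt-get update \
    && apt-get install -y --no-install-recommends default-libmysqlclient-dev build-essential \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir mysqlclient
```

Because `mysqlclient` is only installed in the `dev` stage, the `lean` target never pulls in the GPL bits.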

mistercrunch avatar Jul 29 '24 22:07 mistercrunch

Curious to see whether `--no-cache-dir` will affect our PyPI stats and smooth things out at all.

Also, do you have any sense of what's causing the memory increase to the point of needing swap space? Probably unrelated to this, but a little disconcerting.

rusackas avatar Jul 30 '24 21:07 rusackas

Right, not sure what's up with it. It looks like the job is getting killed, presumably because of excessive memory usage (?). The confusing thing is that we're actually installing fewer things in the step that fails.

I tried a few things already:

  • setting up swap space, which doesn't seem to help; it was the result of following a GPT-generated idea, and I'll probably remove that part of the PR
  • looking for ways to reduce parallelism in the Docker build. Currently it builds superset-node (very memory-heavy) while it builds the lean layer (`pip install requirements/base.txt`), but I didn't find a clear/easy way to serialize them. I looked for either a Docker CLI flag or some env var, but nothing seemed to work; GPT hallucinated a few solutions that didn't work either
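For the record, the one knob I'm aware of for limiting build-step concurrency lives in the BuildKit daemon config rather than the CLI, and requires a dedicated builder. A sketch (untested here; the builder name is made up):

```shell
# Write a buildkitd config that caps concurrent build steps
cat > buildkitd.toml <<'EOF'
[worker.oci]
  max-parallelism = 1
EOF

# Create and use a builder that reads that config, then build as usual
docker buildx create --name serial-builder --config buildkitd.toml --use
docker buildx build .
```

With `max-parallelism = 1`, BuildKit runs independent stages (like superset-node and the lean pip install) one at a time instead of concurrently, trading build speed for peak memory.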

I'm wondering what in this PR pushes memory usage (assuming that's why the job gets killed) over the edge. It seems to indicate that master is probably fragile and close to the edge too, meaning this CI step is likely to fail on master the next time we touch something related.

Next steps/ideas:

  • maybe run a docker command to build the superset-node target first, then run the other command, which would reuse the cached layers from the first
  • find some other way to limit parallelism
  • find a way to get more memory in GHA?
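The first idea above might look something like this (the target and tag names are assumptions based on the discussion, and both commands need to run against the same builder for the cache reuse to kick in):

```shell
# Step 1: build only the memory-heavy node stage, populating the layer cache
docker buildx build --target superset-node -t superset-node-cache --load .

# Step 2: full build; BuildKit reuses the cached superset-node layers,
# so the node build and the pip install no longer run concurrently
docker buildx build -t superset-lean --load .
```

This serializes the two heavy steps without needing any parallelism flag, at the cost of an extra build invocation in CI.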

mistercrunch avatar Jul 31 '24 20:07 mistercrunch