scylla-machine-image

Ensure that Users Have Sensible Umask Values

Open syuu1228 opened this issue 6 months ago • 1 comments

We should configure the default umask to a stricter value.

This applies the following CIS compliance rules:

  • xccdf_org.ssgproject.content_rule_accounts_umask_etc_bashrc
  • xccdf_org.ssgproject.content_rule_accounts_umask_etc_login_defs
  • xccdf_org.ssgproject.content_rule_accounts_umask_etc_profile

Fixes #704
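For readers less familiar with umask, here is a minimal sketch of what a stricter default changes for newly created files. The value `0o027` is an assumption for illustration (the thread does not state the exact value the patch sets), and `mode_of_new_file` is a hypothetical helper, not part of this repository.

```python
import os
import stat
import tempfile


def mode_of_new_file(umask_value: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(umask_value)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            # Request rw for everyone; the kernel clears the bits set in the umask.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the process-wide mask


if __name__ == "__main__":
    # 0o022 is a common distribution default; 0o027 is the assumed strict value.
    print(oct(mode_of_new_file(0o022)))
    print(oct(mode_of_new_file(0o027)))
```

Under the assumed strict value, files that used to be world-readable lose read access for "others", which is the kind of change that can break tooling that creates a file as one user and reads it as another.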

syuu1228 avatar May 14 '25 12:05 syuu1228

The patch looks fine, but it causes a strange error on artifacts-ami-test (see below; apt fails to locate the scylla-manager-agent package), so I'm marking this PR as a draft until the error is resolved.

05:50:29  Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get -o DPkg::Lock::Timeout=120 -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef" install -y scylla-manager-agent'
05:50:29  Exit code: 100
05:50:29  Stdout:
05:50:29  Reading package lists...
05:50:29  Building dependency tree...
05:50:29  Reading state information...
05:50:29  Stderr:
05:50:29  E: Unable to locate package scylla-manager-agent
05:50:29  ----- LAST NORMAL EVENT ------------------------------------------------------
05:50:29  2025-05-12 20:50:14.778: (InfoEvent Severity.NORMAL) period_type=not-set event_id=b6d02665-1421-499c-8295-a067a3f18765: message=TEST_END
05:50:29  =======================================================

https://jenkins.scylladb.com/job/releng-testing/job/artifacts/job/artifacts-ami-test/70/consoleFull#-1671431256fcc21424-66d2-4bd8-8e0d-9746405e5b16

syuu1228 avatar May 14 '25 12:05 syuu1228

I guess this is not really related to this patch, so I opened an issue on SCT: https://github.com/scylladb/scylla-cluster-tests/issues/11257

syuu1228 avatar Jun 24 '25 15:06 syuu1228

Changed the status to ready for review, since the test error does not seem to be related to this patch. But let's wait for the error to get fixed.

syuu1228 avatar Jun 24 '25 15:06 syuu1228

Rebased with master

syuu1228 avatar Jun 24 '25 15:06 syuu1228

this shouldn't be merged, while it doesn't pass artifacts tests.

Right, now I understand that the patch itself is breaking the tests, so we have to hold the merge until all related tests pass.

syuu1228 avatar Jun 25 '25 13:06 syuu1228

Rebased with latest master

syuu1228 avatar Aug 25 '25 06:08 syuu1228

Rebased with latest next

syuu1228 avatar Sep 16 '25 14:09 syuu1228

@gmizrahi Could you run scylla cloud test (https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual) on this PR, too?

syuu1228 avatar Sep 17 '25 07:09 syuu1228

I need a us-east-1 AMI + the release tag (is it still 2025.4.0~dev?)

gmizrahi avatar Sep 17 '25 07:09 gmizrahi

@gmizrahi Here are the AMI ID and release tag:

  • us-east-1-x86_64: ami-07adf9d0f18172b5b
  • scylla-version: 2025.4.0~dev

https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/524/

syuu1228 avatar Sep 17 '25 07:09 syuu1228

@syuu1228 - https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual/8363/

gmizrahi avatar Sep 17 '25 09:09 gmizrahi

From the looks of it, it's the same thing we have seen in SCT:

https://github.com/scylladb/scylla-cluster-tests/pull/11260/commits/7b47962071b553b4a573df4360ff1ae9e002b4d3

I would guess that /etc/apt/keyrings has the wrong permissions because of the umask default, and that this might affect other similar things.

fruch avatar Sep 17 '25 13:09 fruch

Tested on rebased code. next-machine-image passes now: https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/540/

But longevity failed: https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/91/consoleFull#-339195974fcc21424-66d2-4bd8-8e0d-9746405e5b16

The error message looks the same as https://github.com/scylladb/scylla-cluster-tests/issues/11887, so it is probably not related to the umask changes. Since that issue is not 100% reproducible, I'm re-running the test.

syuu1228 avatar Sep 29 '25 07:09 syuu1228

Right, I found very similar code in siren that creates a .gpg file in /etc/apt/keyrings without running chmod: https://github.com/scylladb/siren/blob/master/cluster/server/scripts/download_repository.sh#L32 I will send a patch to fix it.
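The general shape of such a fix can be sketched as follows: set the permissions explicitly after creating the directory and the key file, so the result no longer depends on the caller's umask. This is an illustrative sketch only, not the siren script (which is a shell script); `install_keyring`, the file name, and the `0o027` umask are assumptions, and the demo writes into a temporary directory rather than /etc/apt/keyrings.

```python
import os
import stat
import tempfile


def install_keyring(keyring_dir: str, name: str, key_bytes: bytes) -> str:
    """Write a key file and force readable permissions regardless of umask."""
    os.makedirs(keyring_dir, exist_ok=True)
    os.chmod(keyring_dir, 0o755)  # makedirs honours the umask, so fix it up
    path = os.path.join(keyring_dir, name)
    with open(path, "wb") as f:
        f.write(key_bytes)
    os.chmod(path, 0o644)  # readable by unprivileged users that need the key
    return path


def demo():
    """Run the helper under an assumed strict umask and report the modes."""
    old = os.umask(0o027)
    try:
        with tempfile.TemporaryDirectory() as root:
            keyring_dir = os.path.join(root, "keyrings")
            path = install_keyring(keyring_dir, "example.gpg", b"not a real key")
            return (
                stat.S_IMODE(os.stat(keyring_dir).st_mode),
                stat.S_IMODE(os.stat(path).st_mode),
            )
    finally:
        os.umask(old)


if __name__ == "__main__":
    print(tuple(oct(m) for m in demo()))
```

Without the two `os.chmod` calls, the same demo would leave the directory at 0o750 and the file at 0o640 under the assumed umask, which would be consistent with the guess above that the keyring is unreadable when apt needs it.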

syuu1228 avatar Sep 29 '25 07:09 syuu1228

I initially thought the error message was the same as scylladb/scylla-cluster-tests#11887, but I realized it's different. The error message contains the following permission error:

20:24:39  E           Command: 'sudo -u scylla tar xvfz /tmp/keyspace1.standard1.tar.gz -C /var/lib/scylla/data/keyspace1/standard1-f8462f509d0211f09d5fcaf55ccce0c7/upload/'
20:24:39  E           Exit code: 2
20:24:39  E           Stdout:
20:24:39  E           Stderr:
20:24:39  E           tar (child): /tmp/keyspace1.standard1.tar.gz: Cannot open: Permission denied
20:24:39  E           tar (child): Error is not recoverable: exiting now
20:24:39  E           tar: Child returned status 2
20:24:39  E           tar: Error is not recoverable: exiting now

It is likely related to the strict umask, so this patch also breaks the longevity test.

syuu1228 avatar Sep 30 '25 06:09 syuu1228

I guess this code is related to the error pointed out above.

Maybe we should run chmod after uploading the tar.gz file: https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/utils/sstable/load_utils.py#L68 or fix up the permissions before extracting: https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/utils/sstable/load_utils.py#L82
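As a rough illustration of the "fix up permissions before extracting" option, the sketch below adds read permission for group and others to a file that was created under a strict umask. It does not reproduce the SCT code; `ensure_world_readable`, the `0o027` umask, and the placeholder archive are assumptions, and the demo works in a temporary directory rather than /tmp on a test node.

```python
import os
import stat
import tempfile


def ensure_world_readable(path: str) -> int:
    """Add group/other read bits to an existing file; return the new mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = current | stat.S_IRGRP | stat.S_IROTH
    os.chmod(path, wanted)
    return wanted


def demo():
    """Create a file under an assumed strict umask, then make it readable."""
    old = os.umask(0o027)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            archive = os.path.join(tmp, "keyspace1.standard1.tar.gz")
            with open(archive, "wb") as f:
                f.write(b"placeholder, not a real archive")
            before = stat.S_IMODE(os.stat(archive).st_mode)
            after = ensure_world_readable(archive)
            return before, after
    finally:
        os.umask(old)


if __name__ == "__main__":
    before, after = demo()
    print(oct(before), oct(after))
```

With the "before" mode, a different user such as `scylla` (when not in the file's group) cannot open the archive, which matches the `Permission denied` from `sudo -u scylla tar` in the log above. Adding only the read bits keeps the change minimal compared with a blanket `chmod 777`.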

syuu1228 avatar Sep 30 '25 06:09 syuu1228

rebased

syuu1228 avatar Sep 30 '25 06:09 syuu1228

rebased

syuu1228 avatar Oct 01 '25 03:10 syuu1228

Opened a PR in SCT for the tar permission error in the longevity test described above: https://github.com/scylladb/scylla-cluster-tests/pull/12121

syuu1228 avatar Oct 09 '25 08:10 syuu1228

Opened an issue for the /etc/apt/keyrings permission problem seen in the siren-backend-e2e-manual run: https://github.com/scylladb/siren/issues/14513

syuu1228 avatar Oct 10 '25 18:10 syuu1228

scylladb/scylla-cluster-tests#12121 has been merged; I'm testing longevity again.

Now we need to wait for https://github.com/scylladb/siren/issues/14513

syuu1228 avatar Oct 15 '25 06:10 syuu1228

@adambabik - FYI

gmizrahi avatar Oct 15 '25 06:10 gmizrahi

On the https://github.com/scylladb/scylla-cluster-tests/issues/11257 thread, @fruch suggested:

there are probably lots more of such things, maybe even in scylla-cloud and field-eng cloud. I would suggest running those AMIs before merging it, via few SCT longevity to flush more of those out.

also I would recommend giving scylla-cloud and field-eng an AMI with those changes, and let them try it out as well

We have tested on both SCT and scylla-cloud and are fixing the strict-umask-related errors in both projects. What about field-eng? @yaronkaikov @roydahan who should we contact?

The latest AMI is available here, so people can use it for testing the strict umask: us-east-1-x86_64: ami-07f4379f7d2b3fca8

syuu1228 avatar Oct 15 '25 06:10 syuu1228

rebased with latest next

syuu1228 avatar Oct 15 '25 06:10 syuu1228

Tested longevity again, but it still fails with the same error as https://github.com/scylladb/scylla-cluster-tests/issues/11887. This is likely not related to umask, since other PRs show the same behavior (longevity very often fails with this error, even on the master branch).

I tested both the previously built AMI and the latest AMI, and both failed with the same error:

  • https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/104/
  • https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/101/

syuu1228 avatar Oct 16 '25 07:10 syuu1228

@syuu1228

both look like https://github.com/scylladb/scylladb/issues/24428, which can be ignored for the sake of testing this change.

fruch avatar Oct 16 '25 10:10 fruch

rebased.

syuu1228 avatar Oct 20 '25 07:10 syuu1228

@gmizrahi Could you run the scylla cloud test again, to confirm that the permission denied error is gone? Here are the AMI ID and the tag: us-east-1-x86_64: ami-07f4379f7d2b3fca8 2026.1.0~dev-0.20251003.20aeed160740

syuu1228 avatar Oct 21 '25 17:10 syuu1228

rebased

syuu1228 avatar Oct 22 '25 05:10 syuu1228

07f4379f7d2b3fca8

https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual/8619/

gmizrahi avatar Oct 22 '25 08:10 gmizrahi