scylla-machine-image
Ensure that Users Have Sensible Umask Values
We should configure the default umask to a stricter value.
This applies the following CIS compliance rules:
- xccdf_org.ssgproject.content_rule_accounts_umask_etc_bashrc
- xccdf_org.ssgproject.content_rule_accounts_umask_etc_login_defs
- xccdf_org.ssgproject.content_rule_accounts_umask_etc_profile
Fixes #704
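For context, here is a minimal sketch of how a umask narrows the permissions of newly created files and directories. The thread does not state the value this patch configures, so 027 and 077 below are only illustrative assumptions, shown next to the common 022 default. `effective_mode` is a hypothetical helper name, not something from this repository.

```python
def effective_mode(requested: int, umask: int) -> int:
    """Return the permission bits a new file gets: requested bits with the umask bits cleared."""
    return requested & ~umask & 0o777


if __name__ == "__main__":
    # Most programs request 0o666 for files and 0o777 for directories.
    for mask in (0o022, 0o027, 0o077):  # 027 and 077 are assumed examples of stricter masks
        files = effective_mode(0o666, mask)
        dirs = effective_mode(0o777, mask)
        print(f"umask {mask:03o}: files {files:03o}, dirs {dirs:03o}")
```

Under a mask such as 027, files created without an explicit chmod lose all access for "other" users, which is the kind of side effect the tests can surface.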
The patch looks fine, but it causes a strange error on artifacts-ami-test (see below: apt fails to locate the scylla-manager-agent package), so I am marking this PR as a draft until the error is resolved.
05:50:29 Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get -o DPkg::Lock::Timeout=120 -o Dpkg::Options::="--force-confold" -o Dpkg::Options::="--force-confdef" install -y scylla-manager-agent'
05:50:29 Exit code: 100
05:50:29 Stdout:
05:50:29 Reading package lists...
05:50:29 Building dependency tree...
05:50:29 Reading state information...
05:50:29 Stderr:
05:50:29 E: Unable to locate package scylla-manager-agent
05:50:29 ----- LAST NORMAL EVENT ------------------------------------------------------
05:50:29 2025-05-12 20:50:14.778: (InfoEvent Severity.NORMAL) period_type=not-set event_id=b6d02665-1421-499c-8295-a067a3f18765: message=TEST_END
05:50:29 =======================================================
https://jenkins.scylladb.com/job/releng-testing/job/artifacts/job/artifacts-ami-test/70/consoleFull#-1671431256fcc21424-66d2-4bd8-8e0d-9746405e5b16
I guess this is not really related to this patch, so I opened an issue on SCT: https://github.com/scylladb/scylla-cluster-tests/issues/11257
Changed the status to ready for review, since the test error does not seem to be related to this patch. But let's wait for the error to get fixed.
Rebased with master
This shouldn't be merged while it doesn't pass the artifacts tests.
Right, now I understand that the patch itself is breaking the tests; we have to hold off on merging until all related tests pass.
Rebased with latest master
Rebased with latest next
@gmizrahi Could you run scylla cloud test (https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual) on this PR, too?
I need a us-east-1 AMI + the release tag (is it still 2025.4.0~dev?)
@gmizrahi Here's the AMI ID and release tag:
us-east-1-x86_64: ami-07adf9d0f18172b5b
scylla-version: 2025.4.0~dev
https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/524/
@syuu1228 - https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual/8363/
From the looks of it, it's the same thing we've seen in SCT:
https://github.com/scylladb/scylla-cluster-tests/pull/11260/commits/7b47962071b553b4a573df4360ff1ae9e002b4d3
I would guess /etc/apt/keyrings has the wrong permissions because of the umask default,
and it might affect more similar things.
Tested on the rebased code. next-machine-image passes now: https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/next-machine-image/540/
But longevity failed: https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/91/consoleFull#-339195974fcc21424-66d2-4bd8-8e0d-9746405e5b16
Actually, the error message looks the same as https://github.com/scylladb/scylla-cluster-tests/issues/11887, so it is probably not related to the umask changes, I guess. Since https://github.com/scylladb/scylla-cluster-tests/issues/11887 is not 100% reproducible, I'm re-running the test.
Right, I found very similar code in siren that creates a .gpg file in /etc/apt/keyrings without chmod: https://github.com/scylladb/siren/blob/master/cluster/server/scripts/download_repository.sh#L32
I will send a patch to fix it.
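To illustrate the kind of fix being discussed, here is a minimal sketch showing that a file written under a strict umask ends up unreadable by "other" users unless the mode is set explicitly afterwards. It works in a temporary directory rather than /etc/apt/keyrings, the 027 umask is an assumption (the thread does not state the value), and `write_public_file` and `demo` are hypothetical names. It is not the siren or SCT code.

```python
import os
import stat
import tempfile


def write_public_file(path, data, mode=0o644):
    """Write data to path, then set the mode explicitly so the umask cannot narrow it."""
    with open(path, "wb") as f:
        f.write(data)
    os.chmod(path, mode)  # unlike file creation, chmod is not filtered by the umask
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    """Return (mode without chmod, mode with explicit chmod) under an assumed 027 umask."""
    old = os.umask(0o027)  # assumption: a strict umask like the one this PR introduces
    try:
        with tempfile.TemporaryDirectory() as d:
            plain = os.path.join(d, "plain.gpg")
            with open(plain, "wb") as f:
                f.write(b"key")
            plain_mode = stat.S_IMODE(os.stat(plain).st_mode)
            fixed_mode = write_public_file(os.path.join(d, "fixed.gpg"), b"key")
            return plain_mode, fixed_mode
    finally:
        os.umask(old)  # always restore the caller's umask


if __name__ == "__main__":
    plain_mode, fixed_mode = demo()
    print(f"without chmod: {plain_mode:03o}, with chmod: {fixed_mode:03o}")
```

A key file left without read access for other users would be unreadable by any unprivileged process that needs it, which would be consistent with the guess above about apt.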
I initially thought the error message was the same as scylladb/scylla-cluster-tests#11887, but I realized it's different. The error message contains the following permission error:
20:24:39 E Command: 'sudo -u scylla tar xvfz /tmp/keyspace1.standard1.tar.gz -C /var/lib/scylla/data/keyspace1/standard1-f8462f509d0211f09d5fcaf55ccce0c7/upload/'
20:24:39 E Exit code: 2
20:24:39 E Stdout:
20:24:39 E Stderr:
20:24:39 E tar (child): /tmp/keyspace1.standard1.tar.gz: Cannot open: Permission denied
20:24:39 E tar (child): Error is not recoverable: exiting now
20:24:39 E tar: Child returned status 2
20:24:39 E tar: Error is not recoverable: exiting now
It is likely related to the strict umask, so this patch also breaks the longevity test.
I guess this code is related to the error pointed out above.
Maybe we should run chmod after uploading the tar.gz file: https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/utils/sstable/load_utils.py#L68
or fix up the permissions before extracting: https://github.com/scylladb/scylla-cluster-tests/blob/master/sdcm/utils/sstable/load_utils.py#L82
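As a rough sketch of the second option (fixing up permissions before another user reads the archive), the hypothetical helper below adds the group/other read bits only when they are missing. It is a local illustration using Python's standard library, not the code in load_utils.py or in any SCT patch. In the real test the file lives on a remote node, so the equivalent step would have to run there.

```python
import os
import stat
import tempfile


def ensure_readable_by_others(path):
    """Add group/other read bits if missing; return True when the mode was changed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IRGRP | stat.S_IROTH
    if wanted == mode:
        return False  # already readable, nothing to do
    os.chmod(path, wanted)
    return True


if __name__ == "__main__":
    # mkstemp creates the file with mode 0o600, similar to an upload under a strict umask.
    fd, archive = tempfile.mkstemp(suffix=".tar.gz")
    os.close(fd)
    try:
        print("changed:", ensure_readable_by_others(archive))
        print("mode:", oct(stat.S_IMODE(os.stat(archive).st_mode)))
    finally:
        os.remove(archive)
```

Adding only the read bits keeps the rest of the mode as the uploader left it, which is a narrower change than forcing a fixed mode.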
rebased
rebased
Opened PR for this: https://github.com/scylladb/scylla-cluster-tests/pull/12121
Opened an issue for the /etc/apt/keyrings permission problem in siren: https://github.com/scylladb/siren/issues/14513
scylladb/scylla-cluster-tests#12121 has been merged; I'm testing longevity again.
Now we need to wait for https://github.com/scylladb/siren/issues/14513
@adambabik - FYI
On the https://github.com/scylladb/scylla-cluster-tests/issues/11257 thread, @fruch suggested:
there are probably lots more of such things, maybe even in scylla-cloud and field-eng cloud. I would suggest running those AMIs before merging it, via few SCT longevity to flush more of those out.
also I would recommend giving scylla-cloud and field-eng an AMI with those changes, and let them try it out as well
We tested on both SCT and scylla-cloud and are trying to fix the strict-umask-related errors in both projects. What about field-eng? @yaronkaikov @roydahan whom should we contact?
The latest AMI is available here, so people can use it for testing the strict umask: us-east-1-x86_64: ami-07f4379f7d2b3fca8
rebased with latest next
Tested longevity again, but it still hits the same error as https://github.com/scylladb/scylla-cluster-tests/issues/11887. It is likely not related to the umask; other PRs show the same behavior (longevity very often fails with this error, even on the master branch).
I tested both the previously built AMI and the latest built AMI, and both failed with the same error:
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/104/
- https://jenkins.scylladb.com/view/master/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/101/
@syuu1228
both look like https://github.com/scylladb/scylladb/issues/24428, which can be ignored for the sake of testing this change.
rebased.
@gmizrahi Could you run the scylla cloud test again, to ensure the permission denied error is gone? Here's the AMI ID and the tag:
us-east-1-x86_64: ami-07f4379f7d2b3fca8
2026.1.0~dev-0.20251003.20aeed160740
rebased
07f4379f7d2b3fca8
https://jenkins.scylladb.com/view/siren/job/siren-jobs/job/siren-backend-e2e-manual/8619/