terraform-aws-bastion
Failed to connect via ssh due to script not able to open log file
Hello, I have just recently begun seeing this issue below when connecting to my instances using this module:
NOTE: This SSH session will be recorded
AUDIT KEY: 2022-02-15_18-27-27_user
script: cannot open /var/log/bastion/2022-02-15_18-27-27_user_e6lpux4Ey0cFZp8bRI59KTXqutE8H6PL.data: Permission denied
Connection to monitoring-bastion-test.company.com closed.
There haven't been any changes made on these machines since their inception outside of installing some local tools like kubectl. Has anybody seen this issue before? The SSH session is also interactive only so I can't issue any commands over SSH to try and get around this.
I'm also experiencing the same problem with version 2.2.2. We cannot identify what changed to cause the error now when it did not occur before.
If we log in as the ubuntu user, which has privileges to write to /var/log/bastion, everything works.
I wonder if this has to do with the setfacl -Rdm other:0 /var/log/bastion command in the initialization script, but we didn't change anything there either.
Here is the error message:
$ ssh [email protected]
NOTE: This SSH session will be recorded
AUDIT KEY: 2022-02-16_14-11-17_atavio
uid=1002(atavio) gid=1002(atavio) groups=1002(atavio)
script: cannot open /var/log/bastion/2022-02-16_14-11-17_atavio_5EO73WSrarUoZU21TwwC8IQikdTmsriF.data: Permission denied
Connection to jump-server.example.com closed.
And here are the contents and ACLs of /var/log/bastion:
ubuntu@ip-xx-x-x-xxx:/var/log/bastion$ ls -la
total 18604
drwxrwx---+ 2 ubuntu ubuntu 4096 Feb 16 13:23 .
drwxrwxr-x 12 root syslog 4096 Feb 16 00:00 ..
-rw-rw---- 1 ubuntu ubuntu 59444 Feb 16 14:09 2022-02-16_13-23-17_ubuntu_N53Y5ptmUdOeDsW4bUsQvhjGLC.data
-rw-rw---- 1 ubuntu ubuntu 9539 Feb 16 14:09 2022-02-16_13-23-17_ubuntu_N53Y5ptmUdOeDsW4bUsQvhjGLC.time
ubuntu@ip-xx-x-x-xxx:/var/log/bastion$ getfacl /var/log/bastion
getfacl: Removing leading '/' from absolute path names
# file: var/log/bastion
# owner: ubuntu
# group: ubuntu
user::rwx
group::rwx
other::---
default:user::rwx
default:group::rwx
default:other::---
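To make the output above easier to read: the directory is owned by ubuntu:ubuntu and the ACL has no named user or group entries, so a user such as atavio (whose only group is atavio) falls into the "other" class, which has no permissions. The following is a minimal Python sketch of that class selection. It is a simplified model, not the kernel's full algorithm: it ignores root, named ACL entries, and the ACL mask, and the function names are hypothetical.

```python
# Base ACL entries as shown by getfacl for /var/log/bastion.
BASTION_ACL = {"user": "rwx", "group": "rwx", "other": "---"}


def access_class(user, groups, owner, owning_group):
    """Pick which base ACL entry applies to `user` (simplified model)."""
    if user == owner:
        return "user"
    if owning_group in groups:
        return "group"
    return "other"


def can_write(user, groups, owner, owning_group, acl=BASTION_ACL):
    """Return True if the applicable entry grants write permission."""
    return "w" in acl[access_class(user, groups, owner, owning_group)]
```

With these values, can_write("atavio", {"atavio"}, "ubuntu", "ubuntu") is False while can_write("ubuntu", {"ubuntu"}, "ubuntu", "ubuntu") is True, which matches the observation that only the ubuntu user can still log in.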
We are having the same issue. One day things were working and the next day they were not. We didn't change anything. I suspect it has something to do with setfacl -Rdm other:0 /var/log/bastion, but again we didn't change or update that in any way.
Is there a fix or solution for this, or a known cause at least?
Would be great to find an answer here. I've had my bastion host running for around 6+ months without issue and suddenly ran into this without any reason as to why.
edit: as a bandaid, I just terminated the ec2 instance. When the ASG rolled out a new one, all is working again.
Also just ran into the issue with several of my bastions.
same here. Going to update to version 3.0.2 and see if that resolves my issue.
I also had to terminate the ec2 instance. Once it recreated I could connect again.
This is now happening about every 24 hours - I've automated the re-provisioning of bastion hosts as a result. Would be great to see a fix.
I think I found it.
The bastion installs security updates once a day at midnight. If a new version of script (package util-linux) is installed, the setuid bit on it goes away. script then runs as the logged-in user and does not have access to the log directory /var/log/bastion.
Note: This also explains why different bastions stop working at irregular intervals but at the same time as each other. The last update of util-linux was yesterday.
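For anyone who wants to confirm this diagnosis on an affected host (for example while logged in as ubuntu), here is a small Python sketch that reports whether the setuid bit is present on the script binary. The path /usr/bin/script and the helper names are assumptions for illustration and are not taken from the module.

```python
import os
import stat

# Assumed install location of the util-linux `script` binary; adjust if needed.
SCRIPT_PATH = "/usr/bin/script"


def mode_has_setuid(mode: int) -> bool:
    """Return True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


def describe_mode(path: str, mode: int) -> str:
    """Format a mode like `ls -l` does and say whether setuid is present."""
    state = "setuid set" if mode_has_setuid(mode) else "setuid missing"
    return f"{path}: {stat.filemode(mode)} ({state})"


def check(path: str = SCRIPT_PATH) -> str:
    """Stat the file on disk and describe its current mode."""
    return describe_mode(path, os.stat(path).st_mode)
```

Calling print(check()) before and after the nightly update should show whether the bit was dropped. If it reports "setuid missing" on a bastion that rejects logins, that is consistent with the explanation above.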
Pull request with fix follows.