
Updating tdnf to version 3.5.13-3.ph5 causes issue with SSH

Open Nifelhel opened this issue 1 month ago • 14 comments

Describe the bug

After installing photon-hw15-5.0-dde71ec57.x86_64.ova we ran tdnf update to bring the OS up to date. After that we had issues connecting using SSH or SFTP. When we try to connect, we enter the username and password, and then nothing happens for around 30 minutes, until we finally get the standard prompt indicating that we have been logged in. This is an issue because most systems will time out well before that. After installing each package one by one, we figured out that the issue appears once the update to tdnf 3.5.13-3.ph5 is installed. If we edit /etc/nsswitch.conf and remove dns from hosts we are able to log in immediately, but then the system won't be able to look up any host names unless we add them to /etc/hosts.
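The nsswitch.conf workaround described above amounts to dropping the dns source from the hosts line. A minimal sketch for previewing that change without touching the file; the function name hosts_without_dns is hypothetical, and it assumes the usual "hosts: source source ..." layout:

```shell
#!/bin/sh
# Hypothetical helper: print the "hosts:" line of an nsswitch.conf-style file
# with the "dns" source removed. Read-only; the file itself is not modified.
hosts_without_dns() {
    sed -n -E '/^hosts:/ { s/[[:space:]]+dns([[:space:]]|$)/\1/; p; }' "$1"
}

# Example: hosts_without_dns /etc/nsswitch.conf
```

As noted in the report, removing dns means host names resolve only if they are listed in /etc/hosts, so this is a diagnostic step rather than a fix.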

Reproduction steps

  1. Install photon-hw15-5.0-dde71ec57.x86_64.ova
  2. Run: tdnf update tdnf --assumeyes
  3. Reboot
  4. SSH to server ...

Expected behavior

No delay after entering the password when connecting using SSH

Additional context

No response

Nifelhel avatar Nov 10 '25 06:11 Nifelhel

I will try to reproduce. In the meantime, can you please share logs? Add the -v option to ssh, and post the output of journalctl -u sshd.service while trying to log in.

Does the host have internet access? Can you post the content of /etc/yum.repos.d/photon-updates.repo? I wonder if a login script tries to get the latest info from the repository, and times out.

Also - does this happen with older versions of tdnf? It would be good to know what change triggered this.

oliverkurth avatar Nov 10 '25 18:11 oliverkurth

I cannot reproduce this, even after trying multiple times. Please provide information as requested above.

oliverkurth avatar Nov 10 '25 19:11 oliverkurth

One thing to try - before trying to login, see if the command tdnf -q --refresh updateinfo works, or hangs.

oliverkurth avatar Nov 10 '25 20:11 oliverkurth

@oliverkurth Regarding "we ran tdnf update to get the OS updated": repomd.xml is missing on the 5.0 updates repository.

https://packages-prod.broadcom.com/photon/5.0/photon_5.0_x86_64/repodata/ does have a repomd.xml file.
https://packages-prod.broadcom.com/photon/5.0/photon_updates_5.0_x86_64/repodata/ does not have a repomd.xml file.

Can the team double check the repository?

dcasota avatar Nov 11 '25 00:11 dcasota

The file is there now, it may have been just fixed (timestamp is about 2 hours old right now). I do not think it was the root cause though because tdnf would error out immediately and not hang.

oliverkurth avatar Nov 11 '25 03:11 oliverkurth

Building a Dockerfile with 5.0:latest led to the missing repomd.xml trace. It's solved now!

In case of client-side dns settings with packages-prod.broadcom.com and packages.vmware.com, could this cause tdnf to enter an endless redirection loop until a timeout kicks in?

dcasota avatar Nov 11 '25 05:11 dcasota

Verified the issue again today by deploying a new Photon VM (photon-hw15-5.0-dde71ec57.x86_64.ova) on vCenter. The network uses DHCP, so the only changes I need to make are adding a proxy server and installing a root certificate, as we do SSL interception on all internet traffic.

I then do the update:

root@photon-machine [ ~ ]# tdnf update tdnf

Upgrading:
tdnf-cli-libs                           x86_64                        3.5.13-3.ph5                            photon-updates                 71.97k                          45.16k
tdnf                                    x86_64                        3.5.13-3.ph5                            photon-updates                406.96k                         157.50k

Total installed size: 478.94k
Total download size: 202.66k
Is this ok [y/N]: y
tdnf-cli-libs                            46240 100%
tdnf                                    161285 100%
Testing transaction
Running transaction
Installing/Updating: tdnf-cli-libs-3.5.13-3.ph5.x86_64
Installing/Updating: tdnf-3.5.13-3.ph5.x86_64
detected upgrade of tdnf, daemon-reload
Removing: tdnf-3.5.2-3.ph5.x86_64
Removing: tdnf-cli-libs-3.5.2-3.ph5.x86_64

If I do "tdnf -q --refresh updateinfo" it works and it displays "20 Security notice(s)". If I run the command during the connection attempt it displays:

waiting for tdnf_instance lock on /var/run/.tdnf-instance-lockfile
20 Security notice(s)
WARNING: unable to remove lockfile(/var/run/.tdnf-instance-lockfile)

If I try again after the connection has succeeded, it doesn't complain about the lockfile.

I tried ssh -v both before and after and the only difference is the last entry about last login. Here is the data:

OpenSSH_9.3p2, OpenSSL 3.0.18 30 Sep 2025
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to 10.64.159.197 [10.64.159.197] port 22.
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.3
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.1
debug1: compat_banner: match: OpenSSH_9.1 pat OpenSSH* compat 0x04000000
debug1: Authenticating to 10.64.159.197:22 as 'root'
debug1: load_hostkeys: fopen /root/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: [email protected]
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:8dSCh0AKJrjNpJ5ZJyu0fXotb1B3v9Dz0I6Tx0pvAjA
debug1: load_hostkeys: fopen /root/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host '10.64.159.197' is known and matches the ED25519 host key.
debug1: Found key in /root/.ssh/known_hosts:1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: Will attempt key: /root/.ssh/id_rsa 
debug1: Will attempt key: /root/.ssh/id_ecdsa 
debug1: Will attempt key: /root/.ssh/id_ecdsa_sk 
debug1: Will attempt key: /root/.ssh/id_ed25519 
debug1: Will attempt key: /root/.ssh/id_ed25519_sk 
debug1: Will attempt key: /root/.ssh/id_xmss 
debug1: Will attempt key: /root/.ssh/id_dsa 
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,[email protected],ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,[email protected],[email protected]>
debug1: kex_input_ext_info: [email protected]=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password,keyboard-interactive
debug1: Next authentication method: publickey
debug1: Trying private key: /root/.ssh/id_rsa
debug1: Trying private key: /root/.ssh/id_ecdsa
debug1: Trying private key: /root/.ssh/id_ecdsa_sk
debug1: Trying private key: /root/.ssh/id_ed25519
debug1: Trying private key: /root/.ssh/id_ed25519_sk
debug1: Trying private key: /root/.ssh/id_xmss
debug1: Trying private key: /root/.ssh/id_dsa
debug1: Next authentication method: keyboard-interactive
([email protected]) Password: 
Authenticated to 10.64.159.197 ([10.64.159.197]:22) using "keyboard-interactive".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug1: Requesting [email protected]
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: client_input_global_request: rtype [email protected] want_reply 0
debug1: client_input_hostkeys: searching /root/.ssh/known_hosts for 10.64.159.197 / (none)
debug1: client_input_hostkeys: searching /root/.ssh/known_hosts2 for 10.64.159.197 / (none)
debug1: client_input_hostkeys: hostkeys file /root/.ssh/known_hosts2 does not exist
debug1: client_input_hostkeys: no new or deprecated keys from server
debug1: pledge: fork
Last login: Tue Nov 11 11:40:26 2025 from 10.64.169.194
 11:44:40 up 2 min,  0 user,  load average: 0.02, 0.02, 0.00

The delay happens after it displays "debug1: pledge: filesystem"

If I try "journalctl -u sshd.service" during the connection attempt it only displays "-- No entries --"

Will try and see if I can replicate this with an iso instead of the ova.

Nifelhel avatar Nov 11 '25 12:11 Nifelhel

Forgot to add the content of /etc/yum.repos.d/photon-updates.repo

[photon-updates]
name=VMware Photon Linux $releasever ($basearch) Updates
baseurl=https://packages.vmware.com/photon/$releasever/photon_updates_$releasever_$basearch
gpgkey=file:///etc/pki/rpm-gpg/VMWARE-RPM-GPG-KEY file:///etc/pki/rpm-gpg/VMWARE-RPM-GPG-KEY-4096
gpgcheck=1
enabled=1
skip_if_unavailable=1

Nifelhel avatar Nov 11 '25 13:11 Nifelhel

We have the same issue on a Photon 5 VM after the last run of updates. We usually log in over SSH with a non-root account. When motdgen tries to run "tdnf -q --refresh updateinfo" in user context, it hangs because a non-root user has no permission for some files and folders. During the SSH connection you don't see the "Permission denied" message; the login just hangs silently.

If I run the command on the console it gives me...

someuser@somevm [ ~ ]$ tdnf -q --refresh updateinfo 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/5cf42af5e53e22088379fae12292ade0ac73a32d0ac5169411dbb3c2da708646-updateinfo.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/ebd4fb485ab037660ad47044f751916838dcbbdad857e47ad6d8bfc32d2635d0-primary.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/repomd.xml: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/322c617b79d9090a6a347c6ee906791ed4be2d7722e2803fa808ded44a7d06c9-filelists.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/e74d98c2a1e45b72a5ab3b7dbeda85ec35dfa5814cb637824b1cea33b29dea43-other.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/solvcache/photon-updates.solv: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/solvcache: Permission denied 
Error(1613) : Permission denied 
Error: Failed to synchronize cache for repo 'VMware Photon Linux 5.0 (x86_64) Updates' 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/5cf42af5e53e22088379fae12292ade0ac73a32d0ac5169411dbb3c2da708646-updateinfo.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/ebd4fb485ab037660ad47044f751916838dcbbdad857e47ad6d8bfc32d2635d0-primary.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/repomd.xml: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/322c617b79d9090a6a347c6ee906791ed4be2d7722e2803fa808ded44a7d06c9-filelists.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata/e74d98c2a1e45b72a5ab3b7dbeda85ec35dfa5814cb637824b1cea33b29dea43-other.xml.gz: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/repodata: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/solvcache/photon-updates.solv: Permission denied 
unable to remove /var/cache/tdnf/photon-updates-78f2dcdf/solvcache: Permission denied 
Error(1613) : Permission denied 
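
The output above suggests the refresh fails for users who cannot delete entries under /var/cache/tdnf. A minimal sketch of a pre-check that skips the refresh in that case; the function name can_refresh_cache is hypothetical, it only inspects the top-level directory, and it is not taken from tdnf or motdgen:

```shell
#!/bin/sh
# Hypothetical pre-check: succeed only if the current user has write and
# search permission on the given cache directory (needed to delete entries).
# Simplification: subdirectories are not inspected.
can_refresh_cache() {
    dir=$1
    [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]
}

# Example guard around the command from this thread:
#   can_refresh_cache /var/cache/tdnf && tdnf -q --refresh updateinfo
```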

mame030 avatar Nov 11 '25 14:11 mame030

It looks like it's the tdnf updateinfo, run from the script /etc/motdgen.d/02-tdnf-updateinfo.sh that is run on every login.

To work around, and if you are not interested in the update info on login, you can delete that script.
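
As an alternative to deleting the script outright, it could be moved out of /etc/motdgen.d so it can be restored later. A minimal sketch; the function name park_script and the backup location are hypothetical, and it assumes motdgen only runs scripts found in that directory:

```shell
#!/bin/sh
# Hypothetical helper: move a login script into a backup directory instead of
# deleting it. Returns 1 if the script does not exist.
park_script() {
    script=$1
    backup_dir=$2
    [ -f "$script" ] || return 1
    mkdir -p "$backup_dir" && mv "$script" "$backup_dir/"
}

# Example (as root); /root/motdgen.disabled is an arbitrary location:
#   park_script /etc/motdgen.d/02-tdnf-updateinfo.sh /root/motdgen.disabled
```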

To further root cause why it hangs, please try, as root:

rm -rf /var/cach/tdnf
tdnf --refresh updateinfo

and see where it hangs. I suspect an issue with DNS.

I have created a PR that would time out the command after 15 seconds, and also fix a few other issues: https://github.com/vmware/tdnf/pull/549 .
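
For illustration only, bounding a command with a time limit can be sketched with the coreutils timeout utility. This is not the code from the PR; the function name run_limited is hypothetical, and it relies on GNU timeout returning exit status 124 when the limit is hit:

```shell
#!/bin/sh
# Run a command with a time limit using coreutils timeout.
# Prints a note to stderr and returns 124 if the limit was exceeded;
# otherwise returns the command's own exit status.
run_limited() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "skipped: command exceeded ${limit}s" >&2
    fi
    return "$status"
}

# Example with the 15 seconds mentioned above:
#   run_limited 15 tdnf -q --refresh updateinfo
```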

oliverkurth avatar Nov 11 '25 21:11 oliverkurth

When I run the command as the root user it works. I don't have to remove the cache folder first. It just works.

When I run the script as a non-root user I get a "Permission denied" error, because the command tries to remove data from /var/cache/tdnf, which is not allowed for non-root users. So I don't think that this is related to DNS.

I think I will have to remove the script as a workaround.

mame030 avatar Nov 13 '25 09:11 mame030

If I'm logged in as root, there is no issue running "tdnf --refresh updateinfo"

root@photon-machine [ ~ ]# rm -rf /var/cach/tdnf
root@photon-machine [ ~ ]# tdnf --refresh updateinfo
Refreshing metadata for: 'VMware Photon Linux 5.0 (x86_64) Updates'
photon-updates                            3570 100%
photon-updates                          686106 100%
photon-updates                         2732779 100%
photon-updates                            4023 100%
photon-updates                          289310 100%
20 Security notice(s)
root@photon-machine [ ~ ]#

Removing the script "/etc/motdgen.d/02-tdnf-updateinfo.sh" resolved the delay issue, so the addition of a timeout should minimize the issue.

Still wondering why it works when I'm logged in, but not during the login process. I see that the file /var/run/.tdnf-instance-lockfile is created during the login. I'm wondering whether it is created before the script runs, so that the script is just waiting for the lock file to be removed.

Nifelhel avatar Nov 13 '25 10:11 Nifelhel

You have a typo here, should be /var/cache/tdnf. You see no error there because you use -f:

root@photon-machine [ ~ ]# rm -rf /var/cach/tdnf
root@photon-machine [ ~ ]# tdnf --refresh updateinfo
Refreshing metadata for: 'VMware Photon Linux 5.0 (x86_64) Updates'
photon-updates                            3570 100%
photon-updates                          686106 100%
photon-updates                         2732779 100%
photon-updates                            4023 100%
photon-updates                          289310 100%
20 Security notice(s)
root@photon-machine [ ~ ]#

However, it does fetch metadata anyway. Just in case, can you try again by correctly deleting /var/cache/tdnf?

> Removing the script "/etc/motdgen.d/02-tdnf-updateinfo.sh" resolved the delay issue, so the addition of a timeout should minimize the issue.

Thanks for confirmation.

> Still wondering why it works when I'm logged in, but not during the login process. I see that the file /var/run/.tdnf-instance-lockfile is created during the login. I'm wondering whether it is created before the script runs, so that the script is just waiting for the lock file to be removed.

tdnf creates it. You can check with ps awx | grep tdnf to see if it's running during the login.

oliverkurth avatar Nov 13 '25 16:11 oliverkurth

Sorry about the typo, but it didn't change anything as you said, still works fine when I'm logged in as root.

I know that tdnf creates the lock file, but the delay in the script feels as if the lock file had been created before the script runs. If I create the lock file manually and then run "tdnf --refresh updateinfo" it shows a message that it is waiting for the lock file to be removed, and the wait time until it continues is similar to the one you get when logging in.

No matter, we don't know the root cause of the issue but we have a good workaround. Thanks

Nifelhel avatar Nov 14 '25 09:11 Nifelhel