API timeout for GET /api/v1/servers/localhost/zones in PDNS 4.9 - it even crashes the OS
We migrated from PDNS (Authoritative) v4.1.x to v4.9.
We noticed that the PDNS API does not work properly with /api/v1/servers/localhost/zones: the request just times out.
Running the same curl command against the old version (4.1) works as expected and takes under 1 minute. We ran other API calls to make sure it is not a connectivity issue, and they worked as expected.
This makes PDA (PowerDNS-Admin) unusable, as it relies on GET localhost/zones to sync its UI...
So to summarize:
- On PDNS v4.1:
  curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones/domain1.com --> OK
  curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones --> OK (under 1 minute)
- On PDNS v4.9:
  curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones/domain1.com --> OK
  curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones --> ERROR (times out after 10 minutes, and sometimes crashes the whole VM)
We have around 300k domains managed by this PDNS server.
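To put numbers on the "under 1 minute" vs. "times out" comparison, the same two requests can be timed from a script. This is a minimal sketch, assuming the API listens on 127.0.0.1:8081 as in the curl commands above; `API_KEY`, `build_request` and `timed_get` are hypothetical names for illustration, not part of PowerDNS.

```python
import time
import urllib.request
from typing import Optional, Tuple

# Assumption: same host, port and server id as the curl commands above.
BASE_URL = "http://127.0.0.1:8081/api/v1/servers/localhost"
API_KEY = "<apikey>"  # placeholder: substitute the real key


def build_request(path: str, api_key: str = API_KEY) -> urllib.request.Request:
    """Build a GET request carrying the X-API-Key header, like `curl -H` does."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    req.add_header("X-API-Key", api_key)
    return req


def timed_get(path: str, timeout_s: float = 120.0) -> Tuple[float, Optional[int], int]:
    """Return (elapsed seconds, HTTP status or None on failure, bytes received).

    Note: timeout_s bounds each socket operation (connect, each read), not the
    whole transfer, so it is looser than curl's --max-time.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(build_request(path), timeout=timeout_s) as resp:
            body = resp.read()
            return time.monotonic() - start, resp.status, len(body)
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        print(f"{path}: failed after {time.monotonic() - start:.1f}s ({exc})")
        return time.monotonic() - start, None, 0
```

Running `timed_get("/zones/domain1.com")` and `timed_get("/zones")` on both the 4.1 and the 4.9 server would give directly comparable elapsed times and response sizes.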
We conducted several more tests:
Test 1: on Rocky Linux 9
- Downgraded from 4.9 to 4.7 on Rocky Linux.
- Pointed it at the migrated DB and at a snapshot taken before the schema change. It didn't work in either case.
Test 2: on CentOS 7
- Installed PDNS 4.9 and pointed it at the migrated DB and at the pre-schema-change snapshot. It didn't work in either case.
- Downgraded PDNS from 4.9 to 4.2. None of the minor versions worked.
- Installed PDNS 4.1.14: this one worked, pointing at both the old (non-migrated) and the new (migrated) database.
Environment
- Installed through yum (https://repo.powerdns.com/repo-files/centos-auth-49.repo)
- OS: PDNS 4.9 on Rocky Linux 9, PDNS 4.1 on CentOS 7
- Backend: MySQL 8 (specifically RDS aurora.mysql). The old version, where the /zones call works, also worked when connected to the same DB after the schema upgrade.
- HW specs: 2 vCPU, 4 GB RAM (AWS t3.medium), 2 GB swap; same specs for both PDNS servers (4.9 and 4.1)
Issue Type and Service
- Program: Authoritative
- Issue type: Bug report
Expected behaviour
A call to http://127.0.0.1:8081/api/v1/servers/localhost/zones should respond in under 5 minutes.
Actual behaviour
The request times out (and sometimes even crashes the VM).
Other information
This worked perfectly on version 4.1
Some initial questions:
- How much memory do you have for pdns_server?
- Which backend is in use?
- Which OS?
Thanks for the follow-up, and sorry for the missing info. I've updated the post with that context.
ERROR (Times out after 10 minutes, also sometimes crashes the whole VM)
Would you by any chance be able to check how much memory is used during that test? Do you know if the operating system reports an out-of-memory condition? Knowing how much the same operation uses with version 4.1 would be very useful as well.
We upscaled the instance to an xlarge (4 vCPU, 16 GB RAM), assuming it was related to this, but the same behaviour showed up. Also, I wrote "10 minutes timeout" before, but that is only because we kill the request before it can actually finish.
We can check the OS, yes, but sadly not right away.
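For the memory question, one low-effort option is to sample the resident memory of pdns_server from /proc while the /zones request is running, and afterwards check the kernel log (for example `journalctl -k`) for OOM-killer messages. This is a minimal Linux-only sketch; the helper names and the sampling interval are hypothetical choices, not anything PowerDNS provides.

```python
import os
import time
from typing import List, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident memory, in KiB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None


def find_pids(name: str = "pdns_server") -> List[int]:
    """Return PIDs whose /proc/<pid>/comm equals name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def sample_rss(pid: int, interval_s: float = 5.0, samples: int = 12) -> List[int]:
    """Read VmRSS every interval_s seconds and return the KiB values observed."""
    values = []
    for _ in range(samples):
        try:
            with open(f"/proc/{pid}/status") as f:
                rss = parse_vmrss_kib(f.read())
        except OSError:
            break  # the process is gone, e.g. killed by the OOM killer
        if rss is not None:
            values.append(rss)
            print(f"pid {pid}: VmRSS {rss / 1024:.0f} MiB")
        time.sleep(interval_s)
    return values
```

Starting `sample_rss(find_pids()[0], samples=120)` just before issuing the /zones request on each version would show how the two compare over time.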
Found a solid lead:
on 4.9, it works with dnssec=false.
curl -H 'X-API-Key: <key>' --max-time 120 'http://127.0.0.1:8081/api/v1/servers/localhost/zones?dnssec=false'
This change cascades into other issues within PDA, so it is not a fix for us.
Could this help pinpoint the cause?
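As we understand the API documentation, the zone-list endpoint's `dnssec` query parameter (default true) controls whether each Zone object includes the `dnssec` and `edited_serial` fields. Assuming, without having confirmed it, that producing that per-zone DNSSEC information is the slow part for ~300k zones, a quick check is to compare the two responses and count how many zones are actually signed. This is a minimal sketch; `zones_url` and `summarize` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode

# Assumption: same host, port and server id as the curl commands above.
BASE_URL = "http://127.0.0.1:8081/api/v1/servers/localhost"


def zones_url(dnssec: bool) -> str:
    """URL for the zone list with the dnssec query parameter set explicitly."""
    return f"{BASE_URL}/zones?{urlencode({'dnssec': 'true' if dnssec else 'false'})}"


def summarize(zones_json: str) -> dict:
    """Count zones, entries carrying a 'dnssec' field, and zones marked signed."""
    zones = json.loads(zones_json)
    with_field = sum(1 for z in zones if "dnssec" in z)
    signed = sum(1 for z in zones if z.get("dnssec") is True)
    return {"zones": len(zones), "with_dnssec_field": with_field, "signed": signed}
```

Saving each response with curl's `-o` option and passing the file contents to `summarize` shows whether both listings return the same number of zones, and how many of them report DNSSEC as enabled.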