
API timeout for GET /api/v1/servers/localhost/zones in PDNS 4.9 (it can even crash the OS)

Open: julgonwimg opened this issue 1 year ago • 5 comments

We migrated from PDNS (Authoritative) v4.1.X to v4.9.

We noticed that the PDNS API call to /api/v1/servers/localhost/zones never completes; it just times out.

Running the same curl command against the old version (4.1) works as expected, taking under 1 minute. We ran other API calls to make sure it is not a connectivity issue, and they worked as expected.

This makes PDA (PowerDNS-Admin) unusable, as it relies on GET /api/v1/servers/localhost/zones to sync its UI.

So to summarize:

  • On PDNS v4.1
    • curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones/domain1.com --> OK
    • curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones --> OK (under 1 minute)
  • On PDNS v4.9
    • curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones/domain1.com --> OK
    • curl -H 'X-API-Key: <apikey>' http://127.0.0.1:8081/api/v1/servers/localhost/zones --> ERROR (Times out after 10 minutes, also sometimes crashes the whole VM)
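For anyone trying to reproduce this, the two checks above can be timed from a small script so that runs against 4.1 and 4.9 give comparable numbers. This is a minimal Python sketch, not part of the original report: it assumes the API listens on 127.0.0.1:8081 as in the curl commands and reads the key from a hypothetical PDNS_API_KEY environment variable. The 120 s value is an arbitrary choice, and urllib applies it per socket operation rather than to the whole transfer.

```python
"""Time the single-zone and full zone-list calls (sketch).

Assumptions: API on 127.0.0.1:8081 as in the curl commands; the key comes
from a hypothetical PDNS_API_KEY environment variable.
"""
import os
import time
import urllib.request

BASE = "http://127.0.0.1:8081/api/v1/servers/localhost"


def zones_url(zone=None):
    """URL for the full zone list, or for a single zone when one is given."""
    return f"{BASE}/zones" if zone is None else f"{BASE}/zones/{zone}"


def time_get(url, api_key, timeout=120.0):
    """Return (status text, elapsed seconds) for one GET request.

    `timeout` is urllib's per-socket-operation timeout, not a cap on the
    total transfer time like curl's --max-time.
    """
    req = urllib.request.Request(url, headers={"X-API-Key": api_key})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()  # read the whole body so transfer time is counted
            status = str(resp.status)
    except OSError as exc:  # URLError, HTTPError and socket timeouts
        status = f"error: {exc}"
    return status, time.monotonic() - start


if __name__ == "__main__":
    key = os.environ.get("PDNS_API_KEY", "")
    for url in (zones_url("domain1.com"), zones_url()):
        status, elapsed = time_get(url, key)
        print(f"{url} -> {status} in {elapsed:.1f}s")
```

Running it once per server version with the same key and database gives a side-by-side timing for the single-zone call and the full listing.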

We have around 300k domains managed by this PDNS server.

We conducted several more tests.

Test 1: On Rocky Linux 9 -->

  • Downgraded from 4.9 to 4.7 on Rocky Linux.
  • Pointed it at both the migrated DB and a snapshot taken before the schema change. It did not work in either case.

Test 2: On CentOS 7 -->

  • Installed PDNS 4.9 and pointed it at both the migrated DB and a snapshot taken before the schema change. It did not work in either case.
  • Downgraded PDNS from 4.9 down to 4.2. None of those minor versions worked.
  • Installed PDNS 4.1.14 --> this one worked (pointing at both the old, non-schema-migrated database and the new one)

Environment

  • Installed through yum (https://repo.powerdns.com/repo-files/centos-auth-49.repo)
  • OS: PDNS 4.9 on Rocky Linux 9, PDNS 4.1 on CentOS 7
  • Backend used: MySQL 8 (RDS aurora.mysql specifically). We also ran the old version, where the /zones call works, against the same DB after the schema was upgraded, and it worked too
  • HW specs: 2 vCPU, 4 GB RAM (AWS t3.medium), 2 GB swap; same specs for both PDNS servers (4.9 and 4.1)

Issue Type and Service

  • Program: Authoritative
  • Issue type: Bug report

Expected behaviour

The PDNS API call to http://127.0.0.1:8081/api/v1/servers/localhost/zones should respond in under 5 minutes.

Actual behaviour

The request times out (and sometimes even crashes the VM).

Other information

This worked perfectly on version 4.1

julgonwimg, Sep 27 '24 03:09

Some initial questions:

  • How much memory do you have for pdns_server?
  • Which backend is in use?
  • Which OS?

zeha, Sep 27 '24 06:09

Thanks for the follow-up, and sorry for the missing info. I've updated the post with that context.

julgonwimg, Sep 27 '24 11:09

ERROR (Times out after 10 minutes, also sometimes crashes the whole VM)

Would you by any chance be able to check how much memory is used during that test? Do you know if the operating system reports an out-of-memory condition? Knowing how much the same operation uses with version 4.1 would be very useful as well.
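One way to answer the memory question is to sample the server's resident set size while the /zones request runs. The following is a minimal Linux-only Python sketch, not something from the thread: it reads the VmRSS (current) and VmHWM (peak) lines from /proc/<pid>/status for a PID given on the command line, and finding the pdns_server PID is left to the reader. Running it on both the 4.1 and 4.9 hosts would give the comparison asked for.

```python
"""Sample pdns_server's resident memory while the /zones request runs.

Linux-only sketch: reads VmRSS (current) and VmHWM (peak RSS) from
/proc/<pid>/status once per interval until the process disappears.
"""
import sys
import time


def parse_status_kb(text, field):
    """Return the kB value of a 'Field:   1234 kB' line, or None if absent."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    return None


def sample(pid, interval=1.0):
    """Print current and peak RSS every `interval` seconds until the process exits."""
    path = f"/proc/{pid}/status"
    while True:
        try:
            with open(path) as fh:
                text = fh.read()
        except (FileNotFoundError, ProcessLookupError):
            print("process exited (check the kernel log for an OOM kill)")
            return
        rss = parse_status_kb(text, "VmRSS")
        peak = parse_status_kb(text, "VmHWM")
        print(f"{time.strftime('%H:%M:%S')} rss={rss} kB peak={peak} kB", flush=True)
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) == 2:
    sample(int(sys.argv[1]))
```

Because VmHWM is the peak, even the last line printed before the process disappears shows the highest memory use reached.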

rgacogne, Sep 27 '24 14:09

We upscaled the instance to an xlarge (4 vCPU, 16 GB RAM), assuming it was related to this, but the same behaviour showed up. Also, I wrote "times out after 10 minutes" before, but that is only because we kill the request before it can actually finish.

We can check the OS, yes, but sadly not right away.

julgonwimg, Sep 27 '24 14:09

Found a solid lead: on 4.9, the call works when run with dnssec=false:

curl -H 'X-API-Key: <key>' --max-time 120 'http://127.0.0.1:8081/api/v1/servers/localhost/zones?dnssec=false'

However, this change cascades into other issues within PDA, so it is not a fix for us.

Could this help in pinpointing the cause of it?
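As we read the PowerDNS HTTP API documentation, the dnssec query parameter of the zone listing defaults to true and controls whether each Zone object includes the "dnssec" and "edited_serial" fields, and the docs note that setting it to false makes the query a lot faster. If PDA reads either field when syncing (an assumption, not verified here), that would explain why the workaround breaks it. The minimal Python sketch below checks which fields disappear: it takes two saved JSON listings and reports keys present in the default one but absent from the dnssec=false one. The file-based workflow is our own illustration, not part of the thread.

```python
"""Compare zone-list responses fetched with and without dnssec=false (sketch)."""
import json
import sys
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8081/api/v1/servers/localhost/zones"


def zones_url(dnssec=True):
    """Zone-list URL; dnssec=False appends the query parameter used in the thread."""
    return BASE if dnssec else f"{BASE}?{urlencode({'dnssec': 'false'})}"


def missing_fields(full, reduced):
    """Keys that appear in some zone of `full` but in no zone of `reduced`."""
    full_keys = set().union(*(zone.keys() for zone in full))
    reduced_keys = set().union(*(zone.keys() for zone in reduced))
    return sorted(full_keys - reduced_keys)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Two saved JSON responses: the default listing first, dnssec=false second.
    with open(sys.argv[1]) as f_full, open(sys.argv[2]) as f_fast:
        print(missing_fields(json.load(f_full), json.load(f_fast)))
```

The default listing could be captured from the 4.1 server, which still answers in under a minute, and compared with a dnssec=false listing from 4.9.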

julgonwimg, Sep 27 '24 14:09