Sync PeeringDB data to a local database due to API rate-limits
The issue
In the newest version of Pathvector I'm facing issues like the following:
FATA[0026] unable to get PeeringDB data: PeeringDB GET request expected 200, got 429 Too Many Requests
Environment
root@c01:/home/jkoenig# pathvector version
Pathvector 6.0.2
Built 05f7142b87ff03a3ff018d6693f0a77090167184 on 2022-07-30T20:17:59Z
Plugins: (none)
root@c01:/home/jkoenig# uname -a
Linux c01 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux
Expected behavior
Would it be possible to implement something like this, please? https://docs.peeringdb.com/blog/faster_queries/
Doing so would sync the PeeringDB data to a local copy that could be queried without any rate limit.
@netfreak98 The configuration variable peeringdb-cache is set to true by default, and it should help alleviate this. How often are you running pathvector generate?
Personally, I run it nightly and have no issues with hitting API limits, although I only have 14 peer configurations - I can see this being an issue with a larger number of peers.
You could also use authenticated requests to the API by using an API key with peeringdb-api-key - see more details on PeeringDB's API query rate limits here: https://docs.peeringdb.com/howto/work_within_peeringdbs_query_limits/
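For reference, a minimal pathvector.yml sketch with both options set - the key names are the ones mentioned above, the API key value is just a placeholder, and the rest of the config is omitted:

```yaml
# Excerpt only - the rest of the Pathvector config is omitted.
# peeringdb-cache defaults to true (per the comment above); shown explicitly for clarity.
peeringdb-cache: true
# Placeholder value - create a real API key in your PeeringDB account settings.
peeringdb-api-key: "YOUR_PEERINGDB_API_KEY"
```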
@netstx has the exact right idea. The internal PeeringDB cache isn't persistent between process executions, so I'll leave this open to track implementing a persistent cache.
An even better idea would be to query a local copy of the PeeringDB database, which PeeringDB supports through tools such as https://github.com/peeringdb/peeringdb-py
Would be great to have this, as I keep getting 429s all over the place from PeeringDB due to the large number of peers configured.
Just bumping this one, as I'm getting this fairly often as of late too with the number of peers I've got these days.
https://docs.peeringdb.com/howto/work_within_peeringdbs_query_limits/
has an "Efficient queries" section which describes batching up to 150 ASNs into a single query. What would be the chances of getting Pathvector to run through all peers that need a lookup with, say, 50 ASNs per query?
Or, failing that, could it add an automatic pause between queries to slow things down and stay within PeeringDB's limits, e.g. 10 requests every minute? Yes, it would make some larger configs quite slow to generate, but it would avoid the generate run being terminated.
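Just to illustrate what I mean (these keys don't exist in Pathvector today - they're purely hypothetical), the knobs could look something like this in pathvector.yml:

```yaml
# Hypothetical options - NOT implemented in Pathvector, shown only to sketch the request.
peeringdb-query-batch-size: 50   # hypothetical: ASNs to bundle into one batched (asn__in-style) query
peeringdb-query-interval: 6s     # hypothetical: pause between queries, i.e. ~10 requests per minute
```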
I figured a persistent PeeringDB cache should really be an external component rather than part of Pathvector, so I've gone ahead and implemented a drop-in API replacement for any service that relies on PeeringDB.
https://github.com/natesales/peeringdb-cache
As of Pathvector 6.3.2, the peeringdb-url option lets you point Pathvector at a local PeeringDB cache instance. This now makes Pathvector run really fast, even on routers with hundreds of peers.
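For example, pointing Pathvector at a cache instance running on the same box - the address below is just a placeholder, so substitute whatever URL your peeringdb-cache instance actually serves the API on:

```yaml
# Placeholder address - use the URL your local peeringdb-cache instance listens on.
peeringdb-url: http://127.0.0.1:8080
```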