python-musicbrainzngs icon indicating copy to clipboard operation
python-musicbrainzngs copied to clipboard

rate limiting limits requests to the rate at which MB can return a full response, not 1 second

Open zzzeek opened this issue 7 years ago • 4 comments

Per musicbrainz, the rate limit is one request per second:

https://musicbrainz.org/doc/XML_Web_Service/Rate_Limiting "The rate at which your IP address is making requests is measured. If that rate is too high, all your requests will be declined (http 503) until the rate drops again. Currently that rate is (on average) 1 request per second. "

However, it appears that the _rate_limit decorator is wrapping the _mb_request() method in its entirety, and is including the time spent waiting for musicbrainz to return a response and for that response to be fully delivered.

The scenario is this: suppose musicbrainz is taking 5 seconds to return a response. I want to make ten requests as fast as possible in separate threads. That means, in ten seconds, I should have made ten requests (one request per second), and at the tenth second, I would have gotten back results from the first five requests. This isn't possible now because _rate_limit puts a mutex around the entire call. it should only be mutexing the part at which the request is sent over the wire to musicbrainz, and the waiting for the response part of it would be outside of the mutex.

zzzeek avatar Aug 21 '16 17:08 zzzeek

a way to fix this would be to not use a mutex around the method at all. just store the last timestamp a request was made. new request comes in, wait in a mutexted loop for the timestamp to grow by at least one second, then invoke the _mb_request() outside of the mutex entirely.

zzzeek avatar Aug 21 '16 17:08 zzzeek

admittedly im not understanding their document fully - they talk about "50 req sec for the python/musicbrainz user agent", is that...for all python/musicbrainz requests from all sources IPs (eg. shared w/ the world)? in which case, OK, I'd be better off not even bothering to use any threads if the service is really that low of a scale. feel free to clarify / close if this is the case.

zzzeek avatar Aug 21 '16 17:08 zzzeek

I think I can at least clarify the 50 requests/sec limit for python-musicbrainz. This is for the older python-musicbrainz2 library, which was used for the older musicbrainz web service prior to the launch of the new one in 2011. Until version 0.7.4 (I think) this didn't support customizing the user agent in any way. Since the web service that library uses is deprecated and a version with a customizable user agent has been released, I'm not sure, but it's been a few years, there's the global limit of 50 requests/sec for the old version.

mineo avatar Aug 21 '16 20:08 mineo

anyway I'm locally doing something like this:

import threading
import time
import musicbrainzngs as mb

# unwrap mb's rate limiting
mb.musicbrainz._mb_request = mb.musicbrainz._mb_request.fun


last_mb_call = None
last_mb_mutex = threading.Lock()

def _mb_fn(self, fn, *arg, **kw):
    global last_mb_call

    last_mb_mutex.acquire()
    try:
        now = time.time()
        if last_mb_call is None:
            last_mb_call = now
        else:
            if now - last_mb_call < 1:
                time.sleep(1 - (now - last_mb_call))
                now = time.time()
            last_mb_call = now
    finally:
        last_mb_mutex.release()

    return fn(*arg, **kw)

# ... later ...

results = _mb_fn(mb.search_recordings, **kwargs)

and this is working great. That is, put the mutex around just the part where you're checking the time. there seem to be a lot more options in _rate_limit() but also right now it's not really possible to set "no" rate limit due to the thread serializing the requests.

zzzeek avatar Aug 21 '16 20:08 zzzeek