isodate
isodate copied to clipboard
parse_time() is not threadsafe
I just encountered a bug that was very hard to track down: In a multithreaded environment, isotime.parse_time()
sometimes fails to parse a valid ISO8601 string of the form HH:MM+HH:MM
. In a console (i.e., in a single-threaded setting), the same string is parsed without error.
The problem seems to be in the function build_time_regexps()
that is called at the beginning of parse_time()
: It checks whether the global variable TIME_REGEX_CACHE
is empty, and if so, it fills it with various compiled regexes, one after another. If control passes to another thread after some but not all regexes have been append()
ed to that list, the other thread will find the list to be non-empty and hence assume it contains all the necessary compiled regexes. Hence it will succeed in parsing some time formats and fail for others.
The function isodates.parse_date()
seems to be using the same technique, and so it likely suffers from the same problem. I have not looked at that function in detail though.
This is what's keeping me from using this (otherwise great :+1: ) library as well ... @gweis would you accept a PR that fixes this by wrapping the globals inside a closure (or do you prefer classes)?
The whole reason why this is is in a function is probably due to a not so well thought through use case. Have a look at https://github.com/gweis/isodate/blob/master/src/isodate/isodates.py#L48, where it would be possible to allow more than 4 year digits if desired.
@MikiDi not sure how you intend to fix it by wrapping it into a closure.
I am rather thinking, that initialising the TIME_REGEX_CACHE
and DATE_REGEX_CACHE
on module import would be a simpler solution. Something like pre populate that dictionary on module load, and add a helper methods that can be used on startup (or maybe even triggered with env vars) to add more regexes to it.
That should fix the race condition, and provides a clear entry point for applications to use other formats. The build_date_regexps
cloud be that entry point; a better name wight be nice though.
(open for other ideas)
The way I understand it, you basically want a module that is configurable "on load", i. e., the ability to specify parameters like yeardigits
and expanded
at import time. Since Python does not allow for that, the usual solution would be to have isodates.py
define a class (say, DateParser
) that sets up the cache on object initialization. A less object-oriented and more functional approach would be to provide instead a factory function (something like create_date_parser(yeardigits=4, expanded=False)
that constructs an anonymous function using the given parameters and returns it. I think that is what @MikiDi was referring to by "wrapping it into a closure".
Needless to say, both of these methods would change the API in a non-backwards compatible way. So a better solution might indeed be to set up the cache on import and allow for changing it later. This would mean that people who want to use the extended functionality have to jump through one more hoop.
Alternatively, you could use a threading.Lock
around the cache creation. This would mean the least amount of change for the API (and be the most explicit one), so in your position I would probably take that route.