swipl-devel
swipl-devel copied to clipboard
Calling nb_current with undefined global invokes user:exception hook
According to the nb_current/2documentation:
"Note that nb_current/2 can be used as an alternative for nb_getval/2 to request the value of a variable and fail silently if the variable does not exists."
This isn't quite true - it will call any instances of user:exception with an Exception value of undefined_global_variable and only fail when all of these fail. This means that the performance of an existence test, e.g., to lookup a cached value, is dependent on the arbitrary behaviour of loaded modules.
The suggestion is that nb_current/2 should not call this hook, i.e., silently fail as advertised. Failing that, can an efficient primitive to test the existence of a global (or an equivalent option for nb_current) be added.
Just some performance numbers:
Welcome to SWI-Prolog (threaded, 64 bits, version 8.3.25)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- nb_setval('$$$',test).
true.
?- time((between(1,1000000,_),nb_current('$$$',V),fail)).
% 2,000,004 inferences, 0.088 CPU in 0.089 seconds (99% CPU, 22789731 Lips)
false.
?- time((between(1,1000000,_),nb_current('???',V),fail)).
% 3,000,001 inferences, 1.491 CPU in 1.909 seconds (78% CPU, 2011834 Lips)
false.
And this doesn't include any user:exception hooks. Also note low CPU utilization (MacOS 10.13.6).
This is a good old problem. Changing nb_current/2 is dubious and would break too much. Pushed 704c8c9438a8409775624d6e1888e3e9f6d350c8 to address this. This patch ensures the hook is tried only once per variable (per thread).
This issue has been mentioned on SWI-Prolog. There might be relevant details there:
https://swi-prolog.discourse.group/t/backtracking-global-variables/4234/1
I suspected changing nb_current/2 would be problematical but, if I understand your PR, this change only helps if there are multiple attempts to read the same undefined global variable. To be honest, I can't think of a scenario where that's the case.
In this particular application, nb_current/2 is only called once per undefined global; if that fails the global will be calculated and set. So the key to good performance is fast first failure. As noted above, the performance on failure is over a factor of ten worse than the performance on success, independent of whether any user_exception is defined.
If a change to nb_current/2 isn't feasible for NUC reasons, perhaps a (deterministic) primitive based on the function gvar_value in pl-gvar.c could be added.
This suggests you are using zillions of global variables. That is anyway bad for performance due to the required preparation and fixup per variable that is required for garbage collection. Typically the solution is to have a global variable with e.g. an array as content. What is the scenario?
Nope, see my second post in this issue: 1 global var ('$$$'). Accessing it using nb_current takes less than 0.1 microseconds including backtracking loop overhead. Same call on a non-existant gloabl ('???') takes almost 1.5 microseconds. (In addition, the CPU usage drops to 72% which suggests some OS interaction might be involved.)
Briefly, the bigger picture is to cache the result of a grammar rule within a single parse operation (packrat parser). So I'd expect perhaps a couple thousand such entries, but is quite dependent on the grammar and the size of the input. In some cases there may be less than 100 globals. And they're incrementally created as the parsing proceeds. Not sure how an array approach can be helpful for this scenario.
I thought tabling might be used to do something like this, but it didn't work out (see https://swi-prolog.discourse.group/t/tabling-the-wolf-sheep-cabbage/1893/27?u=ridgeworks).
Then I do not understand. You get one call to exception/3 for the undefined ??? var and thus
101 ?- time((between(1,1000000,_),nb_current('???',V),fail)).
% 2,000,005 inferences, 0.054 CPU in 0.054 seconds (100% CPU, 36857893 Lips)
false.
A couple of 100s won't make a huge difference.
B.t.w. although I do recognise this problem, I get far better numbers on the version before the above patch (Ubuntu 20.04). The not 100% CPU is rather strange. There is no OS involvement AFAIK.
101 ?- time((between(1,1000000,_),nb_current('$$$',V),fail)).
% 2,000,004 inferences, 0.106 CPU in 0.106 seconds (100% CPU, 18860817 Lips)
false.
102 ?- time((between(1,1000000,_),nb_current('???',V),fail)).
% 3,000,001 inferences, 0.201 CPU in 0.201 seconds (100% CPU, 14920295 Lips)
false.
"Then I do not understand. You get one call to exception/3 for the undefined ??? var and thus..."
In my case (no patch) there is a potential call to exception for all 1,000,000, but there is no user:exception defined. From your results, It looks like the patch improves performance by a factor of 4, which is what I would expect.
The patch will drastically affect the simple benchmark above but doesn't help much on a caching use case where the failure of nb_current is immediately followed by setting its value, e.g.,:
memo_generate(Key,Value) :-
( nb_current(Key,Value)
-> true
; generate(Key,Value),
nb_setval(Key,Value)
).
What I can't explain is the difference between Ubuntu and MacOS. Other than nb_current for undefined globals, the other global var predicates seem to exhibit comparable performance. I think the CPU utilization is pretty suspicious indicating something else is at play. (The results are reproducible on both the MacOS bundle and command line version.)
So it looks like this is boiling down to a MacOS dependent problem. I probably wouldn't have opened this issue if the performance for this particular use case was comparable to Ubuntu. That said, I'm far from convinced that using an exception to initialize global variables is wise. At the very least there ought to be a "mean and lean" alternative that just fails on non-existant global vars.
I have little clue on the MacOS performance issue. I'm not familiar with the MacOS tools to figure this out. If performance is important you better run Linux :smile: Anyway, as is, caching using global variables scales poorly because the way global variables are handled by the garbage collector scales poorly. Possibly this can be fixed, but not easily. Using a trie might be a good alternative?
Lazy initialization of global variables has proven to be almost necessary to make libraries that depend on them work in a multi-threaded context. I'm not totally against some way to check for the existence of a global variable that bypasses the hook if there is a compelling reason to do so.
I don't get to pick a user's platform of choice. So if a given "feature" is unacceptable for an intended purpose on one platform, it's unusable on all IMO.
In this particular case, a primitive that shouldn't have an OS dependencies (unlike, say, threads or IO) , in fact has one, albeit, performance related. Since the root cause is unknown, it is possible that other primitives are similarly affected, and it may even exist on other platforms - I haven't tested them all.
Anyway, as is, caching using global variables scales poorly because the way global variables are handled by the garbage collector scales poorly.
What do you mean by "scales poorly"; can you quantify? Perhaps this should be documented somewhere.
Using a trie might be a good alternative?
Good suggestion. I'll definitely look into this, assuming it doesn't have the same problems as global vars.
Lazy initialization of global variables has proven to be almost necessary to make libraries that depend on them work in a multi-threaded context.
Since global vars are local to a thread, i.e., unaccessible to other threads, I don't understand this. Perhaps you're referring to truly global vars which are shared across threads, but I'm not even sure how lazy evaluation solves that problem. (Certainly all my use cases are local to a module, even if that's unenforcible. ) And can't lazy evaluation always be done in the library code; why does it have to be built-in to the primitives?
I'm not totally against some way to check for the existence of a global variable that bypasses the hook if there is a compelling reason to do so.
If nb_current wasn't deficient in a platform specific way, this wouldn't be needed. And if tries are actually a better solution, I guess my "compelling reason" is no longer valid.
scales poorly
GC cannot handle the way global vars are represented and is it rather hard to fix that. As a result we must create a pure Prolog representation for them before GC and restore the global vars after GC. Well, that is for variables that have a value that is represented on the stacks (anything but atoms and small integers). We keep track of the number of glbal vars that need this threadment and if there is at least one we process the entire table.
unaccessible to other threads, I don't understand this
May applications depend on initialized global variables. If we don't initialize them lazily we need to initialize than as the thread starts while it is likely the new thread will never use them or we need some explicit call to prepare the thread to use some library, which isn't very neat either.
Anyway, if tries have the right semantics for you they are likely to work for you. A simple dynamic predicate may work too. It all depends on the workload and required semantics though.
...We keep track of the number of glbal vars that need this threadment and if there is at least one we process the entire table.
I think I see what's going on. My main use for globals is operational measurements (statistics), so that's mainly counters of various types (small ints), so they should be fine. However compound terms (including lists), strings, etc. will have this issue. Probably OK if the number of globals is quite small, but definitely not good for caching where you may have thousands.
For my particular caching application, I can probably workaround this by deleting all the globals at the end of the parse operation so the probability that GC will encounter these globals is small.
If we don't initialize them lazily we need to initialize than as the thread starts ...
Ah, I see the problem now. In fact I ran into this before, but totally forgot about it - my bad. Even if the globals were private to a module, initializing the globals as part of the module initialization is insufficient - you need to repeat this initialization for each new thread that uses the module - so a per module (library), per thread init.
Anyway, if tries have the right semantics for you they are likely to work for you. A simple dynamic predicate may work too.
Tries are the best solution I've found so far. Lookups (both successful and unsuccessful) seem to be very cheap; inserts are more expensive, particularly as the trie gets large (1500 or so). Each parse operation gets a new trie so it should be thread-safe. (Tries are private local databases unless you export them.) I don't think dynamic predicates have the right semantics since they're truly global.
I assume tries use the same indexing strategy as clause heads. My keys are of the form (String+,Integer+) so one would think that's a good fit. My caching tries have a very short lifespan. Does it make sense to destroy the trie when it's no longer needed or just rely on GC to clean it all up?
I apologize for taking this "issue" off-topic; will post any further questions regarding tries on discourse.