pyld
Speed up of compare_values and has_value methods
I noticed very slow performance in the `to_rdf` procedure for a JSON-LD file with several tens of thousands of typed object values for a single property. Profiling with cProfile showed that an inordinate amount of time was spent in the methods `compare_values` and `has_value`.
This pull request introduces the following changes:
- Re-implemented `compare_values` with exception handling rather than if-then statements. Also changed the ordering of the checks and removed the boolean comparison between primitive values (it may need to be restored, but I couldn't understand the reason behind it).
- Re-implemented the `has_value` method (which called `compare_values` very frequently) to perform its checks only once, and to compare only values of the same type.
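As a rough illustration of the two changes listed above, here is a minimal, standalone sketch. It is not the code from this pull request or from PyLD: the names `compare_values_sketch`, `has_value_sketch`, and `_kind` are hypothetical, and the comparison rules are simplified assumptions about how JSON-LD values are matched.

```python
def compare_values_sketch(v1, v2):
    """Simplified stand-in for a JSON-LD value comparison (not PyLD's code)."""
    # Try the '@id' comparison first. Non-dict values raise TypeError and
    # dicts without '@id' raise KeyError, so both fall through.
    try:
        return v1['@id'] == v2['@id']
    except (KeyError, TypeError):
        pass
    # Then try the '@value' comparison, including the optional keys.
    try:
        return (v1['@value'] == v2['@value'] and
                v1.get('@type') == v2.get('@type') and
                v1.get('@language') == v2.get('@language') and
                v1.get('@index') == v2.get('@index'))
    except (KeyError, TypeError):
        pass
    # Assumption: remaining non-dict values are compared by plain equality.
    return (not isinstance(v1, dict) and not isinstance(v2, dict)
            and v1 == v2)


def _kind(v):
    """Classify a value so that only like kinds are compared (hypothetical)."""
    if isinstance(v, dict):
        if '@id' in v:
            return 'id'
        if '@value' in v:
            return 'value'
        return 'other'
    return 'primitive'


def has_value_sketch(subject, prop, value):
    """Return True if `subject[prop]` contains a value matching `value`."""
    existing = subject.get(prop, [])
    if not isinstance(existing, list):
        existing = [existing]
    kind = _kind(value)  # classify the searched value once, outside the loop
    return any(_kind(e) == kind and compare_values_sketch(e, value)
               for e in existing)
```

The idea is that the common case (two value objects or two node references) succeeds on the first or second dictionary lookup without a chain of type checks, and that `has_value_sketch` skips the full comparison for candidates of a different kind.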
There's also a question about the way `has_value` is implemented: it seems that if the `value` parameter that's passed is an array, it is completely ignored. Is that the correct behavior?
Thanks, I'll take a look when I get a chance. May need to finish getting the test suite up-to-date so we can check if the changes are ok.
Can you give a brief example of the form of data that was slow? Just one or two properties is fine, not 10000+ :-) I've started setting up a benchmarking system, and issues like this are useful targets for auto-generating test data inputs.
I'd like to bump this as it fixes major issues we're facing internally, with slow framing on large JSON-LD files.
I will try to generate some artificial data (large amount of data, few properties).
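A minimal sketch of what such artificial data could look like, assuming the slow case is one subject with a single property holding many typed values, as described at the top of this thread. The IRIs and the helper name `make_doc` are made up for illustration.

```python
import json

# Hypothetical property IRI and datatype, chosen only for this example.
PROP = "http://example.org/vocab#measurement"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def make_doc(n_values, prop=PROP):
    """Build one JSON-LD node with `n_values` typed values on one property."""
    return {
        "@id": "http://example.org/subject/1",
        prop: [
            {"@value": str(i), "@type": XSD_INTEGER}
            for i in range(n_values)
        ],
    }


if __name__ == "__main__":
    # Small sample to show the shape; raise n_values for benchmarking.
    print(json.dumps(make_doc(3), indent=2))
```

Raising `n_values` to tens of thousands should produce input of the shape reported as slow, which could then be passed to `to_rdf` or to framing for timing.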
Hi! Any plans to include this? We are also facing issues with framing large JSON-LD files.