pyld

Speed up of compare_values and has_value methods

Open RinkeHoekstra opened this issue 5 years ago • 3 comments

I noticed very slow performance on the to_rdf procedure for a JSON-LD file with several tens of thousands of typed object values for a single property.

Profiling with cProfile showed that an inordinate amount of time was spent in the methods compare_values and has_value.
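For readers who want to reproduce this kind of measurement, here is a minimal sketch of profiling a quadratic "add if not already present" loop with cProfile. The functions `compare_values`, `has_value` and `add_all` below are hypothetical stand-ins written for illustration, not pyld's implementations, and the helper `profile` and the generated input data are assumptions as well.

```python
import cProfile
import io
import pstats


def compare_values(v1, v2):
    # Hypothetical stand-in, NOT pyld's implementation.
    return v1 == v2


def has_value(values, value):
    # Linear scan: one compare_values call per existing value until a match.
    return any(compare_values(v, value) for v in values)


def add_all(items):
    # Quadratic overall: every insert scans everything added so far.
    values = []
    for item in items:
        if not has_value(values, item):
            values.append(item)
    return values


def profile(func, *args):
    # Run func under cProfile and return its result plus a text report
    # sorted by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Artificial input: many distinct typed values for a single property.
items = [
    {"@value": str(i), "@type": "http://www.w3.org/2001/XMLSchema#integer"}
    for i in range(500)
]
result, report = profile(add_all, items)
print(report)
```

In the printed report, the call counts for the comparison function grow roughly with the square of the number of values, which is the pattern described above.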

This pull request introduces the following changes:

  • Re-implemented compare_values with exception handling rather than if-then statements. Also changed the ordering of the checks and removed the boolean comparison between primitive values (it may need to be restored, but I couldn't understand the reason behind it).
  • Re-implemented the has_value method (which calls compare_values very frequently) to perform its checks only once, and to compare only values of the same type.
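The two ideas in the list above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this pull request or in pyld: the names `compare_values_sketch`, `has_value_sketch` and `_kind` are hypothetical, and the exact set of fields compared is an assumption.

```python
def _kind(v):
    # Hypothetical helper: classify a value once so that only values of
    # the same kind are compared with each other.
    if isinstance(v, dict):
        if "@value" in v:
            return "value"
        if "@id" in v:
            return "id"
        return "object"
    return "primitive"


def compare_values_sketch(v1, v2):
    # Try the value-object comparison first and fall through on failure,
    # instead of testing the types up front with if-then statements.
    try:
        return (
            v1["@value"] == v2["@value"]
            and v1.get("@type") == v2.get("@type")
            and v1.get("@language") == v2.get("@language")
            and v1.get("@index") == v2.get("@index")
        )
    except (KeyError, TypeError):
        pass
    # Next, try to compare node references by @id.
    try:
        return v1["@id"] == v2["@id"]
    except (KeyError, TypeError):
        pass
    # Fallback: plain equality. Note that without a separate boolean
    # check, True == 1 evaluates to True in Python.
    return v1 == v2


def has_value_sketch(subject, prop, value):
    # Look up the property and unwrap an @list container once, up front.
    if prop not in subject:
        return False
    existing = subject[prop]
    if isinstance(existing, dict) and "@list" in existing:
        existing = existing["@list"]
    if not isinstance(existing, list):
        existing = [existing]
    # Classify the candidate once and skip stored values of another kind.
    kind = _kind(value)
    return any(
        _kind(v) == kind and compare_values_sketch(value, v)
        for v in existing
    )


subject = {"p": [{"@value": "1", "@type": "int"}, {"@id": "ex:a"}, "plain"]}
print(has_value_sketch(subject, "p", {"@id": "ex:a"}))
```

Whether the plain-equality fallback is acceptable depends on how booleans and numbers should be distinguished, which is the open point about the removed boolean comparison.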

There's also a question about the way has_value is implemented: it seems that if the value parameter that's passed is an array, it is completely ignored. Is that the correct behavior?

RinkeHoekstra, Nov 06 '19 15:11

Thanks, I'll take a look when I get a chance. May need to finish getting the test suite up-to-date so we can check if the changes are ok.

Can you give a brief example of the form of data that was slow? Just one or two properties is fine, not 10000+ :-) I've started setting up a benchmarking system, and issues like this are useful targets for auto-generating test data inputs.

davidlehn, Nov 06 '19 19:11

I'd like to bump this, as it fixes major issues we're facing internally with slow framing on large JSON-LD files.

I will try to generate some artificial data (a large amount of data, few properties).

RinkeHoekstra, Oct 22 '20 07:10

Hi! Any plans to include this? We are also facing issues with framing large JSON-LD files.

dvsrepo, Dec 03 '20 09:12