Deprecate `get_value_by_key_path` and replace with `xpath_search`
This PR deprecates the `get_value_by_key_path` method of `LinkedDataMapping` and replaces all existing occurrences with `xpath_search`.
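For readers unfamiliar with the pattern, a deprecation like this is often done by keeping the old method as a thin wrapper that warns and delegates. The following is only a minimal sketch under assumptions: `ToyMapping` and `path_search` are hypothetical stand-ins built on plain dictionaries, not the actual `LinkedDataMapping` or `xpath_search` implementation.

```python
import warnings
from typing import Any, Dict, List, Optional


class ToyMapping:
    """Hypothetical stand-in for a linked-data mapping; not the fundus class."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def path_search(self, path: str) -> Optional[Any]:
        # Hypothetical replacement API: walk a "/"-separated path
        # through nested dictionaries and return None on a miss.
        node: Any = self._data
        for key in path.split("/"):
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node

    def get_value_by_key_path(self, key_path: List[str]) -> Optional[Any]:
        # Old entry point: emit a DeprecationWarning, then delegate.
        warnings.warn(
            "get_value_by_key_path is deprecated; use path_search instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.path_search("/".join(key_path))
```

With `stacklevel=2` the warning is attributed to the caller, which makes remaining call sites easier to find while migrating.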
To quantify the performance impact, I include a runtime comparison and a script to reproduce it.
The script loads one article from The West Australian, whose linked data (LD) is by far the largest, and looks up a single value (key path) 100000 times. For more details, click on Script.
Script
from timeit import timeit

# Setup runs once per timeit() call: crawl one article and keep its linked data (LD).
setup = '''
from fundus import PublisherCollection, Crawler
from lxml.etree import XPath

test_xpath = XPath("NewsArticle/datePublished")
crawler = Crawler(PublisherCollection.au.WestAustralian)
for article in crawler.crawl(max_articles=1):
    ld = article.ld
'''

# The three lookups to compare: key path, XPath given as a string, pre-compiled XPath.
stmt1 = 'ld.get_value_by_key_path(["NewsArticle", "datePublished"])'
stmt2 = 'ld.xpath_search("NewsArticle/datePublished")'
stmt3 = 'ld.xpath_search(test_xpath)'

length = 100000

time1 = timeit(setup=setup, stmt=stmt1, number=length)
time2 = timeit(setup=setup, stmt=stmt2, number=length)
time3 = timeit(setup=setup, stmt=stmt3, number=length)

print(f"Loading a value {length} times")
print(f"key_path took {time1:.2f} seconds, with an avg of {time1/length:.2e}")
print(f"xpath_search without pre-compiled XPath took {time2:.2f} seconds, with an avg. of {time2/length:.2e}")
print(f"xpath_search with pre-compiled XPath took {time3:.2f} seconds, with an avg. of {time3/length:.2e}")
print(f"using pre-compiled XPath is {time2/time3:.2f} times faster")
print(f"using key_path is {time3/time1:.2f} times faster than xpath_search")
Output
Loading a value 100000 times
key_path took 0.07 seconds, with an avg of 6.98e-07
xpath_search without pre-compiled XPath took 2.42 seconds, with an avg. of 2.42e-05
xpath_search with pre-compiled XPath took 1.51 seconds, with an avg. of 1.51e-05
using pre-compiled XPath is 1.60 times faster
using key_path is 21.65 times faster than xpath_search
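As a quick sanity check, the two ratios can be recomputed from the rounded totals in the output above. This is only a sketch of the arithmetic; the variable names are made up for illustration. The key_path ratio comes out slightly below the reported 21.65, presumably because the script divides the unrounded timings.

```python
# Totals copied from the output above (seconds for 100000 lookups, rounded).
key_path_s = 0.07
xpath_string_s = 2.42
xpath_compiled_s = 1.51

# Gain from pre-compiling the XPath expression.
speedup_compiled = xpath_string_s / xpath_compiled_s

# How much faster key_path is than xpath_search with a pre-compiled XPath.
key_path_advantage = xpath_compiled_s / key_path_s

print(f"pre-compiled vs. string XPath: {speedup_compiled:.2f}x")
print(f"key_path vs. pre-compiled XPath: {key_path_advantage:.2f}x")
```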
For better typing, this PR adds a `scalar` parameter to `xpath_search`, which indicates that the caller expects an optional scalar value in return.
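One common way to let a boolean flag narrow the return type is `typing.overload` combined with `Literal`. The sketch below illustrates that idea only: `search` is a hypothetical helper over nested dictionaries, and its signature, its list-returning default, and its error behavior are assumptions, not the actual `xpath_search` API.

```python
from typing import Any, Dict, List, Literal, Optional, overload


@overload
def search(data: Dict[str, Any], path: str, scalar: Literal[False] = ...) -> List[Any]: ...
@overload
def search(data: Dict[str, Any], path: str, scalar: Literal[True]) -> Optional[Any]: ...


def search(data, path, scalar=False):
    """Hypothetical lookup: returns a list by default, one optional value if scalar=True."""
    node = data
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            # Nothing found: None in scalar mode, an empty list otherwise.
            return None if scalar else []
        node = node[key]
    results = node if isinstance(node, list) else [node]
    if not scalar:
        return results
    if len(results) > 1:
        # Assumed behavior: refuse to pick one value out of several.
        raise ValueError(f"expected a scalar for {path!r}, got {len(results)} values")
    return results[0] if results else None
```

With these overloads, a type checker infers `Optional[Any]` for `search(data, path, scalar=True)` and `List[Any]` otherwise, so callers that expect a single value do not need to unpack a list.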