arches
Indexing records with a large number of tiles (containing domain datatypes) takes an excessively long time
Describe the bug
Indexing a resource with a large number of tiles (predominantly containing domain value datatypes) takes an excessive amount of time. Reindexing one record with 3,900 tiles took over eight hours.
To Reproduce
Steps to reproduce the behavior:
- Create record with very large number of tiles (predominantly containing domain value datatypes).
- Reindex record.
Expected behavior
Comparable reindexing times for similarly sized records, whether or not they contain Domain/DomainList datatypes.
Your Arches Information
- Version used:
- Operating System and version (desktop or mobile):
- Browser Name and version:
- Link to your Arches Install (optional):
Additional context
There seem to be iterations in the datatypes.py append_to_document process for DomainDataType and DomainListDataType that are no longer required.
These loops work to deduce the node ID and node value; however, after subsequent refactoring of the code, both are now supplied as arguments (nodeid and nodevalue) to the methods.
If this unnecessary work is refactored out, reindexing completes in a more expected timeframe.
Prima facie, a potential fix is shown below (commenting out the unnecessary code and refactoring to use the parameters):
class DomainDataType(BaseDomainDataType):
    def append_to_document(self, document, nodevalue, nodeid, tile, provisional=False):
        # domain_text = None
        # for tile in document["tiles"]:
        #     for k, v in tile.data.items():
        #         if v == nodevalue:
        #             node = models.Node.objects.get(nodeid=k)
        #             domain_text = self.get_option_text(node, v)
        node = models.Node.objects.get(nodeid=nodeid)
        domain_text = self.get_option_text(node, nodevalue)
        if domain_text not in document["strings"] and domain_text is not None:
            document["strings"].append({"string": domain_text, "nodegroup_id": tile.nodegroup_id, "provisional": provisional})
class DomainListDataType(BaseDomainDataType):
    def append_to_document(self, document, nodevalue, nodeid, tile, provisional=False):
        domain_text_values = set([])
        # for tile in document["tiles"]:
        #     for k, v in tile.data.items():
        #         if v == nodevalue:
        node = models.Node.objects.get(nodeid=nodeid)
        for value in nodevalue:
            text_value = self.get_option_text(node, value)
            domain_text_values.add(text_value)
        for value in domain_text_values:
            if value not in document["strings"]:
                document["strings"].append({"string": value, "nodegroup_id": tile.nodegroup_id, "provisional": provisional})
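To illustrate why the removed loops could dominate indexing time, here is a minimal, self-contained sketch that only counts work and does not touch Arches or Django. `FakeTile`, `scan_for_node`, and `count_work` are hypothetical names invented for this illustration, and the data shape (one domain node per tile, five distinct option values) is an assumption, not a description of the actual record.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTile:
    """Hypothetical stand-in for a tile: maps node id -> stored value."""
    nodegroup_id: str
    data: dict = field(default_factory=dict)


def scan_for_node(document, nodevalue):
    """Mimic the removed loops: scan every tile to find nodes holding nodevalue.

    Returns (comparisons made, matches found). In the original code each
    match triggered a models.Node.objects.get(...) call.
    """
    comparisons = 0
    matches = 0
    for tile in document["tiles"]:
        for _nodeid, value in tile.data.items():
            comparisons += 1
            if value == nodevalue:
                matches += 1
    return comparisons, matches


def count_work(n_tiles, n_options=5):
    """Count the work done when append_to_document runs once per node value.

    Assumes one domain node per tile and n_options distinct option values.
    Returns (scan comparisons, lookups via scanning, lookups via nodeid).
    """
    tiles = [
        FakeTile("ng-1", {"node-a": f"option-{i % n_options}"})
        for i in range(n_tiles)
    ]
    document = {"tiles": tiles, "strings": []}
    scan_total = 0
    scan_lookups = 0
    direct_lookups = 0
    for tile in tiles:
        for _nodeid, nodevalue in tile.data.items():
            comparisons, matches = scan_for_node(document, nodevalue)
            scan_total += comparisons
            scan_lookups += matches
            direct_lookups += 1  # proposed fix: one lookup using the nodeid argument
    return scan_total, scan_lookups, direct_lookups
```

Under these assumptions, `count_work(100)` returns `(10000, 2000, 100)`: the scanning version grows with the square of the tile count, while the version that uses the `nodeid` argument grows linearly. For 3,900 tiles the same model gives roughly 15.2 million comparisons and about 3 million node lookups, against 3,900 lookups with the proposed change. Real figures will depend on how many nodes each tile holds and how often values repeat.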
Ticket Background
- Found by: @khodgkinson-he
@chiatt do you think @khodgkinson-he's comments, that these simply need refactoring, are sound? If so, we can get this done and PR'd into dev/6.1.
@aj-he Yeah, @khodgkinson-he's proposed changes look good and dev/6.1 seems like the right place.