anonlink-entity-service
Exception while calculating comparisons for multiparty linkage
@wilko77 I noticed your comment that the testing deployment was down and had a look to see what was going on. The anonlink-entity-service v1.11.0 has a traceback in the logs:
[2019-06-11 00:27:09,208: ERROR/ForkPoolWorker-4] Task entityservice.tasks.stats.calculate_comparison_rate[aec9c7b3-2274-48a7-a220-36242ba08f16] raised unexpected: ValueError('expected at most 2 datasets, got 3',)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/var/www/entityservice/tasks/stats.py", line 18, in calculate_comparison_rate
    comparisons = get_total_comparisons_for_project(dbinstance, run['project_id'])
  File "/var/www/entityservice/database/selections.py", line 249, in get_total_comparisons_for_project
    raise ValueError(f'expected at most {expected_datasets} '
ValueError: expected at most 2 datasets, got 3
Looking at get_total_comparisons_for_project in selections.py, it reads expected_datasets from the database (the parties column of the projects table). The number of datasets actually found (the "got 3" in the error) comes from this query:
SELECT bloomingdata.count as rows
from dataproviders, bloomingdata
where
bloomingdata.dp=dataproviders.id AND dataproviders.project=%s
It seems to me that either a multiparty project is being created without projects.parties being set correctly, or a two-party linkage is uploading multiple bloomingdata entries for a single upload.
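To make the two hypotheses concrete, here is a minimal sketch of the kind of check that appears to be failing. It is not the code from selections.py: check_dataset_count and its arguments are hypothetical names, and only the wording of the error message is taken from the traceback above.

```python
def check_dataset_count(expected_datasets, dataset_sizes):
    """Hypothetical stand-in for the validation that raised in the traceback.

    expected_datasets: the value stored in projects.parties
    dataset_sizes: one entry per row returned by the bloomingdata query
    """
    actual = len(dataset_sizes)
    if actual > expected_datasets:
        raise ValueError(f'expected at most {expected_datasets} '
                         f'datasets, got {actual}')
    return actual


# Both hypotheses produce the same symptom: three rows against parties == 2.
# (a) a three-party project stored with parties == 2
# (b) a two-party project where one provider ended up with two bloomingdata rows
try:
    check_dataset_count(2, [1000, 1000, 1000])
except ValueError as err:
    print(err)  # expected at most 2 datasets, got 3
```

Since the symptom is identical either way, telling the two cases apart would require checking how many dataproviders rows the affected project has, not just how many bloomingdata rows.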
Looking into the project.parties value, we see in models/project.py that number_parties is optional and defaults to 2:
# Get optional fields from JSON data
name = data.get('name', '')
notes = data.get('notes', '')
parties = data.get('number_parties', 2)
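As a rough illustration of why this default matters, the sketch below wraps the three lines above in a hypothetical helper, parse_project_fields (not a function in the service), and feeds it a request body that omits number_parties. The request dictionaries are made up for the example.

```python
def parse_project_fields(data):
    """Hypothetical helper wrapping the three lines from models/project.py."""
    name = data.get('name', '')
    notes = data.get('notes', '')
    parties = data.get('number_parties', 2)
    return name, notes, parties


# A client intending a three-party linkage but omitting number_parties
# silently ends up with a project recorded as two-party.
implicit = parse_project_fields({'name': 'three-party test'})
explicit = parse_project_fields({'name': 'three-party test', 'number_parties': 3})

print(implicit[2])  # 2
print(explicit[2])  # 3
```

If this is what happened, the project would accept uploads until the stats task compares the stored value of 2 with the three datasets it finds.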