anonlink-entity-service

Exception while calculating comparisons for multiparty linkage

Open • hardbyte opened this issue 5 years ago • 0 comments

@wilko77 I noticed your comment that the testing deployment was down and had a look to see what was going on. The anonlink-entity-service v1.11.0 has a traceback in the logs:

 [2019-06-11 00:27:09,208: ERROR/ForkPoolWorker-4] Task entityservice.tasks.stats.calculate_comparison_rate[aec9c7b3-2274-48a7-a220-36242ba08f16] raised unexpected: ValueError('expected at most 2 datasets, got 3',)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/var/www/entityservice/tasks/stats.py", line 18, in calculate_comparison_rate
    comparisons = get_total_comparisons_for_project(dbinstance, run['project_id'])
  File "/var/www/entityservice/database/selections.py", line 249, in get_total_comparisons_for_project
    raise ValueError(f'expected at most {expected_datasets} '
ValueError: expected at most 2 datasets, got 3 

Looking at get_total_comparisons_for_project in selections.py, it reads expected_datasets from the database (the parties column of the projects table). The number of datasets actually found comes from this query:

SELECT bloomingdata.count as rows
from dataproviders, bloomingdata
where
    bloomingdata.dp=dataproviders.id AND dataproviders.project=%s

It seems to me that either a multiparty project is being created without the projects.parties value being set correctly, or a two-party linkage is creating multiple bloomingdata entries for a single upload.
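To illustrate the mismatch, here is a minimal sketch that reproduces the condition in isolation. It assumes a heavily simplified schema (only the columns named above), uses an in-memory SQLite database instead of the service's real database, and the helper `count_datasets` is hypothetical rather than the actual code in selections.py.

```python
import sqlite3


def count_datasets(conn, project_id):
    """Return (expected, found) dataset counts for a project.

    Hypothetical helper: 'expected' is projects.parties, 'found' is the
    number of rows returned by the query quoted in this issue.
    """
    expected = conn.execute(
        "SELECT parties FROM projects WHERE id = ?", (project_id,)
    ).fetchone()[0]
    rows = conn.execute(
        """
        SELECT bloomingdata."count" AS "rows"
        FROM dataproviders, bloomingdata
        WHERE bloomingdata.dp = dataproviders.id
          AND dataproviders.project = ?
        """,
        (project_id,),
    ).fetchall()
    return expected, len(rows)


# Assumed, simplified schema: a project recorded with parties = 2
# that nevertheless has three uploaded datasets.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id TEXT PRIMARY KEY, parties INTEGER);
    CREATE TABLE dataproviders (id INTEGER PRIMARY KEY, project TEXT);
    CREATE TABLE bloomingdata (dp INTEGER, "count" INTEGER);
    INSERT INTO projects VALUES ('demo', 2);
    INSERT INTO dataproviders VALUES (1, 'demo'), (2, 'demo'), (3, 'demo');
    INSERT INTO bloomingdata VALUES (1, 100), (2, 120), (3, 90);
    """
)

expected, found = count_datasets(conn, "demo")
if found > expected:
    print(f"expected at most {expected} datasets, got {found}")
```

Running the same two lookups against the testing deployment's database for the affected project should show which side is wrong: a projects.parties value that is too low, or extra bloomingdata rows.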

Looking into the projects.parties value, we see in models/project.py that number_parties is optional and defaults to 2:

        # Get optional fields from JSON data
        name = data.get('name', '')
        notes = data.get('notes', '')
        parties = data.get('number_parties', 2)
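
As a small sketch of why this default matters: if a client creates a project intended for three parties but omits number_parties, the stored value is silently 2. The function `parse_optional_fields` and the example request below are hypothetical; only the three `data.get` lines are taken from the snippet above.

```python
def parse_optional_fields(data):
    """Hypothetical wrapper around the lines quoted from models/project.py."""
    name = data.get('name', '')
    notes = data.get('notes', '')
    parties = data.get('number_parties', 2)
    return name, notes, parties


# Assumed example request: meant for three data providers, but the
# optional number_parties field was left out.
three_party_request = {
    'name': 'multiparty test',
    'notes': 'three data providers',
}

_, _, parties = parse_optional_fields(three_party_request)
print(parties)  # falls back to the default rather than the intended 3
```

If that is what happened here, a third upload to such a project would be enough to trigger the ValueError in the traceback.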

hardbyte, Jun 23 '19 07:06