python-mapswipe-workers icon indicating copy to clipboard operation
python-mapswipe-workers copied to clipboard

discrepancy between project progress

Open Hagellach37 opened this issue 2 years ago • 2 comments

The project progress calculation reveals different results for different parts of the mapswipe code.

During transfer results we use this function: https://github.com/mapswipe/python-mapswipe-workers/blob/master/mapswipe_workers/mapswipe_workers/firebase_to_postgres/update_data.py#L570

In generate stats we use another function: https://github.com/mapswipe/python-mapswipe-workers/blob/master/mapswipe_workers/mapswipe_workers/generate_stats/project_stats_by_date.py#L8

For the project -NFNr55R_LYJvxP7wmte the transfer results gives us 100% whereas we get 94.74% for the generate stats workflow.

This is the query in transfer results:

select
 avg(group_progress)::integer as progress
 from
        (
            -- Get all groups for this project and
    -- add progress for groups that have been worked on already.
    -- Set progress to 0 if no user has worked on this group.
    -- For groups that no users worked on
    -- there are no entries in the results table.
    select
      g.group_id
      ,g.project_id
      ,case
        when group_progress is null then 0
        else group_progress
      end as group_progress
    from groups g
    left join
        (
        -- Here we get the progress for all groups
        -- for which results have been submitted already.
        -- Progress for a group can be max 100
        -- even if more users than required submitted results.
        -- The verification number of a project is used here.
        select
          ms.group_id
          ,ms.project_id
          ,case
            when count(distinct user_id) >= p.verification_number then 100
            else 100 * count(distinct user_id) / p.verification_number
          end as group_progress
        from mapping_sessions ms, projects p
        where ms.project_id = p.project_id
        group by group_id, ms.project_id, p.verification_number
    ) bar
    on bar.group_id = g.group_id and bar.project_id = g.project_id
    where g.project_id = '-NFNr55R_LYJvxP7wmte'
) foo
group by project_id 

It seems that the code in generate stats is not giving the correct results.

Hagellach37 avatar Nov 28 '22 17:11 Hagellach37

query:

select
  date_trunc('day', start_time) as day
  ,sum(items_count)
from mapping_sessions ms 
where project_id = '-NFNr55R_LYJvxP7wmte'
group by day
order by day

result:

2022-11-18 00:00:00.000	13008
2022-11-19 00:00:00.000	65850
2022-11-20 00:00:00.000	45312
2022-11-21 00:00:00.000	47940
2022-11-22 00:00:00.000	35064
2022-11-23 00:00:00.000	53868
2022-11-24 00:00:00.000	46140
2022-11-25 00:00:00.000	42840
2022-11-26 00:00:00.000	35286
2022-11-27 00:00:00.000	33426
2022-11-28 00:00:00.000	6930

and here the results from the csv file:

day number_of_results
2022-11-18 13008
2022-11-19 62844
2022-11-20 40878
2022-11-21 46362
2022-11-22 31872
2022-11-23 50214
2022-11-24 45612
2022-11-25 41664
2022-11-26 32508
2022-11-27 31686
2022-11-28 5298

It's not yet clear why there are results missing in the second table.

Hagellach37 avatar Nov 28 '22 18:11 Hagellach37

What is special about the first day? There is no discrepancy there.

laurentS avatar Nov 29 '22 10:11 laurentS