
Schedule column always empty under certain conditions

Open kelson42 opened this issue 3 years ago • 1 comments

Recipe view looks good with pagination at 10 (screenshot: A123B034-FB26-4D87-B05A-332077BA2A8A)

But with pagination at 50, the schedule column is empty (screenshot: 51A8F65F-7DF9-4496-9405-40CD13856C23)

kelson42 avatar Oct 07 '22 20:10 kelson42

Good catch! It's due to incorrect handling of the limit (your filter has fewer entries than the limit).

rgaudin avatar Oct 10 '22 07:10 rgaudin

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot] avatar May 26 '23 18:05 stale[bot]

Isn't this already fixed? See https://farm.openzim.org/recipes?category=ifixit for instance.

benoit74 avatar Sep 28 '23 19:09 benoit74

No it's not. @benoit74 we're looking at the column with a clock that contains a tick if the schedule is requested (has a requested-task).

ATM, there are only wikihow recipes in the pipe (screenshot: 2023-09-29 at 08 06 51)

In the recipes list, with a pagination of 20, I do get the ticks (screenshot: 2023-09-29 at 08 07 04)

But if I change to 50, the ticks are gone (screenshot: 2023-09-29 at 08 07 12)

rgaudin avatar Sep 29 '23 08:09 rgaudin

This is linked to an underlying HTTP error at the uWSGI level. By default, uWSGI limits the request size to 4096 bytes.

When there are a lot of schedules, the UI sends a pretty big query to retrieve all requested tasks, because every schedule name is added as a parameter: https://api.farm.openzim.org/v1/requested-tasks/?limit=50&schedule_name=wikihow_ar_maxi&schedule_name=wikihow_cs_maxi&schedule_name=wikihow_de_maxi&schedule_name=wikihow_en_endless&schedule_name=wikihow_en_endless_arts-and-entertainment&schedule_name=wikihow_en_endless_cars-and-other-vehicles&schedule_name=wikihow_en_endless_computers-and-electronics&schedule_name=wikihow_en_endless_education-and-communications&schedule_name=wikihow_en_endless_food-and-entertaining&schedule_name=wikihow_en_endless_hobbies-and-crafts&schedule_name=wikihow_en_endless_holidays-and-traditions&schedule_name=wikihow_en_endless_home-and-garden&schedule_name=wikihow_en_endless_personal-care-and-style&schedule_name=wikihow_en_endless_sports-and-fitness&schedule_name=wikihow_en_endless_work-world&schedule_name=wikihow_en_endless_youth&schedule_name=wikihow_en_maxi&schedule_name=wikihow_es_maxi&schedule_name=wikihow_fa_maxi&schedule_name=wikihow_fr_maxi&schedule_name=wikihow_hi_maxi&schedule_name=wikihow_id_maxi&schedule_name=wikihow_it_maxi&schedule_name=wikihow_ja_maxi&schedule_name=wikihow_ko_maxi&schedule_name=wikihow_nl_maxi&schedule_name=wikihow_pt_maxi&schedule_name=wikihow_ru_maxi&schedule_name=wikihow_th_maxi&schedule_name=wikihow_tr_maxi&schedule_name=wikihow_vi_maxi&schedule_name=wikihow_zh_maxi
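To get a feel for how this kind of query grows with the page size, here is a rough Python sketch. The helper `requested_tasks_query` and the schedule names are made up for illustration; only the `limit` and `schedule_name` parameters and the 4096-byte figure come from the discussion above. Other request headers likely count toward the same limit, so the URL does not need to reach 4096 bytes on its own for the request to fail.

```python
from urllib.parse import urlencode

UWSGI_DEFAULT_LIMIT = 4096  # bytes, the default quoted in this thread


def requested_tasks_query(schedule_names, limit):
    """Build a query shaped like the one the UI sends (hypothetical helper)."""
    params = [("limit", limit)]
    # One repeated schedule_name parameter per schedule shown on the page.
    params += [("schedule_name", name) for name in schedule_names]
    return "/v1/requested-tasks/?" + urlencode(params)


# Made-up names, only to show the growth; real names vary in length.
names = [f"wikihow_en_endless_topic-{i:03d}" for i in range(200)]

for page_size in (10, 50, 200):
    url = requested_tasks_query(names[:page_size], page_size)
    size = len(url.encode("utf-8"))
    print(page_size, size, "over limit" if size > UWSGI_DEFAULT_LIMIT else "under limit")
```

The size grows linearly with the number of schedules on the page, which matches the symptom: small page sizes work and larger ones break.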

I see three options:

  1. adapt the /schedules endpoint to also return whether a requested task exists for each schedule: a boolean property task_requested
  2. in the UI, fetch all requested-tasks to populate this view and filter them in the UI: use a default pagination of 100 items and a loop to fetch all requested-tasks; do not keep the list of requested tasks in memory, only the info that interests us (is there a requested task for the given schedule?); the list of requested-tasks is expected to always be small, so filtering them on the fly should be OK
  3. adapt the uWSGI configuration to increase the maximum request size
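Option 2 could look roughly like the sketch below. It is written in Python for brevity, although the real UI is a web frontend, and `fetch_page(skip, limit)` is a hypothetical callable standing in for the HTTP call. The `skip` parameter and the item shape (a dict with a `schedule_name` key) are assumptions, not confirmed API details.

```python
from typing import Callable, Dict, List, Set

Page = List[Dict[str, str]]


def requested_schedule_names(
    fetch_page: Callable[[int, int], Page], page_size: int = 100
) -> Set[str]:
    """Loop over all pages and keep only the schedule names, not the tasks."""
    names: Set[str] = set()
    skip = 0
    while True:
        items = fetch_page(skip, page_size)
        names.update(item["schedule_name"] for item in items)
        # A short (or empty) page means there is nothing left to fetch.
        if len(items) < page_size:
            return names
        skip += page_size


# In-memory stand-in for the API, with made-up data.
FAKE_TASKS = [{"schedule_name": f"recipe_{i % 40}"} for i in range(250)]


def fake_fetch_page(skip: int, limit: int) -> Page:
    return FAKE_TASKS[skip:skip + limit]


requested = requested_schedule_names(fake_fetch_page)
print("recipe_3" in requested, "recipe_99" in requested)
```

The UI would then show a tick for a schedule when its name is in the resulting set, so the request URL no longer depends on how many schedules are displayed.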

I prefer option 1 because it is the most straightforward solution; the performance impact is probably minimal because we have an index at hand and the number of requested tasks is expected to be small; the endpoint is then maybe a bit less "RESTful", but who cares?

I don't like option 3 because this is the kind of setting which has been chosen for good reasons, and changing it just for a "not-adapted" HTTP endpoint is not a good practice IMHO.

benoit74 avatar Sep 29 '23 09:09 benoit74

If we choose option 1, we should probably adapt GET /schedules and GET /schedules/{schedule_name} for consistency

benoit74 avatar Sep 29 '23 09:09 benoit74

I agree changing it in uwsgi would not be wise. 4KiB is a lot and browsers may strip it anyway.

I am also in favor of option 1. is_requested: bool is schedule-related info, and the data model should not dictate the API. It's semantically different from requested_task_id, for instance.
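As a rough illustration of option 1, the sketch below attaches an `is_requested` flag to each schedule when building the list response. `ScheduleSummary`, `serialize_schedules`, and the `category` field are hypothetical names for this example; the actual backend models and queries may look quite different.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List


@dataclass
class ScheduleSummary:
    name: str
    category: str
    is_requested: bool  # True when a requested task exists for this schedule


def serialize_schedules(
    schedules: Iterable[Dict[str, str]], requested_names: Iterable[str]
) -> List[dict]:
    """Build list items for the schedules response, flagging requested ones."""
    requested = set(requested_names)  # set membership keeps each lookup cheap
    return [
        asdict(
            ScheduleSummary(
                name=schedule["name"],
                category=schedule["category"],
                is_requested=schedule["name"] in requested,
            )
        )
        for schedule in schedules
    ]


# Made-up data for the example.
schedules = [
    {"name": "wikihow_en_maxi", "category": "wikihow"},
    {"name": "wikihow_fr_maxi", "category": "wikihow"},
]
items = serialize_schedules(schedules, requested_names=["wikihow_fr_maxi"])
print(items)
```

With something like this, the UI needs a single schedules request per page and no longer has to pass every schedule name to the requested-tasks endpoint.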

rgaudin avatar Sep 29 '23 10:09 rgaudin