center-randomize
Great effort! Here are some edge cases you might need to consider in future.
Hi, this is a great effort to have an open sourced algorithm for doing things like this. I hope you continue to open source any other problems/solutions so that more people might be able to take a look at them.
On a quick look, I didn't find any big issues. I am just going to mention some edge cases that I thought of while going through the code. You may need to account for these situations in the future if you haven't already done so. Piloting this in Kathmandu itself shouldn't be problematic, but as you expand this in the future these cases might occur more frequently. Cases 3 and 4 might apply in Kathmandu as well.
1. Distance calculated from latitude and longitude doesn't always equal the distance students need to travel by road. Example: in remote areas where there is no accessible bridge and the center happens to be on the other side of a river, the closest center by km might not be easily accessible. Maybe this can be handled by setting up prefs?
2. If no centers are within the absolute distance threshold, it always chooses the same center (the closest one). Example: school A doesn't have any center within the threshold (7 km), but has 5 centers just beyond it at slightly different distances like 7.1 km, 7.2 km, etc. This doesn't choose a different center every year. In this case, there is no check for PREF_CUTOFF either, so even if the center was problematic last year, the same center will still be chosen.
3. Not sure if it's possible for students to not be from any school (e.g. homeschooled). In that case they might need special center allocation.
4. Center assignment for students with special needs might need to be handled separately.
I have a simple solution idea for CASE-2.
Existing Logic
- Searches for centers within the `PREF_DISTANCE_THRESHOLD`.
- If it doesn't find a center within the `PREF_DISTANCE_THRESHOLD`, it selects the closest center.
New Logic
- Searches for centers within the `PREF_DISTANCE_THRESHOLD`.
- If it doesn't find a center within the `PREF_DISTANCE_THRESHOLD`, it increases the threshold by a certain amount (`x`) and reruns until centers are allotted for all students.
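The difference between the two approaches can be sketched roughly as below. This is only an illustration, not the repository's code: the function names, the step size standing in for `x`, the `max_km` safety cap, and the sample distances are all hypothetical.

```python
def closest_center(distances_km):
    """Existing fallback: always the single nearest center."""
    return min(distances_km, key=distances_km.get)


def centers_with_expanding_threshold(distances_km, threshold_km, step_km, max_km):
    """Proposed fallback: widen the radius by step_km until some center fits.

    distances_km maps a (hypothetical) center id to its distance from the school.
    max_km is an assumed safety cap so the loop always terminates.
    """
    if step_km <= 0:
        raise ValueError("step_km must be positive")
    limit = threshold_km
    while limit <= max_km:
        within = sorted(c for c, d in distances_km.items() if d <= limit)
        if within:
            return within  # every center inside the widened radius is a candidate
        limit += step_km
    return []  # nothing within the cap; left for manual handling


# Made-up distances mirroring the 7.1 km / 7.2 km example from the issue.
distances = {"A": 7.1, "B": 7.2, "C": 7.4, "D": 12.0}
print(closest_center(distances))                                    # A
print(centers_with_expanding_threshold(distances, 7.0, 2.0, 30.0))  # ['A', 'B', 'C']
```

With the widened radius, the randomized selection has three candidates to pick from instead of being pinned to center A every year.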
What does it solve?
- "But has 5 centers near threshold but slightly different distance like 7.1, 7.2 km etc. This doesn't choose different center every year"
  This is solved because the value of `x` is to be kept fairly large, probably close to `PREF_DISTANCE_THRESHOLD`, so all of those near-threshold centers fall inside the widened radius.
- "In this case, there is no check for PREF_CUTOFF either. So even if it was problematic last year, the same center will still be chosen."
  This is solved because, instead of taking the closest center, we take the result of `centers_within_distance`, which properly filters by `PREF_CUTOFF`. In some cases it might not be good to remove the closest center from the `centers_for_school` options, as the other centers may be far away and hence inconvenient for students. In that case, the `PREF` value needs to be increased by resolving the underlying problem.
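To make the second point concrete, here is a small sketch of how a preference cutoff could be applied together with the distance limit. It is an assumption-laden illustration rather than the project's implementation: the `PREF_CUTOFF` value, the `prefs` dictionary, the default score of 0, and the strict `>` comparison are all assumed here.

```python
PREF_CUTOFF = -4  # assumed value, for illustration only


def eligible_centers(distances_km, prefs, limit_km, pref_cutoff=PREF_CUTOFF):
    """Keep centers that are within limit_km and whose preference score for
    this school is above the cutoff. Centers with no recorded score are
    treated as neutral (0), which is an assumption of this sketch."""
    return sorted(
        center
        for center, distance in distances_km.items()
        if distance <= limit_km and prefs.get(center, 0) > pref_cutoff
    )


# Hypothetical data: center A was flagged as problematic last year.
distances = {"A": 7.1, "B": 7.2, "C": 7.4}
prefs = {"A": -5}
print(eligible_centers(distances, prefs, limit_km=9.0))  # ['B', 'C']
```

Here the closest center A is dropped because of its low score, which is exactly the situation where the remaining options might be less convenient for students.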
I have created a pull request that might solve CASE 1.
It uses route distance instead of the direct haversine distance, taking the best possible route distance from a maps API.
Here is the PR: #67
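For comparison, the straight-line measure that the route distance would replace can be written as a standard haversine formula. This is a generic sketch, not copied from the repository; the function name and the mean Earth radius of 6371 km are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude is roughly 111 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))  # 111.2
```

This needs only coordinates and no network calls, but it ignores rivers, bridges, and road layout, which is the gap a route-based distance is meant to close.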
This is a useful discussion. I want to add some context: there will always be some things that we cannot model, one-off cases that need to be handled in a special way. Therefore there will be manual oversight by NEB staff on the list that is generated by our script. Cases 3 and 4 will be handled during this step. A few interesting cases -
- There is a school inside Tripureshwor jail where children of the inmates study. Their exam center will always be the same school. This school was excluded from the dataset.
- Students of Khagendra Navajeevan school have special needs and they were re-assigned to suitable centers. Cases like these are best handled by NEB staff who have ground experience in these matters. We should only seek to automate things that are time-consuming, error-prone, or liable to introduce bias.
CASE 1: @ArunShresthaa's #67 is more precise but has external dependencies and cost implications; the cost-to-benefit ratio does not make sense at this point. Anyone interested can explore other options to see if there is a better alternative, but haversine distance is a quick and good enough approximation.
Case 2: will be dealt with in #31
Closing this ticket