cowin
cowin copied to clipboard
[Design Change] Separating filtering and api calls
Currently the code to apply min age filter runs like this
curr_result = filter_centers_by_age_limit(self._call_api(url),
min_age_limt)
The potential problem with this is that the filter is coupled with the api call, hence making changes or places where the code is branched like the following case
if min_age_limt:
curr_result = filter_centers_by_age_limit(self._call_api(url),
min_age_limt)
else:
curr_result = self._call_api(url)
becomes difficult and has chances of introduction of errors.
Proposal: change the structure so that the code for filter only works on the received data and does not need to make the api call. The fetch is going to happen first anyways since the API itself not going to apply any filter and return all the data always (at least for now). The code would then look like
curr_result = self._call_api(url)
if min_age_limt:
curr_result = filter_centers_by_age_limit(curr_result, min_age_limt)
if some_other_filter:
curr_result = some_other_filter_function(curr_result, *kwargs)
Please let me know if this is something that can make the code better, and I will make the changes in a separate PR.
Hey @DragonWarrior15 , I was thinking something similar as I could see that it's not as extensible as I thought it would be to include other filters. I didn't think it all the way through previously. The direction in which you are thinking, it makes sense.
Do you want try out the change and raise a PR?
You could also think on how to add filters for the following: Minimum Age Vaccine Type Vaccine Cost
Also, this would be a part of the v1.0.0 release this might potentially be backwards incompatible. I have a few more changes for the release, I'll create a separate issue for the same tonight.
Let me know if I could help with anything.
@backtrackbaba, thanks for the encouraging response ! I will give this an attempt right away. The filters you have mentioned can be put as part of the same function get_availability_by_district
without losing backwards compatibility. The actual filters will be inside the function and not exposed to the user, so we can ensure compatibility. Anyways, let me raise a PR to continue the discussion.
I'm going through the changes and have some thoughts. Give me some time I'll get back to you with them
Hey @DragonWarrior15 ,
Here are my thoughts:
I saw here that you were trying to verify the data type for the results obtained from the API call. Wanted to know your views on it?
The filters could be passed as a dictionary with all the keys using which you want to filter. There could also be default filters that would basically have all the options present in it so that if a user doesn't pass any filters they get the data as it is.
Example of default filters:
default_filters = {
'min_age_limit': 18,
'fee_type': ['Free', 'Paid'],
'vaccine': ['COVISHIELD', 'COVAXIN'],
'available_capacity': 1,
}
Example of user-defined filters:
user_filters = {
'min_age_limit': 18,
'vaccine': ['COVAXIN'],
}
Now, we could get the final filter by updating the default filters with the user-defined filters thereby, all the keys would be present with the default values if no new values were given.
Example of final filter based on the filters above:
final_filters = {
'min_age_limit': 18,
'fee_type': ['Free', 'Paid'],
'vaccine': ['COVAXIN'],
'available_capacity': 1,
}
Here, I could see that you have defined loop levels 1 and 2 and are filtering based on both in sequence. As there is only 1 element that you have in loop level 1, you could also try to simply use the built-in filter method to filter out for level 1.
As for level 2, you could chain a bunch of and conditions based on the final filters wherein you don't need to do any checks as you know surely the filters would have the values for all the conditions (either user-defined or default). Using this, all the changes would be contained in the filter_centers
methods in the util itself.
Let me know your thoughts
I saw here that you were trying to verify the data type for the results obtained from the API call. Wanted to know your views on it?
- the data checks are on the filter values that are passed for the different keys, they don't check the API results at all
The filters could be passed as a dictionary with all the keys using which you want to filter. There could also be default filters that would basically have all the options present in it so that if a user doesn't pass any filters they get the data as it is.
Example of default filters:
default_filters = { 'min_age_limit': 18, 'fee_type': ['Free', 'Paid'], 'vaccine': ['COVISHIELD', 'COVAXIN'], 'available_capacity': 1, }
- The default should be to run no filter.
- In the dictionary you have defined above, we are making certain assumptions about the values present in the different keys. For instance,
vaccine
list could be expanded anytime and our package may fail. - I am anyways passing
None
as the default values for the filter, and in the code I check whichever keys are having filter value asNone
, those keys are not checked for filters. - However, I do see that we should allow a list of values as filters, not sure how that will complicate the current implementation.
Here, I could see that you have defined loop levels 1 and 2 and are filtering based on both in sequence. As there is only 1 element that you have in loop level 1, you could also try to simply use the built-in filter method to filter out for level 1.
- Level 1 could have new keys later, and we will not need to change the code to accommodate new filters, just modifying the
Constants
will suffice. - Also, the way I have defined the
Constants
class, using built-in filter method may complicate things, not entirely sure though.
As for level 2, you could chain a bunch of and conditions based on the final filters wherein you don't need to do any checks as you know surely the filters would have the values for all the conditions (either user-defined or default). Using this, all the changes would be contained in the
filter_centers
methods in the util itself.
- As I mentioned above, inside the loop that actually does the filter, any keys that are not in the filter are simply not checked for. All the keys with
None
as the filter value have already been removed. - There is a problem with the approach you have mentioned: In certain cases we are checking an inequality, while in others an equality. Separating out the dictionary with the comparisons simply makes the method easily extensible to new keys in the future.
Honestly, I do think that your suggestions will help simplify the code but I don't really grasp the changes required to do that due to zero exposure software development and related paradigms. If they are clear for you, I suggest you can borrow the code from my branch and restructure it as works best; just drop a note here.
I will add a commit to at least pass a single dictionary instead of two lists of filter keys and values.
@DragonWarrior15 , I've created PR https://github.com/backtrackbaba/cowin/pull/17
The tests haven't been added. Let me know your thoughts.
Here's a sample program with which I tested:
from cowin_api import CoWinAPI
cowin = CoWinAPI()
availability = cowin.get_availability_by_district("395").get('centers')
print(f"Original Centers Count: {len(availability)}")
filters = {
"vaccine": ["COVISHIELD"],
"fee_type": ["Paid"],
"min_age_limit": 45
}
filtered_centers = cowin.get_availability_by_district("395", filters=filters).get('centers')
print(f"Filtered Centers Count: {len(filtered_centers)}")
Yep this looks good. We will have to hard code the keys either like you have done inside the filter function, or in the constants file like I have done. This looks cleaner I guess.
I looked at your code again and have a few points
- the default filter should have
'min_age_limit': [18,45]
- There should not be a default filter at all:
- i.e., the default filter is to not apply any filter
- The reason being that we will be making assumptions about the values available for those keys. Suppose some day new vaccines are added, more age buckets are added etc, we may or may not get to know about those quickly
- Hence, its better to work without making any assumption about the values in the data
- If you agree with this, you will see that the PR I had raised was designed with this exact configuration in mind, to dynamically handle any number of filters/no filters at all
- All filters should search in a list except the
available_capacity
- I have made some more changes to my PR #15 to make it more pythonic per your suggestion. Please do consider it before making a decision.
- I have more code in my function simply because the default filter for me is no filter, and I also need to write a couple of checks to maintain compatibility with any older versions (due to introduction of lists in filters)
- With the ability to pass single values or lists in the function, I had to remove the type checking I was doing since it would have become quite complicated.
- Added one more test, and all tests are passed.