
Passing multiple inputs (scalar and columns) to the parametrized_input decorator

Open latlan1 opened this issue 4 years ago • 5 comments

Is your feature request related to a problem? Please describe.

I would like to pass multiple inputs (scalar and columns) to the parametrized_input decorator.

Describe the solution you'd like

I would like to create a series of rules for different columns, with all parameters in one place for easier management and readability. An example rule is whether any value in the column has exceeded a threshold value; this could be extended to any numeric comparison such as <, >, =, or combinations thereof, e.g. between value_1 and value_2 (inclusive).

import pandas as pd
from hamilton.function_modifiers import parametrized_input

INV_PARAMS = {
     #input var        (# output var,   # threshold_value          ,  # description of new outputs)
     'inventory', ('inventory_geq_1000', 1000, 'inventory greater than or equal to 1000'),
     'inventory', ('inventory_leq_215', 215, 'inventory less than or equal to 215'),
}
          
@parametrized_input(parameter='inventory', assigned_inputs=INV_PARAMS )
def inventory_geq(inventory: pd.Series) -> pd.Series:
    pass

Thanks so much!

latlan1 avatar Nov 16 '21 15:11 latlan1

So, I think this makes sense. The question is whether you want to do this just on the inventory column, or on a bunch of different columns (E.G. different types of inventory). If you do this on just the inventory column, you could do something like this:

INV_PARAMS = {
    ('inventory_geq_1000', 'inventory greater than or equal to 1000') : 1000,
    ('inventory_leq_215', 'inventory less than or equal to 215'): 215,
}

@parametrized(parameter='threshold', assigned_output=INV_PARAMS)
def inventory_geq(inventory: pd.Series, threshold: int) -> pd.Series:
    # Keep only the values that exceed the threshold.
    return inventory[inventory > threshold]

This works if you have one column inventory that you want to filter. If you have multiple columns, then you could create the above for each one (a little verbose, but there's something nice about being specific/making things readable), or we could look at combining @parametrized with @parametrized_input as you suggested. Am I understanding you correctly?
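To make the idea concrete, here is a minimal, library-free sketch of what this kind of parametrization does conceptually: one function body is expanded into several named outputs, each with its own bound threshold and description. This is not Hamilton's implementation. The `expand` helper, the `inventory_gt` function, and the output names are hypothetical, and plain lists stand in for `pd.Series`.

```python
from typing import Any, Callable, Dict, List, Tuple

# (output name, description) -> threshold, in the same shape as INV_PARAMS above.
# The names here are illustrative only.
EXAMPLE_PARAMS: Dict[Tuple[str, str], int] = {
    ("inventory_gt_1000", "inventory greater than 1000"): 1000,
    ("inventory_gt_215", "inventory greater than 215"): 215,
}


def inventory_gt(inventory: List[int], threshold: int) -> List[int]:
    """Keep only the values strictly above the threshold."""
    return [value for value in inventory if value > threshold]


def expand(
    fn: Callable[..., Any],
    parameter: str,
    assigned_output: Dict[Tuple[str, str], Any],
) -> Dict[str, Callable[..., Any]]:
    """Hypothetical helper: build one named function per (name, doc) entry."""

    def make(name: str, doc: str, value: Any) -> Callable[..., Any]:
        def generated(inventory: List[int]) -> Any:
            # Bind the configured value to the chosen parameter.
            return fn(inventory, **{parameter: value})

        generated.__name__ = name
        generated.__doc__ = doc
        return generated

    return {
        name: make(name, doc, value)
        for (name, doc), value in assigned_output.items()
    }


nodes = expand(inventory_gt, "threshold", EXAMPLE_PARAMS)
print(nodes["inventory_gt_1000"]([100, 215, 500, 1500]))  # [1500]
print(nodes["inventory_gt_215"]([100, 215, 500, 1500]))   # [500, 1500]
```

Each entry in the mapping yields a separate output with its own name and docstring, which is why keeping all the thresholds in one dictionary stays readable.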

elijahbenizzy avatar Nov 17 '21 05:11 elijahbenizzy

Thanks for clarifying my example. I agree that it is a bit verbose but definitely gets the job done.

Can I naturally extend this example to use 2 threshold values by passing an array of threshold values instead of just one int? This would be good for identifying column values between the two threshold values.


-- Regards,

Lorre Atlan, PhD

latlan1 avatar Nov 17 '21 12:11 latlan1

Yep you can! We're not opinionated on the datatypes at all actually, except in the cases in which we've made it easier to use pandas dataframes. Config items can be anything.
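As a rough illustration of the two-threshold case, the parametrized value could be a `(low, high)` tuple that the function unpacks. The sketch below is an assumption about how one might write it, not code from Hamilton. The `inventory_between` name and the `bounds` parameter are hypothetical, and a plain list stands in for the column. With a real `pd.Series`, `inventory.between(low, high)` would express the same inclusive check.

```python
from typing import List, Tuple


def inventory_between(inventory: List[float], bounds: Tuple[float, float]) -> List[bool]:
    """Flag each value that lies between the two thresholds (inclusive)."""
    low, high = bounds
    return [low <= value <= high for value in inventory]


flags = inventory_between([100, 215, 500, 1000, 1500], (215, 1000))
print(flags)  # [False, True, True, True, False]
```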

elijahbenizzy avatar Nov 17 '21 18:11 elijahbenizzy

@elijahbenizzy should this issue be closed? Or an update added?

skrawcz avatar Aug 22 '22 23:08 skrawcz

@skrawcz yeah, I think so. @latlan1, would love to get a sense of how the new changes to @parameterize would help you: https://github.com/stitchfix/hamilton/blob/main/decorators.md#parameterize.

elijahbenizzy avatar Aug 25 '22 16:08 elijahbenizzy

OK, so I'm pretty sure this is now solved! @parameterize is far more sophisticated. @latlan1 let me know if there's something that hasn't been covered and we can reopen, otherwise I'm closing this!

elijahbenizzy avatar Oct 29 '22 17:10 elijahbenizzy

Looks good, thank you!


latlan1 avatar Oct 30 '22 17:10 latlan1