expan icon indicating copy to clipboard operation
expan copied to clipboard

sga() and create_binning() have different default nbins

Open jbao opened this issue 9 years ago • 7 comments

Also, use different defaults for categorical (e.g. 4) and numerical (e.g. 8).

jbao avatar Feb 07 '17 16:02 jbao

Hi I can work on this. This is my first contribution. A little guidance would be helpful.

aravs16 avatar Feb 22 '17 01:02 aravs16

Hi @aravs16 , thanks for volunteering, this issue is mostly about syncing up the default arguments in

https://github.com/zalando/expan/blob/dev/expan/core/experiment.py#L715

and

https://github.com/zalando/expan/blob/dev/expan/core/binning.py#L732

Ideally we could somehow distinguish between categorical and numerical bins, but if that's not possible, we may also completely leave out the argument nbins, since a random number probably would not make sense. Let us know if anything is unclear;-)

jbao avatar Feb 22 '17 08:02 jbao

@jbao Thank you!

Can we make nbins = 8 in both places and then pass nbins//2 on this line: https://github.com/zalando/expan/blob/dev/expan/core/binning.py#L768 ?

aravs16 avatar Feb 23 '17 20:02 aravs16

hmm, I think we should make nbins transparent to the user, and not doing any post-processing within the function itself, what do you think?

jbao avatar Feb 24 '17 08:02 jbao

Hi @jbao, in my opinion, we should set nbins=None in definitions of sga() and creat_binning(). And add post-processing ≈ if nbins == None: nbins = 8 if is_numeric else 4 followed with a warning.

yagudin avatar Mar 17 '17 19:03 yagudin

Hi everyone,

I found using None as a flag for default (as opposite to undefined) value quite pervasive in ExpAn (for example, None in place of KPI names or feature names implies "do all"). Was it a deliberate design decision? What happens to those options where we want None to signalize the absence of information, rather than the intention to use default values?

gbordyugov avatar Mar 20 '17 08:03 gbordyugov

I think we should specify the default value explicitly in the argument if we have one (e.g. in the case of nbins), and use None to imply that the function will take all possible input values (e.g. in the case of kpi_subset). What do you think?

jbao avatar Mar 20 '17 10:03 jbao