xlogit

inconsistent alts values in long format

Status: Open • novak opened this issue 1 year ago • 3 comments

I am coming over from R and the mlogit package, and my data is formatted the same way. It seems that xlogit expects every id group in long format to have the same number of alternative rows. Is this expected behavior for xlogit? Am I attempting to do something that just isn't possible?

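```python
from xlogit import MultinomialLogit

varnames = ["var1", "var2", "var3", "var4"]  # avoid shadowing the built-in vars()
model = MultinomialLogit()
model.fit(X=df[varnames], y=df['result'], varnames=varnames, ids=df['id'], alts=df['alt_id'])
model.summary()
```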
```
id alt_id var1 var2 var3 var4 result
1  1      3    4    5    6    0
1  2      3    4    5    6    0
1  3      3    4    5    6    1
1  4      3    4    5    6    0
2  1      3    4    5    6    0
2  2      3    4    5    6    1
2  3      3    4    5    6    0
3  1      3    4    5    6    0
3  2      3    4    5    6    0
3  3      3    4    5    6    0
3  4      3    4    5    6    0
3  5      3    4    5    6    1
```

novak • Aug 11 '23 12:08

UPDATE: My original comment had a mistake: I mentioned the alts parameter instead of the avail parameter for controlling the availability of alternatives.

Hello @novak,

Yes. To optimize matrix products, xlogit expects the data to be "balanced" across alternatives, which means your data must have the same number of alternatives in every choice situation. To handle this in your sample data, fill in the non-existing alternatives with zeros and create a new 'avail' column that tells xlogit which alternatives are available. In other words, the avail column tells xlogit to ignore the alternatives you filled in with zeros. Pass this column to the avail parameter of the fit function, as illustrated below:

```
id alt_id var1 var2 var3 var4 result avail
1  1      3    4    5    6    0      1
1  2      3    4    5    6    0      1
1  3      3    4    5    6    1      1
1  4      3    4    5    6    0      1
1  5      0    0    0    0    0      0
2  1      3    4    5    6    0      1
2  2      3    4    5    6    1      1
2  3      3    4    5    6    0      1
2  4      0    0    0    0    0      0
2  5      0    0    0    0    0      0
3  1      3    4    5    6    0      1
3  2      3    4    5    6    0      1
3  3      3    4    5    6    0      1
3  4      3    4    5    6    0      1
3  5      3    4    5    6    1      1
```
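The padding step can be automated with pandas rather than done by hand. The sketch below builds every (id, alternative) combination, reindexes the data onto that grid with zeros for missing rows, and marks real rows with `avail = 1`. It assumes each (id, alt_id) pair appears at most once; the helper `balance_alternatives` is hypothetical and not part of xlogit.

```python
import pandas as pd

VARS = ["var1", "var2", "var3", "var4"]

# Sample data from the issue: (id, alt_id, result); var1..var4 are 3, 4, 5, 6 in every row.
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0),
    (3, 1, 0), (3, 2, 0), (3, 3, 0), (3, 4, 0), (3, 5, 1),
]
df = pd.DataFrame(rows, columns=["id", "alt_id", "result"])
for name, value in zip(VARS, [3, 4, 5, 6]):
    df[name] = value
df = df[["id", "alt_id", *VARS, "result"]]


def balance_alternatives(data, id_col="id", alt_col="alt_id"):
    """Pad every choice situation to the full set of alternatives (hypothetical helper).

    Assumes each (id, alternative) pair appears at most once. Missing pairs are
    added as all-zero rows; the new 'avail' column is 1 for real rows, 0 for padding.
    """
    out = data.copy()
    out["avail"] = 1
    # Every id crossed with every alternative seen anywhere in the data.
    full_index = pd.MultiIndex.from_product(
        [out[id_col].unique(), sorted(out[alt_col].unique())],
        names=[id_col, alt_col],
    )
    return (
        out.set_index([id_col, alt_col])
        .reindex(full_index, fill_value=0)  # new rows get 0 in every column, incl. avail
        .reset_index()
    )


balanced = balance_alternatives(df)
print(balanced.to_string(index=False))
```

The resulting `balanced` frame has the same 15 rows and columns as the padded table, and it is the frame whose columns you would pass to `fit`.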

Then pass the new column to xlogit as follows:

```python
model.fit(..., avail=df['avail'], ...)
```
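Why balance helps matrix products: once every choice situation has the same number of rows J, the long-format data can be viewed as an N × J × K array, and the utilities of all situations come out of a single matrix product. The NumPy sketch below illustrates that idea together with availability masking. It is a hypothetical illustration of the general technique (the function `mnl_probabilities` and the coefficient values are made up), not xlogit's internal code.

```python
import numpy as np


def mnl_probabilities(X, betas, avail):
    """Multinomial logit choice probabilities on balanced data.

    X:     (N, J, K) array = choice situations x alternatives x variables
    betas: (K,) coefficient vector
    avail: (N, J) array, 1 = available, 0 = padded/unavailable
    """
    utilities = X @ betas  # (N, J): one matrix product covers every situation
    # Unavailable alternatives get -inf utility, so exp() maps them to exactly 0.
    masked = np.where(avail == 1, utilities, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    exp_u = np.exp(masked)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


# Balanced version of the sample: 3 situations, 5 alternatives, 4 variables.
avail = np.array([
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
])
# Real rows carry (3, 4, 5, 6); padded rows are all zeros.
X = np.where(avail[:, :, None] == 1, np.array([3.0, 4.0, 5.0, 6.0]), 0.0)
betas = np.array([0.1, -0.2, 0.05, 0.3])  # hypothetical coefficients

P = mnl_probabilities(X, betas, avail)
print(np.round(P, 3))
```

Because all real alternatives in the sample share identical attributes, each available alternative gets probability 1/4, 1/3 and 1/5 in the three situations respectively, and the padded alternatives get exactly zero, so they have no influence on the model.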

arteagac • Aug 14 '23 16:08

Thank you for taking the time to provide a detailed response. I really appreciate it. My id and alt_id columns would map to the ids and panels arguments, correct?

novak • Aug 15 '23 13:08

Dear @novak, I am so sorry; I just realized that my original comment had a mistake. I have updated it to describe the right way to account for the availability of alternatives, using the avail parameter (instead of the alts parameter I had initially mentioned). In summary, to handle unbalanced alternatives you simply need to fill in the non-existing alternatives with zeros and use the avail parameter to tell xlogit that those alternatives do not exist. Please see the full source code for your sample data below:

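```python
from xlogit import MultinomialLogit

varnames = ["var1", "var2", "var3", "var4"]  # df is the padded data with the avail column
model = MultinomialLogit()
model.fit(X=df[varnames],
          y=df["result"],
          varnames=varnames,
          ids=df["id"],
          alts=df["alt_id"],
          avail=df["avail"])
```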

Note that the id and alt_id columns need to be passed to the ids and alts parameters, respectively. You don't need the panels parameter, as your data does not seem to have a panel structure (i.e., multiple choice situations per individual).

arteagac • Aug 15 '23 17:08