xlogit
inconsistent alts values in long format
I am coming over from R and the mlogit package and I have the data formatted in the same way. It seems like xlogit is expecting that when in long format there's the same number of alt rows for each id group. Is this expected behavior for xlogit? Am I attempting to do something that just isn't possible?
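from xlogit import MultinomialLogit

vars = ["var1", "var2", "var3", "var4"]  # feature columns from the table below
model = MultinomialLogit()
model.fit(X=df[vars], y=df['result'], varnames=vars, ids=df['id'], alts=df['alt_id'])
model.summary()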
id | alt_id | var1 | var2 | var3 | var4 | result |
---|---|---|---|---|---|---|
1 | 1 | 3 | 4 | 5 | 6 | 0 |
1 | 2 | 3 | 4 | 5 | 6 | 0 |
1 | 3 | 3 | 4 | 5 | 6 | 1 |
1 | 4 | 3 | 4 | 5 | 6 | 0 |
2 | 1 | 3 | 4 | 5 | 6 | 0 |
2 | 2 | 3 | 4 | 5 | 6 | 1 |
2 | 3 | 3 | 4 | 5 | 6 | 0 |
3 | 1 | 3 | 4 | 5 | 6 | 0 |
3 | 2 | 3 | 4 | 5 | 6 | 0 |
3 | 3 | 3 | 4 | 5 | 6 | 0 |
3 | 4 | 3 | 4 | 5 | 6 | 0 |
3 | 5 | 3 | 4 | 5 | 6 | 1 |
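
For reference, the imbalance in the table above can be confirmed with a quick pandas check. This is only a sketch: it rebuilds the `id` and `alt_id` columns of the sample by hand, and the names `alts_per_id` and `is_balanced` are illustrative, not part of xlogit.

```python
import pandas as pd

# id / alt_id columns of the sample data shown above
df = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "alt_id": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5],
})

# Number of distinct alternatives listed for each choice situation
alts_per_id = df.groupby("id")["alt_id"].nunique()

# The data is balanced only if every id has the same count
is_balanced = alts_per_id.nunique() == 1

print(alts_per_id)
print("balanced:", is_balanced)
```

Here the counts differ by `id` (4, 3 and 5 alternatives), so the data is not balanced.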
UPDATE: My original comment had a mistake, as I mentioned the `alts` parameter instead of the `avail` parameter to control for the availability of alternatives.
Hello @novak,

Yes. To optimize matrix products, xlogit expects the data to be "balanced" across alternatives, which means that every choice situation must have the same number of alternatives. To address this in your sample data, you can fill the non-existing alternatives with zeros and create a new `avail` column to tell xlogit which alternatives are available. In other words, the `avail` column tells xlogit to ignore the alternatives you filled with zeros. Pass it through the `avail` parameter of the `fit` function, as illustrated below:
id | alt_id | var1 | var2 | var3 | var4 | result | avail |
---|---|---|---|---|---|---|---|
1 | 1 | 3 | 4 | 5 | 6 | 0 | 1 |
1 | 2 | 3 | 4 | 5 | 6 | 0 | 1 |
1 | 3 | 3 | 4 | 5 | 6 | 1 | 1 |
1 | 4 | 3 | 4 | 5 | 6 | 0 | 1 |
1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 3 | 4 | 5 | 6 | 0 | 1 |
2 | 2 | 3 | 4 | 5 | 6 | 1 | 1 |
2 | 3 | 3 | 4 | 5 | 6 | 0 | 1 |
2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 3 | 4 | 5 | 6 | 0 | 1 |
3 | 2 | 3 | 4 | 5 | 6 | 0 | 1 |
3 | 3 | 3 | 4 | 5 | 6 | 0 | 1 |
3 | 4 | 3 | 4 | 5 | 6 | 0 | 1 |
3 | 5 | 3 | 4 | 5 | 6 | 1 | 1 |
Then pass the column to xlogit as follows:
model.fit(..., avail=df['avail'], ...)
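
If you would rather not add the padded rows by hand, the balanced table can be built with pandas. The snippet below is a minimal sketch under a few assumptions: the helper name `pad_alternatives` is made up for illustration (it is not an xlogit function), each `(id, alt_id)` pair appears at most once, and the full set of alternatives is taken to be every `alt_id` seen anywhere in the data.

```python
import pandas as pd

# Sample long-format data from the question (4, 3 and 5 alternatives per id)
df = pd.DataFrame({
    "id":     [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "alt_id": [1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5],
    "var1": 3, "var2": 4, "var3": 5, "var4": 6,
    "result": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
})


def pad_alternatives(df, id_col="id", alt_col="alt_id"):
    """Hypothetical helper: zero-fill missing alternatives and flag them in 'avail'."""
    # Every (id, alternative) combination that a balanced table must contain
    full_index = pd.MultiIndex.from_product(
        [df[id_col].unique(), sorted(df[alt_col].unique())],
        names=[id_col, alt_col],
    )
    return (
        df.assign(avail=1)                     # existing rows are available
          .set_index([id_col, alt_col])
          .reindex(full_index, fill_value=0)   # new rows get zeros, incl. avail=0
          .reset_index()
    )


balanced = pad_alternatives(df)
print(balanced)
```

The resulting `balanced["avail"]` column is what goes into the `avail` argument of `fit`. Filling with zeros also sets `result` to 0 on the padded rows, so an unavailable alternative is never marked as chosen.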
Thank you for taking the time to provide a detailed response. I really appreciate it. My `id` and `alt_id` parameters would align with the `id` and `panels` arguments, correct?
Dear @novak, I am so sorry; I just realized that my original comment had a mistake. I have updated it to show the right way to account for the availability of alternatives, using the `avail` parameter (instead of the `alts` parameter I had initially mentioned). In summary, to address your issue of unbalanced alternatives, you simply need to fill the non-existing alternatives with zeros and use the `avail` parameter to tell xlogit that those alternatives do not exist. Please see the full source code below for your sample data:
model.fit(X=df[["var1", "var2", "var3", "var4"]],
          y=df["result"],
          varnames=["var1", "var2", "var3", "var4"],
          ids=df["id"],
          alts=df["alt_id"],
          avail=df["avail"])
Note that the `id` and `alt_id` columns need to be passed to the `ids` and `alts` parameters, respectively. You don't need to involve the `panels` parameter, as your data does not seem to have a panel structure.