Predicting out-of-sample tasks of known individuals - Mixed logit on panel data

Open armoutihansen opened this issue 2 years ago • 3 comments

Hi @jhelvy,

First of all, thanks a lot for this contribution. I usually use Python, but logitr managed to solve some convergence issues I was having with xlogit on panel data.

My current question/issue revolves around the following: I have estimated a mixed logit model on a panel of individuals in a set of tasks/problems. Now suppose I have a separate panel data set containing the same individuals, for whom I would like to make predictions. Using the (unconditional) estimated distribution over the parameters to make predictions is then not optimal, since we already have additional information on these individuals from their prior choices. To be specific, let $g(\beta|\theta)$ be the population distribution of the parameters $\beta$, and let $L(i,t|\beta)=\frac{e^{\beta'X_{it}}}{\sum_j e^{\beta'X_{jt}}}$ be the probability of choosing $i$ in task $t$ conditional on $\beta$. Then, by Bayes' rule, the distribution over parameters conditional on having observed a sequence of choices $y$ is given by:

$$h(\beta|y,\theta)=\frac{P(y|\beta)g(\beta|\theta)}{P(y|\theta)}$$

where $P(y|\beta)= L(y_1,1|\beta)\times\dots\times L(y_T,T|\beta)$ is the probability of the individual's sequence conditional on $\beta$ and $P(y|\theta)=\int P(y|\beta)g(\beta|\theta)d\beta$ is the unconditional probability. Based on this, and approximating the integrals with draws $\beta^r$ from $g(\beta|\theta)$, an individual's estimated probability of choosing $i$ in out-of-sample task $T+1$ is given by:

$$\tilde{P}(i, T+1|y,\theta)=\frac{\sum_{r}L(i, T+1|\beta^r)P(y|\beta^r)}{\sum_{r}P(y|\beta^r)}$$

I should note that the above notation is from Revelt & Train (2000): "Customer-Specific Taste Parameters and Mixed Logit: Households' Choice of Electricity Supplier."
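
A minimal sketch of the simulated formula above, in plain Python rather than logitr code. Everything in it is an illustrative assumption: the function names, the single price attribute, the normal mixing distribution with mean -1 and standard deviation 0.5, and the toy tasks. In practice the draws $\beta^r$ would come from the estimated $g(\beta|\theta)$.

```python
import math
import random


def logit_probs(beta, alternatives):
    """Logit probabilities L(j, t | beta) for every alternative j in one task.

    alternatives: list of attribute vectors X_jt, one per alternative.
    """
    utilities = [sum(b * x for b, x in zip(beta, x_j)) for x_j in alternatives]
    max_u = max(utilities)  # subtract the max for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def sequence_likelihood(beta, tasks, choices):
    """P(y | beta): product of the logit probabilities of the observed choices."""
    likelihood = 1.0
    for alternatives, chosen in zip(tasks, choices):
        likelihood *= logit_probs(beta, alternatives)[chosen]
    return likelihood


def conditional_prediction(draws, tasks, choices, new_task):
    """Simulated P(i, T+1 | y, theta) for every alternative i in new_task."""
    weights = [sequence_likelihood(beta, tasks, choices) for beta in draws]
    total_weight = sum(weights)
    probs = [0.0] * len(new_task)
    for beta, w in zip(draws, weights):
        for i, p in enumerate(logit_probs(beta, new_task)):
            probs[i] += w * p
    return [p / total_weight for p in probs]


# Hypothetical setup: one attribute (price), coefficient ~ Normal(-1, 0.5^2).
rng = random.Random(42)
draws = [[rng.gauss(-1.0, 0.5)] for _ in range(2000)]

# Three observed tasks with two alternatives each; the cheaper one was always chosen.
tasks = [[[1.0], [2.0]], [[3.0], [1.5]], [[2.0], [2.5]]]
choices = [0, 1, 0]
new_task = [[1.0], [3.0]]  # out-of-sample task T+1

conditional = conditional_prediction(draws, tasks, choices, new_task)
# With no observed choices every weight is 1, which gives the population prediction.
unconditional = conditional_prediction(draws, [], [], new_task)
print(conditional, unconditional)
```

Because this hypothetical individual always picked the cheaper alternative, the likelihood weights favor more negative price coefficients, so the conditional probability of the cheaper option in the new task is higher than the population prediction.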

From my (limited) understanding of R, your predict method uses the population distribution over parameters to make predictions and does not allow for a panelID option to use the conditional distribution, is that correct? If so, do you know of any way I could use logitr to (1) derive the conditional distribution for each individual, and (2) make predictions based on this conditional distribution?
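
For part (1), one common summary of $h(\beta|y,\theta)$ is its simulated mean, $\sum_r \beta^r P(y|\beta^r) / \sum_r P(y|\beta^r)$, computed per individual. The sketch below assumes that summary is what is wanted. It is plain Python with hypothetical names and toy data, not something logitr provides.

```python
import math
import random


def chosen_prob(beta, alternatives, chosen):
    """Logit probability of the chosen alternative in one task."""
    utilities = [sum(b * x for b, x in zip(beta, x_j)) for x_j in alternatives]
    max_u = max(utilities)  # subtract the max for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    return exp_u[chosen] / sum(exp_u)


def conditional_mean(draws, tasks, choices):
    """Simulated E[beta | y, theta]: likelihood-weighted average of the draws."""
    weights = []
    for beta in draws:
        w = 1.0
        for alternatives, chosen in zip(tasks, choices):
            w *= chosen_prob(beta, alternatives, chosen)
        weights.append(w)
    total = sum(weights)
    n_pars = len(draws[0])
    return [
        sum(w * beta[k] for w, beta in zip(weights, draws)) / total
        for k in range(n_pars)
    ]


# Hypothetical population distribution: price coefficient ~ Normal(-1, 0.5^2).
rng = random.Random(1)
draws = [[rng.gauss(-1.0, 0.5)] for _ in range(2000)]

# Hypothetical panel: individual ID -> (tasks, choices), same tasks for both.
shared_tasks = [[[1.0], [2.0]], [[3.0], [1.5]]]
panel = {
    "A": (shared_tasks, [0, 1]),  # always picks the cheaper alternative
    "B": (shared_tasks, [1, 0]),  # always picks the more expensive alternative
}

means = {pid: conditional_mean(draws, t, c) for pid, (t, c) in panel.items()}
population_mean = sum(beta[0] for beta in draws) / len(draws)
print(means, population_mean)
```

The same draws are reused for every individual and only the weights change, so individual A's conditional mean falls below the population mean and individual B's rises above it.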

On an unrelated note, I think I have spotted two bugs:

  1. If I estimate a multinomial logit using a single parameter, I get the following error when executing the summary method if I specify a clusterID:
[image]

Note that it works for two or more parameters. Furthermore, the summary method also works for a single parameter if I leave out clusterID.

  2. If I estimate a mixed logit using a single parameter, I get the following error in the estimation if I specify clusterID:
[image]

Note that the estimation works for two or more parameters. The estimation also works for a single parameter if I leave out clusterID.

Many thanks in advance for your time.

armoutihansen, Aug 10 '22 11:08

You are correct that the logitr predict method uses the population distribution over parameters to make predictions. I believe what you described would be an appropriate way to make predictions using the panelID. Implementing this in the package would take some time to work out, and right now I cannot make it a priority. If you come up with a way to implement it even for a specific example, that would help me have something to work with and I could perhaps integrate it into the package.

As for the bugs, can you try installing the latest development version (0.7.2)?

```r
# install.packages("remotes")
remotes::install_github("jhelvy/logitr")
```

I fixed several bugs regarding the clusterID recently, but I haven't gotten those fixes on CRAN yet.

jhelvy, Aug 10 '22 12:08

Thanks a lot for the quick response! I'll try to come up with something and get back to you.

I just installed the latest version (0.7.2), and the bugs I mentioned seem to persist when only one parameter is used. For instance, they occur with the yogurt data when price is the only parameter.

armoutihansen, Aug 10 '22 18:08

Ah okay, so that's probably another bug. I may have actually introduced it when I fixed the other ones. Would you mind opening a separate issue for this, to keep it apart from the prediction feature?

jhelvy, Aug 11 '22 09:08