# Logit regression

If one assumes that the probability $$P(i \to j)$$ that actor $$i = 1..n$$ chooses alternative $$j = 1..k$$ is proportional (within the set of alternative choices) to the exponential of a linear combination of $$p = 1..\#p$$ data values $$X_{ijp}$$ related to $$i$$ and $$j$$, one arrives at the logit model. More formally:

Assume \begin{align} P(i \to j) &\sim w_{ij} \\ w_{ij} &:= exp(v_{ij}) \\ v_{ij} &:= \sum\limits_{p} \beta_p X_{ijp} \end{align}

Thus $$L(i \to j) := log( P(i \to j)) \sim v_{ij}$$, i.e. the log-probability equals $$v_{ij}$$ up to an additive term that does not depend on $$j$$.

Consequently, $$w_{ij} > 0$$ and $$P(i \to j) := { w_{ij} \over \sum\limits_{j'}w_{ij'}}$$, since $$\sum\limits_{j}P_{ij}$$ must be 1.

Note that:

• $$v_{ij}$$ is a linear combination of $$X_{ijp}$$ with weights $$\beta_p$$ as logit model parameters.
• the odds ratio $$P(i \to j) \over P(i \to j')$$ of choice $$j$$ against alternative $$j'$$ is equal to $${w_{ij} \over w_{ij'}} = exp( v_{ij} - v_{ij'} ) = exp \sum\limits_{p} \beta_p \left( X_{ijp}- X_{ij'p} \right)$$
• this formulation does not require a separate beta index (i.e. an extra parameter space dimension) per alternative choice $$j$$ for each exogenous variable.
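As an illustration, the choice probabilities follow directly from the definitions above. The sketch below assumes the data are held in an array `X` of shape $$(n, k, \#p)$$; all names and shapes are illustrative, not from the source:

```python
import numpy as np

def choice_probabilities(X, beta):
    """P(i -> j) for data X of shape (n, k, p) and weights beta of shape (p,)."""
    v = X @ beta                                   # v_ij = sum_p beta_p * X_ijp
    w = np.exp(v - v.max(axis=1, keepdims=True))   # shift by the row max for numerical stability
    return w / w.sum(axis=1, keepdims=True)        # normalize over the alternatives j

# two actors, three alternatives, two data values per (i, j) pair
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 2))
beta = np.array([0.5, -1.0])
P = choice_probabilities(X, beta)
```

The shift by the row maximum cancels in the ratio, the same invariance that makes the odds ratio depend only on differences $$v_{ij} - v_{ij'}$$.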

# Observed Data

Observed choice counts $$Y_{ij}$$ are assumed to be drawn from a repeated Bernoulli experiment (i.e. multinomially per actor $$i$$) with probabilities $$P(i \to j)$$.

Thus $$P(Y) = \prod\limits_{i} \left[ N_i ! \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right]$$ with $$N_i := \sum\limits_{j} Y_{ij}$$.

Thus $$L(Y) := log( P(Y) )$$ \begin{align} &= log \prod\limits_{i} \left[ N_i ! \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right] \\ &= C + \sum\limits_{ij} (Y_{ij} \times log(P_{ij})) \\ &= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times L(i \to j)}\right] \\ &= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times \left( v_{ij} - log \sum\limits_{j'}w_{ij'}\right)}\right] \\ &= C + \sum\limits_{i} \left[{ \left( \sum\limits_{j}Y_{ij} \times v_{ij} \right) - N_i \times log \sum\limits_{j}w_{ij}}\right] \end{align} with $$C = \sum\limits_{i} C_i$$ and $$C_i := log (N_i!) - \sum\limits_{j} log (Y_{ij}!)$$, which is independent of $$P_{ij}$$ and $$\beta_{p}$$. Note that $$N_i = 1 \implies C_i = 0$$.
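The equality of the second and last lines of this derivation can be checked numerically. The sketch below (hypothetical names and random data) evaluates both forms, dropping the constant $$C$$ from each:

```python
import numpy as np

def log_likelihood(Y, X, beta):
    """L(Y) up to the constant C, using the last line of the derivation."""
    v = X @ beta                                  # v_ij, shape (n, k)
    N = Y.sum(axis=1)                             # N_i = sum_j Y_ij
    log_sum_w = np.log(np.exp(v).sum(axis=1))     # log sum_j w_ij
    return ((Y * v).sum(axis=1) - N * log_sum_w).sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3, 2))
beta = np.array([0.3, -0.7])
Y = np.array([[2, 1, 0], [0, 3, 1], [1, 1, 1], [4, 0, 0]])

# direct form: sum_ij Y_ij * log(P_ij), also up to C
v = X @ beta
P = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
direct = (Y * np.log(P)).sum()
assert np.isclose(log_likelihood(Y, X, beta), direct)
```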

# Specification

The presented form $$v_{ij} := \beta_{p} X^p_{ij}$$ (using Einstein notation from here on) is more generic than well-known implementations of logistic regression (such as in SPSS and R), where $$X^q_i$$ is a set of $$q = 1..\#q$$ data values given for each $$i$$ ($$X^0_i$$ is set to 1 to represent the intercept for each $$j$$) and $$(k-1) \times (\#q+1)$$ parameters are to be estimated, thus $$v_{ij} := \beta_{jq} X^q_i$$ for $$j = 2..k$$. This requires a different beta for each alternative choice and data value, causing an unnecessarily large parameter space.

The latter specification can be reduced to the more generic form by:

• assigning a unique p to each jq combination, represented by $$A^{p}_{jq}$$.
• defining $$X^{p}_{ij} := A^{p}_{jq} * X^q_i$$ for $$j=2..k$$, thus creating redundant and zero data values.
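A minimal sketch of this reduction (hypothetical names; the intercept column $$X^0_i$$ is omitted for brevity): each $$(j, q)$$ combination with $$j = 2..k$$ receives its own index $$p$$, and the indicator array $$A^p_{jq}$$ scatters $$X^q_i$$ into the generic $$X^p_{ij}$$:

```python
import numpy as np

n, k, nq = 3, 4, 2
Xi = np.random.default_rng(2).normal(size=(n, nq))   # X^q_i, varies with i only
np_dim = (k - 1) * nq                                # one p per (j, q) with j = 2..k

# A[p, j, q] = 1 iff p is the index assigned to the (j, q) combination
A = np.zeros((np_dim, k, nq))
for j in range(1, k):            # j = 2..k in the text's 1-based notation
    for q in range(nq):
        A[(j - 1) * nq + q, j, q] = 1.0

# X^p_{ij} := A^p_{jq} * X^q_i  (Einstein summation over q)
Xij = np.einsum('pjq,iq->ijp', A, Xi)
```

Most entries of `Xij` are zero (all of the reference alternative $$j = 1$$, and all but $$\#q$$ entries per remaining $$(i, j)$$ pair), which is exactly the redundancy mentioned above.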

However, the generic model cannot be reduced to a specification with a different $$\beta$$ for each alternative choice unless the latter parameter space can be restricted to contain no more dimensions than the generic form.

With large $$n$$ and $$k$$, the set of data values $$X_{ijp}$$ can be huge. To mitigate the data size, the following tricks can be applied:

• limit the set of combinations of $$i$$ and $$j$$ to the most probable or nearby $$j$$'s for each $$i$$ and/or cluster the other $$j$$'s.
• use only a sample from the set of possible $$i$$'s.
• support specific forms of data:
| # | form | reduction | description |
|---|------|-----------|-------------|
| 0 | $$\beta_{p} X^p_{ij}$$ | | general form: $$p$$ factors specific for each $$i$$ and $$j$$ |
| 1 | $$\beta_{p} A^{p}_{jq} X^q_i$$ | $$X^{p}_{ij} := A^{p}_{jq} X^q_i$$ | $$q$$ factors that vary with $$i$$ but not with $$j$$ |
| 2 | $$\beta_{p} X^p_i X^p_j$$ | $$X^p_{ij} := X^p_j X^p_i$$ | $$p$$ specific factors in simple multiplicative form |
| 3 | $$\beta_{jq} X^q_i$$ | | $$q$$ factors that vary with $$j$$ but not with $$i$$ |
| 4 | $$\beta_{p} X^p_j$$ | $$X^p_{ij} := X^p_j$$ | state constants $$D_j$$ |
| 5 | $$\beta_j$$ | | state-dependent intercept |
| 6 | $$\beta_p (J^p_i = j)$$ | | usage of a recorded preference $$J^p_i$$ |
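For instance, form 2 never needs the full $$n \times k \times \#p$$ array: the utilities can be computed from the two small factor arrays directly (a sketch with hypothetical names):

```python
import numpy as np

# form 2: X^p_{ij} := X^p_i * X^p_j, stored as two small arrays
n, k, p = 1000, 200, 3
rng = np.random.default_rng(4)
Xi = rng.normal(size=(n, p))      # factors per actor i
Xj = rng.normal(size=(k, p))      # factors per alternative j
beta = rng.normal(size=p)

# v_ij = sum_p beta_p * X^p_i * X^p_j, without materializing an (n, k, p) array
v = (Xi * beta) @ Xj.T            # shape (n, k)
```

Memory is $$O((n + k) \times \#p)$$ instead of $$O(n \times k \times \#p)$$.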

# Regression

The $$\beta_p$$'s are found by maximizing the likelihood $$L( Y|\beta )$$ which is equivalent to finding the maximum of $$\sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times v_{ij} - N_i \times log \sum\limits_{j}w_{ij}}\right]$$

First order conditions, for each $$p$$: $$0 = { \partial L \over \partial\beta_p } = \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times { \partial v_{ij} \over \partial \beta_p } - N_i \times { \partial log \sum\limits_{j}w_{ij} \over \partial \beta_p }} \right]$$

Thus, for each $$p$$: $$\sum\limits_{ij} Y_{ij} \times X_{ijp} = \sum\limits_{ij} N_i \times P_{ij} \times X_{ijp}$$ as $${ \partial v_{ij} \over \partial \beta_p } = X^p_{ij}$$ and $${ \partial log \sum\limits_{j}w_{ij} \over \partial \beta_p } = { \sum\limits_{j} {\partial w_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = { \sum\limits_{j} {w_{ij} \times \partial v_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } = { \sum\limits_{j} {w_{ij} \times X_{ijp} } \over \sum\limits_{j}w_{ij} } = \sum\limits_{j} P_{ij} \times X_{ijp}$$
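These first-order conditions can be solved numerically. The sketch below uses plain gradient ascent on $$L$$ (illustrative only; practical implementations typically use Newton-Raphson or IRLS, and all names here are hypothetical) and recovers a $$\beta$$ at which the observed and expected sums of $$X_{ijp}$$ coincide:

```python
import numpy as np

def score(Y, X, beta):
    """dL/dbeta_p = sum_ij (Y_ij - N_i * P_ij) * X_ijp, the first-order condition."""
    v = X @ beta
    w = np.exp(v - v.max(axis=1, keepdims=True))
    P = w / w.sum(axis=1, keepdims=True)
    N = Y.sum(axis=1, keepdims=True)              # N_i
    return np.einsum('ij,ijp->p', Y - N * P, X)

def fit_logit(Y, X, steps=10000, lr=0.0005):
    """Maximize L(Y|beta) by plain gradient ascent (a sketch, not production code)."""
    beta = np.zeros(X.shape[2])
    for _ in range(steps):
        beta += lr * score(Y, X, beta)
    return beta

# simulate observations from a known beta, then re-estimate it
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3, 2))
true_beta = np.array([1.0, -0.5])
v = X @ true_beta
P = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
Y = np.array([rng.multinomial(10, Pi) for Pi in P])   # N_i = 10 observations per actor

beta_hat = fit_logit(Y, X)
```

At convergence the score vanishes, i.e. $$\sum_{ij} Y_{ij} X_{ijp} = \sum_{ij} N_i P_{ij} X_{ijp}$$ holds to numerical precision for each $$p$$.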