Logit regression

If one assumes that the probability \( P(i \to j) \) that actor \( i = 1..n \) chooses alternative \( j = 1..k \) is proportional (within the set of alternative choices) to the exponential of a linear combination of \( p = 1..\#p \) data values \( X_{ijp} \) related to \( i \) and \( j \), one arrives at the logit model. More formally:

Assume $$ \begin{align} P(i \to j) &\sim w_{ij} \\ w_{ij} &:=exp(v_{ij}) \\ v_{ij} &:= \sum\limits_{p} \beta_p X_{ijp} \end{align} $$

Thus \( L(i \to j) := log( P(i \to j)) \) equals \( v_{ij} \) up to an additive term that depends only on \( i \).

Consequently, \( w_{ij} > 0 \) and \( P(i \to j) = { w_{ij} \over \sum\limits_{j'}w_{ij'}} \), since the probabilities \( P_{ij} := P(i \to j) \) must sum to 1 over \( j \) for each \( i \).
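The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `choice_probabilities`, the nested-list layout `X[i][j][p]` and all numbers are assumptions made for the example. Subtracting the largest \( v_{ij} \) per actor before exponentiating is a common numerical precaution and does not change the resulting ratios.

```python
import math

def choice_probabilities(X, beta):
    """Return P[i][j] = w_ij / sum_j' w_ij' with w_ij = exp(sum_p beta_p * X[i][j][p])."""
    P = []
    for X_i in X:
        # v_ij: linear combination of the data values of actor i and alternative j
        v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in X_i]
        v_max = max(v)  # shifting all v_ij of one actor leaves the ratios unchanged
        w = [math.exp(v_ij - v_max) for v_ij in v]
        total = sum(w)
        P.append([w_ij / total for w_ij in w])
    return P

# Hypothetical data: n = 2 actors, k = 3 alternatives, 2 data values per (i, j).
X = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 1.0], [1.0, 0.2], [0.0, 0.0]],
]
beta = [0.8, -0.3]

P = choice_probabilities(X, beta)
for i, row in enumerate(P):
    print("actor", i, [round(p, 4) for p in row], "row sum:", round(sum(row), 6))
```

Each row of `P` sums to 1 and all entries are strictly positive, as required.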

Note that:
 * \( v_{ij} \) is a linear combination of \( X_{ijp} \) with weights \( \beta_p \) as logit model parameters.
 * the odds ratio \( P(i \to j) \over  P(i \to j') \) of choice \( j \) against alternative \( j' \) is equal to \( {w_{ij} \over w_{ij'}} = exp( v_{ij} - v_{ij'} ) = exp \sum\limits_{p} \beta_p \left( X_{ijp}- X_{ij'p} \right) \)
 * this formulation does not require a separate beta index (aka parameter space dimension) per alternative choice j for each exogenous variable.
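The odds-ratio property can be checked numerically. The sketch below uses made-up values for \( \beta \) and for the data of a single actor (both are assumptions for illustration); it compares the ratio of two probabilities with the exponential of the weighted data differences, and shows that adding the same constant to all data values of an actor leaves the probabilities unchanged.

```python
import math

def probabilities(X_i, beta):
    """Choice probabilities of one actor from its data values X_i[j][p]."""
    v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in X_i]
    w = [math.exp(v_ij) for v_ij in v]
    total = sum(w)
    return [w_ij / total for w_ij in w]

beta = [0.8, -0.3]                           # hypothetical parameters
X_i = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # hypothetical data for one actor
P_i = probabilities(X_i, beta)

# Odds of alternative 0 against alternative 1, computed in two ways.
odds = P_i[0] / P_i[1]
from_differences = math.exp(
    sum(b * (x0 - x1) for b, x0, x1 in zip(beta, X_i[0], X_i[1]))
)
print(odds, from_differences)

# Only differences between alternatives matter: a common shift cancels out.
shifted = [[x + 10.0 for x in x_ij] for x_ij in X_i]
P_shifted = probabilities(shifted, beta)
print(P_i, P_shifted)
```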

= Observed Data =

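Observed choices \( Y_{ij} \) are assumed to be drawn from a repeated experiment in which actor \( i \) makes \( N_i \) independent choices, each with probabilities \( P(i \to j) \), so that the counts \( Y_{ij} \) follow a multinomial distribution.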

Thus \( P(Y) = \prod\limits_{i} \left[ N_i ! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right] \) with \( N_i := \sum\limits_{j} Y_{ij} \).

Thus \( L(Y) := log( P(Y) ) \) $$\begin{align} &= log \prod\limits_{i} \left[ N_i ! \times \prod\limits_{j} { P(i \to j)^{Y_{ij}} \over Y_{ij}! } \right] \\ &= C + \sum\limits_{ij} (Y_{ij} \times log(P_{ij})) \\ &= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times L(i \to j)}\right] \\ &= C + \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times \left( v_{ij} - log \sum\limits_{j'}w_{ij'}\right)}\right] \\ &= C + \sum\limits_{i} \left[{ \left( \sum\limits_{j}Y_{ij} \times v_{ij} \right) - N_i \times log \sum\limits_{j}w_{ij}}\right] \end{align} $$
with \( C = \sum\limits_{i} C_i \) and \( C_i := [log (N_i!) - \sum\limits_{j} log (Y_{ij}!)] \), which is independent of \( P_{ij} \) and \( \beta_{p} \). The last step uses \( \sum\limits_{j} Y_{ij} = N_i \). Note that \( N_i = 1 \implies C_i = 0 \).
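The final expression for \( L(Y) \) can be evaluated directly. The following sketch assumes nested lists `Y[i][j]` and `X[i][j][p]` with invented values; `math.lgamma(n + 1)` is used for \( log(n!) \). As a cross-check, the result is compared with the logarithm of the multinomial probability written out by hand for this small case.

```python
import math

def log_likelihood(Y, X, beta):
    """L(Y) = sum_i [ C_i + sum_j Y_ij * v_ij - N_i * log(sum_j w_ij) ]."""
    total = 0.0
    for Y_i, X_i in zip(Y, X):
        v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in X_i]
        log_sum_w = math.log(sum(math.exp(v_ij) for v_ij in v))
        N_i = sum(Y_i)
        # C_i = log(N_i!) - sum_j log(Y_ij!), using lgamma(n + 1) = log(n!)
        C_i = math.lgamma(N_i + 1) - sum(math.lgamma(y + 1) for y in Y_i)
        total += C_i + sum(y * v_ij for y, v_ij in zip(Y_i, v)) - N_i * log_sum_w
    return total

def probabilities(X_i, beta):
    """Choice probabilities of one actor, used only for the cross-check."""
    w = [math.exp(sum(b * x for b, x in zip(beta, x_ij))) for x_ij in X_i]
    return [w_ij / sum(w) for w_ij in w]

# Hypothetical data: 2 actors, 3 alternatives, 2 data values per (i, j).
X = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 1.0], [1.0, 0.2], [0.0, 0.0]],
]
beta = [0.8, -0.3]
Y = [[2, 1, 0], [0, 0, 1]]   # actor 0 chooses three times, actor 1 once

ll = log_likelihood(Y, X, beta)

# Direct evaluation: 3!/(2! 1! 0!) = 3 orderings for actor 0, one for actor 1.
P0, P1 = probabilities(X[0], beta), probabilities(X[1], beta)
direct = math.log(3 * P0[0] ** 2 * P0[1]) + math.log(P1[2])
print(ll, direct)
```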

= Specification =

The presented form \( v_{ij} := \beta_{p} * X^p_{ij} \) (using Einstein notation from here) is more generic than known implementations of logistic regression (such as in SPSS and R). Those take a set of \( q = 1.. \#q\) data values \( X^q_i \) for each \( i \) (with \( X^0_i \) set to 1 to represent the intercept for each \( j \)) and estimate \( (k-1)*(\#q+1) \) parameters, thus \( v_{ij} := \beta_{jq} * X^q_i \) for \( j = 2..k \), with \( v_{i1} := 0 \) for the reference alternative. This requires a different beta for each combination of alternative choice and data value, causing an unnecessarily large parameter space.

The latter specification can be reduced to the more generic form by:
 * assigning a unique p to each jq combination, represented by \( A^{p}_{jq} \).
 * defining \( X^{p}_{ij} := A^{p}_{jq} * X^q_i \) for \( j=2..k \), thus creating redundant and zero data values.
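The reduction can be illustrated as follows. This is a sketch under assumptions: the helper `expand_to_generic`, the 0-based indexing (alternative 0 plays the role of \( j = 1 \)) and the numbers are invented for the example. Each \( (j, q) \) combination gets its own column \( p \); the data row of alternative \( j \) holds \( X^q_i \) in its own block of columns and zeros elsewhere.

```python
def expand_to_generic(X_iq, k):
    """Map X_iq[i][q] to X[i][j][p], one column p per (j, q) with j = 1..k-1 (0-based)."""
    n_q = len(X_iq[0])
    n_p = (k - 1) * n_q
    X = []
    for x_i in X_iq:
        rows = []
        for j in range(k):
            row = [0.0] * n_p            # reference alternative j = 0 stays all zero
            if j > 0:
                start = (j - 1) * n_q    # block of columns belonging to alternative j
                row[start:start + n_q] = x_i
            rows.append(row)
        X.append(rows)
    return X

# Hypothetical data: 2 actors, a constant 1 plus one data value, k = 3 alternatives.
X_iq = [[1.0, 2.0], [1.0, -1.0]]
k = 3
beta_jq = [[0.5, -0.2], [-1.0, 0.3]]              # one row per non-reference alternative
beta_p = [b for row in beta_jq for b in row]      # flattened in the same (j, q) order

X = expand_to_generic(X_iq, k)

# v_ij from the per-alternative specification and from the generic form.
v_specific = [
    [0.0] + [sum(b * x for b, x in zip(beta_j, x_i)) for beta_j in beta_jq]
    for x_i in X_iq
]
v_generic = [
    [sum(b * x for b, x in zip(beta_p, x_ij)) for x_ij in X_i]
    for X_i in X
]
print(v_specific)
print(v_generic)
```

Both forms yield the same \( v_{ij} \); the price is that most entries of the expanded data are zero or repeated.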

However, a generic model cannot be reduced to a specification with different \( \beta \)'s for each alternative choice unless the latter parameter space can be restricted to contain no more dimensions than the generic form.

With large \( n \) and \( k \), the set of data values \( X_{ijp} \) can be huge. To limit the data size, the following tricks can be applied:
 * limit the set of combinations of \(i\) and \(j\) to the most probable or near \(j\)'s for each \(i\) and/or cluster the other \(j\)'s.
 * use only a sample from the set of possible \(i\)'s.
 * support specific forms of data:

= Regression =

The \( \beta_p \)'s are found by maximizing the log-likelihood \( L( Y|\beta ) \). Since \( C \) does not depend on \( \beta \), this is equivalent to finding the maximum of \( \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times v_{ij} - N_i \times log \sum\limits_{j}w_{ij}}\right] \).

First order conditions, for each \( p \): \( 0 = { \partial L \over \partial\beta_p } = \sum\limits_{i} \left[{ \sum\limits_{j}Y_{ij} \times { \partial v_{ij} \over \partial \beta_p } - N_i \times { \partial log \sum\limits_{j}w_{ij} \over \partial \beta_p }} \right]\)

Thus, for each \( p\): \( \sum\limits_{ij} Y_{ij} \times X_{ijp} = \sum\limits_{ij} N_i \times P_{ij} \times X_{ijp} \), as \( { \partial v_{ij} \over \partial \beta_p } = X_{ijp} \) and $$ \begin{align} { \partial log \sum\limits_{j}w_{ij} \over \partial \beta_p } &= { \sum\limits_{j} {\partial w_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } \\ &= { \sum\limits_{j} {w_{ij} \times \partial v_{ij} / \partial \beta_p } \over \sum\limits_{j}w_{ij} } \\ &= { \sum\limits_{j} {w_{ij} \times X_{ijp} } \over \sum\limits_{j}w_{ij} } \\ &= \sum\limits_{j} P_{ij} \times X_{ijp} \end{align} $$
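The first order conditions say that, at the optimum, the observed totals \( \sum_{ij} Y_{ij} X_{ijp} \) equal the expected totals \( \sum_{ij} N_i P_{ij} X_{ijp} \). The sketch below illustrates this with plain fixed-step gradient ascent on invented data. Gradient ascent is chosen here only for brevity and is not necessarily how any particular implementation solves the problem; the step size and iteration count are assumptions that happen to suit this small example.

```python
import math

def probabilities(X_i, beta):
    """Choice probabilities of one actor from its data values X_i[j][p]."""
    v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in X_i]
    v_max = max(v)
    w = [math.exp(v_ij - v_max) for v_ij in v]
    total = sum(w)
    return [w_ij / total for w_ij in w]

def gradient(Y, X, beta):
    """dL/dbeta_p = sum_ij (Y_ij - N_i * P_ij) * X_ijp."""
    g = [0.0] * len(beta)
    for Y_i, X_i in zip(Y, X):
        N_i = sum(Y_i)
        P_i = probabilities(X_i, beta)
        for y, p_ij, x_ij in zip(Y_i, P_i, X_i):
            for p, x in enumerate(x_ij):
                g[p] += (y - N_i * p_ij) * x
    return g

def fit(Y, X, n_params, step=0.05, iterations=5000):
    """Fixed-step gradient ascent on the log-likelihood, starting from beta = 0."""
    beta = [0.0] * n_params
    for _ in range(iterations):
        g = gradient(Y, X, beta)
        beta = [b + step * g_p for b, g_p in zip(beta, g)]
    return beta

# Hypothetical data: 3 actors, 3 alternatives, 2 data values per (i, j).
X = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0], [0.5, 0.0]],
]
Y = [[5, 3, 2], [2, 6, 2], [4, 1, 5]]

beta_hat = fit(Y, X, n_params=2)

# Both sides of the first order conditions, per parameter p.
observed = [
    sum(y * x_ij[p] for Y_i, X_i in zip(Y, X) for y, x_ij in zip(Y_i, X_i))
    for p in range(2)
]
expected = [
    sum(
        sum(Y_i) * p_ij * x_ij[p]
        for Y_i, X_i in zip(Y, X)
        for p_ij, x_ij in zip(probabilities(X_i, beta_hat), X_i)
    )
    for p in range(2)
]
print("beta:", beta_hat)
print("observed:", observed)
print("expected:", expected)
```

For larger problems a second-order method such as Newton-Raphson usually converges in far fewer iterations, at the cost of computing the matrix of second derivatives.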

= Examples =

logit regression of rehousing.