Mutual Information

In probability theory and information theory, the mutual information of two random variables is a quantity measuring the statistical dependence between these variables. It is often measured in bits.

The mutual information of a pair (X, Y) of variables represents their degree of dependence in the probabilistic sense. This notion of logical dependence should not be confused with physical causality, although in practice one often implies the other.

Informally, two variables are said to be independent if the realization of one provides no information about the realization of the other. Correlation is a particular case of dependence in which the relation between the two variables is strictly monotonic.

Mutual information is zero if and only if the variables are independent, and grows as the dependence increases.

Definition

Let (X, Y) be a pair of random variables with joint probability density P(x, y). Denote the marginal distributions by P(x) and P(y). The mutual information is then, in the discrete case:

I(X, Y) = \sum_{x, y} P(x, y) \log \frac{P(x, y)}{P(x) \, P(y)},

and, in the continuous case:

I(X, Y) = \int_{(-\infty, \infty) \times (-\infty, \infty)} P(x, y) \log \frac{P(x, y)}{P(x) \, P(y)} \; dx \, dy.
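As an illustration of the discrete formula, here is a minimal Python sketch. It assumes the joint distribution is given as a nested list in which joint[i][j] is P(X = i, Y = j), and it uses a base-2 logarithm so that the result is in bits. The function name mutual_information and the two example tables are illustrative choices, not part of the definition.

```python
import math

def mutual_information(joint):
    """Mutual information, in bits, of a discrete joint distribution.

    joint[i][j] is assumed to hold P(X = i, Y = j); entries must sum to 1.
    """
    px = [sum(row) for row in joint]          # marginal P(x): row sums
    py = [sum(col) for col in zip(*joint)]    # marginal P(y): column sums
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                       # convention: 0 * log 0 = 0
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Two fair bits that are independent of each other.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Two fair bits that are always equal.
identical = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
print(mi_independent, mi_identical)
```

In the first table every term of the sum vanishes, so the mutual information is zero; in the second, knowing one bit determines the other, and the mutual information is one bit.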

Properties

  • I(X, Y) = 0 if and only if X and Y are independent random variables.
  • Mutual information is non-negative: I(X, Y) \geq 0.
  • Mutual information is symmetric: I(X, Y) = I(Y, X).
  • “Data processing theorem”: if g1 and g2 are two measurable functions, then I(g1(X), g2(Y)) \leq I(X, Y). This means that no transformation of the raw data can reveal additional information.
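These properties can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the joint table is a made-up distribution over four values of X and two values of Y, and the transformation g1 is a hypothetical function that merges the X values {0, 1} and {2, 3}, which amounts to adding the corresponding rows of the table.

```python
import math

def mutual_information(joint):
    """Mutual information in bits; joint[i][j] is P(X = i, Y = j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Made-up joint distribution of X (4 values, rows) and Y (2 values, columns).
joint = [
    [0.20, 0.05],
    [0.15, 0.10],
    [0.05, 0.20],
    [0.10, 0.15],
]

# Swapping the roles of X and Y transposes the table.
transposed = [list(col) for col in zip(*joint)]

# Joint distribution of (g1(X), Y), where g1 merges {0, 1} and {2, 3}.
merged = [
    [a + b for a, b in zip(joint[0], joint[1])],
    [a + b for a, b in zip(joint[2], joint[3])],
]

mi = mutual_information(joint)
mi_swapped = mutual_information(transposed)
mi_merged = mutual_information(merged)
print(mi, mi_swapped, mi_merged)
```

The first two values agree up to rounding (symmetry), both are non-negative, and the third cannot exceed the first, since g1 discards part of what X says about Y.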

Several generalizations of this quantity to a larger number of variables have been proposed, but no consensus has yet emerged.

Links with information theory

Entropy

Mutual information measures the amount of information that a realization of X provides, on average, about the probabilities of realization of Y. If a probability distribution is taken to represent our knowledge of a random phenomenon, the lack of information is measured by the entropy of this distribution. In these terms, mutual information is expressed as:

I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y),

where H(X) and H(Y) are the entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X, Y) is the joint entropy of X and Y.

It follows that I(X, Y) = 0 exactly when the number of bits needed to encode a realization of the pair equals the sum of the number of bits needed to encode a realization of X and the number of bits needed to encode a realization of Y.
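The entropy identities can be verified on a concrete table. The following Python sketch assumes a made-up 2 x 2 joint distribution and computes the conditional entropy through the chain rule H(X|Y) = H(X, Y) - H(Y); all variable names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up joint distribution: joint[i][j] is P(X = i, Y = j).
joint = [[0.4, 0.1], [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy([p for row in joint for p in row])   # joint entropy H(X, Y)
h_x_given_y = h_xy - h_y                            # chain rule for H(X|Y)

mi_from_entropies = h_x + h_y - h_xy
mi_from_conditional = h_x - h_x_given_y

# Direct evaluation of the defining sum, for comparison.
mi_direct = sum(
    p * math.log2(p / (px[i] * py[j]))
    for i, row in enumerate(joint)
    for j, p in enumerate(row)
    if p > 0
)
print(mi_from_entropies, mi_from_conditional, mi_direct)
```

The three values coincide up to floating-point rounding. Here H(X) = H(Y) = 1 bit while H(X, Y) is less than 2 bits, so encoding the pair jointly is cheaper than encoding each variable separately, and the mutual information is positive.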

Kullback-Leibler divergence

Mutual information can also be expressed in terms of the Kullback-Leibler divergence. One has

I(X, Y) = \mathit{KL}(P(X, Y), P(X) P(Y)) = \sum P(X, Y) \log \frac{P(X, Y)}{P(X) P(Y)}.

Thus I(X, Y) measures a kind of "distance" between the distributions P(X, Y) and P(X) P(Y). Since, by definition, two variables are independent if and only if these two distributions are equal, and since \mathit{KL}(p, q) = 0 if and only if p = q, the condition I(X, Y) = 0 is equivalent to independence.

Intuitively, P(X, Y) carries more information when the variables are dependent than when they are not. If the two variables are discrete with N possible values each, one needs, in the worst case, N^2 - 1 coefficients to specify P(X, Y), against only 2(N - 1) if P(X, Y) = P(X) P(Y), since each marginal is then fixed by N - 1 coefficients.

The \mathit{KL} divergence, taken with a base-2 logarithm, gives the number of bits of information provided by knowledge of P(X, Y) when one already knows P(X) and P(Y).
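As a hedged illustration of this relation, the Python sketch below computes the Kullback-Leibler divergence between a made-up joint distribution and the product of its marginals. The helper kl_divergence is an illustrative name; it assumes both distributions are given as lists of equal length, and that q is positive wherever p is.

```python
import math

def kl_divergence(p, q):
    """KL(p, q) in bits for two discrete distributions of equal length.

    Assumes q[k] > 0 wherever p[k] > 0; terms with p[k] = 0 contribute 0.
    """
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Made-up joint distribution: joint[i][j] is P(X = i, Y = j).
joint = [[0.4, 0.1], [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

# Flatten both distributions in the same (row-major) order.
flat_joint = [p for row in joint for p in row]
flat_product = [a * b for a in px for b in py]

mi = kl_divergence(flat_joint, flat_product)        # I(X, Y)
kl_self = kl_divergence(flat_product, flat_product) # KL(q, q) = 0
print(mi, kl_self)
```

The first value is the mutual information of the table, roughly 0.278 bits; the second is exactly zero, which matches the case where the joint distribution equals the product of its marginals.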
