Information theory

Information theory is concerned with information systems, communication systems and their efficiency. Because the notion of an information or communication system is broad, the scope of information theory is equally broad.

The field has its scientific origin in Claude Shannon, its founding father, and his article "A Mathematical Theory of Communication", published in 1948.

Among its important branches are:

The strength of this theory is that it does not try to define the concept of information, just as arithmetic does not define what a number is.

The emergence of information theory is linked to the emergence of cognitive psychology in the 1940s and 1950s.

History

Information theory first grew out of the work of Ronald Aylmer Fisher. A statistician, Fisher formally defined information as the mean (expected value) of the square of the derivative of the logarithm of the probability law under study. From the Cramér–Rao inequality it follows that the larger this information, the lower the variability of the resulting conclusions. In simple terms, the less probable an observation is, the more information it carries. For example, when a newsreader opens the television news with the phrase "Good evening", this highly probable phrase brings very little information. If, on the other hand, the first sentence is "France is afraid", its low probability tells the listener that something has happened, and the listener will therefore pay closer attention.
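Fisher's definition can be made concrete on the simplest model. The sketch below assumes a Bernoulli(p) law (one coin flip), which is not in the text; it computes the mean of the squared derivative of log f directly over both outcomes and checks it against the known closed form 1/(p(1 − p)). The function names are hypothetical. The second function shows the Cramér–Rao link: the variance bound is the inverse of the total information.

```python
def bernoulli_fisher_information(p: float) -> float:
    """Fisher information E[(d/dp log f(X; p))^2] for X ~ Bernoulli(p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # log f(x; p) = x*log(p) + (1 - x)*log(1 - p), so the score is
    # d/dp log f(x; p) = x/p - (1 - x)/(1 - p).
    score = {1: 1 / p, 0: -1 / (1 - p)}
    prob = {1: p, 0: 1 - p}
    # Mean of the squared score, weighted by the probability of each outcome.
    return sum(prob[x] * score[x] ** 2 for x in (0, 1))


def cramer_rao_bound(p: float, n: int) -> float:
    """Lower bound on the variance of an unbiased estimator of p from n draws."""
    return 1 / (n * bernoulli_fisher_information(p))


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.05):
        info = bernoulli_fisher_information(p)
        closed_form = 1 / (p * (1 - p))
        print(f"p={p}: I(p)={info:.3f}, 1/(p(1-p))={closed_form:.3f}")
    # The sample mean of 100 draws has variance p(1 - p)/100, equal to this bound.
    print(cramer_rao_bound(0.2, 100))
```

More information per observation means a smaller bound on the variance, which is the "low variability of the conclusions" described above.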

Other mathematical models have since complemented and formally extended the definition of information.

Claude Shannon and Warren Weaver reinforced this paradigm. They approached the question as telecommunications engineers: their concern was to measure information in order to derive the fundamentals of communication (rather than a theory of information). Building on Shannon's 1948 article, their book The Mathematical Theory of Communication (1949) models information in order to study the corresponding laws: noise, entropy and chaos, by general analogy with the laws of energetics and thermodynamics. Their work, which complements that of Alan Turing, Norbert Wiener and John von Neumann (to name only the main contributors), forms the initial foundation of signal theory and of the "information sciences".

For a source X comprising n symbols, where symbol i has probability p_i of appearing, the entropy H of source X is defined as:

H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i

The natural (Napierian) logarithm was used at first. For convenience it is replaced by the base-2 logarithm, which corresponds to the unit of information, the bit. Maximum-entropy considerations (MAXENT) later allowed Bayesian inference to define its prior distributions in a rational way.
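The entropy formula translates directly into code. This is a minimal sketch; the function name `entropy` and the example distributions are illustrative choices. The `base` parameter switches between bits (base 2) and nats (base e), and the last line illustrates the maximum-entropy idea: on a fixed set of symbols, the uniform distribution has the largest entropy.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy H = -sum p_i log_base(p_i); terms with p_i = 0 contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    uniform = [0.25] * 4
    skewed = [0.7, 0.1, 0.1, 0.1]
    bits = entropy(uniform)                # 2 bits
    nats = entropy(uniform, base=math.e)   # ln 4 nats
    # Converting bits to nats multiplies by ln 2.
    print(bits, nats, bits * math.log(2))
    # Among distributions on four symbols, the uniform one has the largest entropy.
    print(entropy(skewed) < entropy(uniform))
```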

Computer science is a technical offshoot that automates the processing of information, including its transmission and transport. The term "Information and Communication Technologies", in the broad sense, covers various aspects of data processing (processing systems, networks, etc.).

The information sciences extract meaning from data, drawing on questions of correlation, entropy and learning (see Data mining). Information technologies, for their part, deal with how to design, implement and deploy solutions that meet identified needs.

Adrian McDonough, in Information Economics, defines information as the meeting of a datum and a problem. Knowledge is potential information. The informational yield of a data-processing system is the ratio between the number of bits in its data pool and the number of bits of information extracted. Data are the cost side of the system; information is its value side. It follows that when a computing specialist measures the productivity of a system as the ratio between the quantity of data produced and the financial cost, he makes an error, because both terms of the equation neglect the quantity of information actually produced. This remark takes on its full meaning in light of Russell Ackoff's principle, which postulates that beyond a certain mass of data the quantity of information decreases, and in the extreme case becomes zero. This corresponds to the saying "too much information kills information". The situation worsens when the receiver of the system is a human processor, and worse still when it is the consciousness of a human agent. Indeed, information depends on the selection performed by attention, and on emotional, affective and structural factors absent from the computer. Information is then transformed into meaning, and then into motivation. Information that produces no meaning is null and void for the human receiver, even if it is acceptable for a robot. Information charged with meaning but not irrigated by psychological energy (drive, cathexis, libido, etc.) is dead. One thus notes that in the chain leading from data to action (data, information, knowledge, meaning, motivation), only the first two transformations are taken into account by classical information theory and semiology.
Kevin Bronstein notes that a machine defines information by only two values: the number of bits, and the structure and organization of the semes, whereas the psyche brings into play dynamic factors such as passion, motivation, desire, repulsion, etc., which give life to psychological information.

Examples of information

A piece of information designates, among a set of events, one or more possible events.

In theory, information reduces uncertainty. In decision theory, one even considers that only what is likely to affect our decisions deserves to be called information (few things in a newspaper qualify as information on this account…).

In practice, an excess of information, such as occurs with email systems, can lead to saturation and prevent decision-making.

First example

Consider a source that can produce integer voltages from 1 to 10 volts, and a receiver that measures this voltage. Before the source sends the electric current, the receiver has no idea which voltage will be delivered. Once the current has been emitted and received, however, the uncertainty about the emitted current decreases. Information theory considers that the receiver has an uncertainty of 10 states.
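The uncertainty of "10 states" can be expressed in bits. The sketch below assumes, for illustration, that the ten voltages are equally likely (the text does not say so); under that assumption one measurement removes log2(10) bits of uncertainty, and a receiver asking yes/no questions that each halve the range needs at most ceil(log2(10)) of them.

```python
import math

# Assumption for illustration: the ten integer voltages 1..10 V are equally likely.
voltages = list(range(1, 11))

uncertainty_bits = math.log2(len(voltages))     # information gained by one measurement
yes_no_questions = math.ceil(uncertainty_bits)  # questions needed if each one halves the range

print(f"{len(voltages)} states -> {uncertainty_bits:.4f} bits")
print(f"at most {yes_no_questions} yes/no questions")
```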

Second example

A library holds a large number of works: magazines, books and dictionaries. We are looking for a complete course on information theory. First, it is logical that we will not find it among works of art or literature; we have thus just obtained information that will reduce our search time. We had specified that we want a complete course, so we will find it neither in a magazine nor in a dictionary. We have obtained additional information (we are looking for a book), which will reduce our search time further.

Imperfect information

Consider a film director, two out of three of whose films I like. A critic I know well pans his latest film, and I know that on average I agree with this critic's assessments four times out of five. Will this critic dissuade me from going to see the film? This is the central question of Bayesian inference, which can also be quantified in bits.
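The question can be worked through with Bayes' rule. The sketch assumes a symmetric model not stated in the text: the critic's verdict matches mine with probability 4/5 whether or not I would like the film. The function name `posterior_like` is hypothetical. Exact fractions keep the arithmetic transparent, and the last lines show the same update in bits, where prior log-odds and the evidence simply add.

```python
from fractions import Fraction
from math import log2


def posterior_like(prior_like, p_agree):
    """P(I like the film | the critic pans it), by Bayes' rule.

    Assumed model: the critic's verdict matches mine with probability p_agree,
    whether or not I would like the film.
    """
    pan_if_like = 1 - p_agree      # the critic pans a film I would like: disagreement
    pan_if_dislike = p_agree       # the critic pans a film I would dislike: agreement
    joint_like = prior_like * pan_if_like
    joint_dislike = (1 - prior_like) * pan_if_dislike
    return joint_like / (joint_like + joint_dislike)


prior = Fraction(2, 3)   # I like two films out of three
agree = Fraction(4, 5)   # I agree with the critic four times out of five
post = posterior_like(prior, agree)

# The same update in bits: log-odds add.
prior_bits = log2(prior / (1 - prior))       # +1 bit in favor of liking the film
evidence_bits = log2((1 - agree) / agree)    # the pan is worth -2 bits
posterior_bits = log2(post / (1 - post))     # -1 bit

print(post, prior_bits, evidence_bits, posterior_bits)
```

Under this model the pan lowers my chance of liking the film from 2/3 to 1/3: the critic's 2 bits of evidence outweigh the 1 bit of prior confidence in the director.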

Information content and context

Fewer bits are needed to write "dog" than "mammal". Yet the statement "Médor is a dog" contains much more information than the statement "Médor is a mammal": the semantic information content of a message depends on the context. In fact, it is the pair message + context that is the true carrier of information, never the message alone (see Compressor paradox).

Measuring the quantity of information

Quantity of information: elementary case

Consider N boxes numbered from 1 to N. An individual A has hidden an object at random in one of these boxes. An individual B must find the number of the box where the object is hidden. To do so, B may ask A questions, which A must answer truthfully with YES or NO. But each question has a cost that B must pay (for example, one euro). An individual C knows which box hides the object and can sell this information to B. B will accept the deal only if C's price is less than or equal to the average cost B would otherwise spend finding the box by questioning A. The information held by C therefore has a certain price. This price represents the quantity of information carried by knowing the correct box: it is the average number of questions needed to identify this box. We denote it I.

EXAMPLE:

If N = 1, then I = 0. There is only one box, so no question is necessary.

If N = 2, then I = 1. One asks whether the correct box is box no. 1. The YES or NO answer then determines unambiguously which box is sought.

If N = 4, then I = 2. One asks whether the box bears number 1 or 2. The answer eliminates two of the boxes, and one last question suffices to find the correct box among the two remaining ones.

If N = 2^K, then I = K. Number the boxes from 0 to N − 1 and write these numbers in base 2: they have at most K binary digits, and for each digit position one asks whether the sought box has a 0 or a 1 there. After K questions, all the binary digits of the correct box are known. This amounts to asking K questions, each of which halves the number of boxes still under consideration (the method of dichotomy).

One is thus led to set I = \log_2(N), but this holds only in the case of N equiprobable events.
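The dichotomy method can be simulated to check that it costs exactly log2(N) questions when N is a power of two. This is a sketch with hypothetical function names; each loop iteration stands for one yes/no question to A, and the object is assumed equally likely to be in any box.

```python
from math import log2


def questions_to_find(hidden: int, n_boxes: int) -> int:
    """Number of yes/no questions to locate box `hidden` (1..n_boxes) by halving."""
    lo, hi = 1, n_boxes
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1          # "Is the box numbered between lo and mid?"
        if hidden <= mid:
            hi = mid
        else:
            lo = mid + 1
    return questions


def average_questions(n_boxes: int) -> float:
    """Average number of questions when every box is equally likely."""
    return sum(questions_to_find(b, n_boxes) for b in range(1, n_boxes + 1)) / n_boxes


if __name__ == "__main__":
    for k in range(5):
        n = 2 ** k
        print(n, average_questions(n), log2(n))
```

When N is not a power of two, whole questions cannot match log2(N) exactly: for N = 3 the average is 5/3 ≈ 1.667 questions, against log2(3) ≈ 1.585.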

Quantity of information relative to an event

Now suppose that the boxes are colored, and that n of them are red. Suppose also that C knows that the box hiding the object is red. What is the price of this information? Without it, the price to pay is log(N). With it, the price to pay is only log(n). The price of the information "the sought box is red" is therefore log(N) − log(n) = log(N/n).

The quantity of information is thus defined as an increasing function of \frac{N}{n}, with:

  • N the number of possible events
  • n the cardinality of the subset delimited by the information

To measure this quantity of information, one sets: I = \log_2\left(\frac{N}{n}\right)

I is expressed in bits (or logons, a unit introduced by Shannon for which "bit" has in practice become a synonym), or in nats if the natural logarithm is used instead of the base-2 logarithm.
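A short numerical check of I = log2(N/n), with illustrative values that do not come from the text: 8 boxes, 2 of them red. The helper name `information` is hypothetical. The sketch confirms that the price of the information equals log2(N) − log2(n) and shows the same quantity in nats.

```python
import math

# Illustrative values (not from the text): 8 boxes, 2 of them red.
N_BOXES = 8
N_RED = 2


def information(n_total: int, n_subset: int, base: float = 2.0) -> float:
    """Information log_base(N / n) of learning that the box lies in a subset of size n."""
    return math.log(n_total / n_subset, base)


bits = information(N_BOXES, N_RED)               # log2(8/2) = 2 bits
nats = information(N_BOXES, N_RED, base=math.e)  # ln 4, about 1.386 nats
saved = math.log2(N_BOXES) - math.log2(N_RED)    # price without the information minus price with it
print(bits, nats, saved)
```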

This definition is justified because one wants the following properties:

  1. information lies between 0 and ∞;
  2. an event of low probability carries much information (example: "It snows in January" contains much less information than "It snows in August", provided one is in the northern hemisphere);
  3. information must be additive.
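The three properties can be checked on the per-event quantity log2(1/p), where p is the event's probability (p = n/N in the box example). A minimal sketch; the function name `self_information` is hypothetical, and additivity is shown for independent events, whose joint probability is the product of their probabilities.

```python
from math import log2


def self_information(p: float) -> float:
    """Information, in bits, of an event of probability p: log2(1/p)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return log2(1 / p)


if __name__ == "__main__":
    # 1. Non-negative, zero for a certain event, unbounded as p approaches 0.
    print(self_information(1.0), self_information(1e-9))
    # 2. Less probable events carry more information.
    print(self_information(0.01) > self_information(0.5))
    # 3. Additive for independent events: P(A and B) = P(A) * P(B).
    print(self_information(0.5 * 0.25), self_information(0.5) + self_information(0.25))
```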

Remark: when one has several pieces of information, the total quantity of information is in general not the sum of the individual quantities; additivity holds only when the pieces are independent. See also: mutual information, the information shared by two messages, which accounts for this "subadditivity" of information.
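To see when quantities of information add and when they do not, here is a sketch on 8 equally likely boxes with a hypothetical coloring and numbering (none of these sets come from the text). Two independent pieces of information add exactly; two overlapping pieces give less than their sum, and the shortfall is the pointwise mutual information of the two statements.

```python
from math import log2

BOXES = set(range(1, 9))   # illustrative: 8 equally likely boxes


def info(subset: set) -> float:
    """Information, in bits, of learning that the hidden box lies in `subset`."""
    return log2(len(BOXES) / len(subset))


red = {1, 2, 3, 4}    # hypothetical coloring
even = {2, 4, 6, 8}   # independent of `red`: half of the red boxes are even
small = {1, 2, 3}     # overlaps heavily with `red`

# Independent pieces: the quantities add (1 + 1 = 2 bits).
print(info(red) + info(even), info(red & even))

# Overlapping pieces: the joint information is less than the sum.
overlap = info(red) + info(small) - info(red & small)   # bits shared by both statements
print(info(red) + info(small), info(red & small), overlap)
```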

Entropy: Shannon's formula

Now suppose that the boxes are of various colors: n_1 boxes of color C_1, n_2 boxes of color C_2, …, n_K boxes of color C_K, with n_1 + n_2 + … + n_K = N. Person C knows the color of the sought box. What is the price of this information?

The information "the box is of color C_1" is worth log(N/n_1), and this possibility has probability n_1/N. The information "the box is of color C_2" is worth log(N/n_2), and this possibility has probability n_2/N, and so on.

The average cost of the information is therefore (n_1/N) log(N/n_1) + (n_2/N) log(N/n_2) + … + (n_K/N) log(N/n_K). More generally, if one considers K disjoint events with respective probabilities p_1, p_2, …, p_K, where p_1 + p_2 + … + p_K = 1, then the quantity of information corresponding to this probability distribution is p_1 log(1/p_1) + … + p_K log(1/p_K). This quantity is called the entropy of the probability distribution.

Entropy thus measures the average quantity of information of a set of events (in particular of messages), and hence its uncertainty. It is denoted H:

H = -\sum_{i=1}^{K} p_i \log_2 p_i
where p_i = \frac{n_i}{N} is the probability associated with the occurrence of event i.
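The colored-box derivation can be checked numerically: the average price of learning the color equals Shannon's formula. The counts below (8 boxes in four colors) are illustrative values, not from the text.

```python
from math import log2

# Illustrative counts (not from the text): 8 boxes in four colors.
counts = {"C1": 4, "C2": 2, "C3": 1, "C4": 1}
N = sum(counts.values())

# Average price of "the box is of color C_i": sum of (n_i/N) * log2(N/n_i).
average_price = sum(n / N * log2(N / n) for n in counts.values())

# The same quantity written as Shannon's formula H = -sum p_i log2 p_i.
probs = [n / N for n in counts.values()]
H = -sum(p * log2(p) for p in probs)

print(average_price, H)   # both 1.75 bits, less than log2(8) = 3 bits
```

Knowing only the color is worth 1.75 bits on average, less than the 3 bits of knowing the box itself.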

See the detailed article: Shannon entropy.

Coding of information

Consider a sequence of symbols. Each symbol can take two values, s1 and s2, with probabilities p_1 = 0.8 and p_2 = 0.2 respectively. The quantity of information contained in one symbol is p_1 log(1/p_1) + p_2 log(1/p_2) ≈ 0.7219 bits. If each symbol is independent of the next, a message of N symbols contains on average a quantity of information equal to 0.72N bits. If s1 is coded 0 and s2 is coded 1, the coded message has length N, which is wasteful compared with the quantity of information it carries. Shannon's theorems state that no code can have an average length below 0.72N, but that the message can be coded so that its average coded length comes as close to 0.72N as desired as N increases.

For example, group the symbols in threes and code each group as follows:

The message s1s1s1s1s1s2s2s2s1 will be coded 010011110.

The average length of the code for a message of N symbols is: \frac{N}{3}\left(0.512 + 3 \times 0.128 \times 3 + 3 \times 0.032 \times 5 + 0.008 \times 5\right) = 0.728N
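The grouping scheme can be checked numerically. The codeword table is not reproduced in this text, so the table below is an assumption: one prefix code whose codeword lengths (1 for s1s1s1, 3 for the triples with one s2, 5 for the rest) match the average-length formula and which reproduces the example encoding. The symbols s1 and s2 are written as the characters "1" and "2", and the names `CODE`, `encode` and `expected_length_per_symbol` are hypothetical.

```python
from itertools import product
from math import log2

P = {"1": 0.8, "2": 0.2}   # s1 -> "1", s2 -> "2"

# Assumed prefix code for triples: lengths match the average-length formula,
# and it reproduces the example encoding 010011110.
CODE = {
    "111": "0",
    "112": "100", "121": "101", "211": "110",
    "122": "11100", "212": "11101", "221": "11110",
    "222": "11111",
}


def encode(message: str) -> str:
    """Encode a message of s1/s2 symbols, three at a time."""
    if len(message) % 3 != 0:
        raise ValueError("message length must be a multiple of 3")
    return "".join(CODE[message[i:i + 3]] for i in range(0, len(message), 3))


def entropy_per_symbol() -> float:
    """Information per source symbol: -sum p log2 p."""
    return -sum(p * log2(p) for p in P.values())


def expected_length_per_symbol() -> float:
    """Average number of code bits per source symbol under CODE."""
    total = 0.0
    for triple in ("".join(t) for t in product("12", repeat=3)):
        prob = P[triple[0]] * P[triple[1]] * P[triple[2]]
        total += prob * len(CODE[triple])
    return total / 3


def is_prefix_free(code: dict) -> bool:
    """True if no codeword is a prefix of another, so decoding is unambiguous."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


if __name__ == "__main__":
    print(encode("111112221"))            # s1s1s1 s1s1s2 s2s2s1
    print(round(entropy_per_symbol(), 4), round(expected_length_per_symbol(), 4))
    print(is_prefix_free(CODE))
```

With this table the expected length is 0.728 bits per symbol, slightly above the 0.7219-bit entropy; coding longer groups (for instance with Huffman codes) can bring the length closer still.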

See the detailed article: Coding theory.

See also
