Information gain (decision tree)

In the context of decision trees in information theory and machine learning, information gain refers to the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. (In broader contexts, information gain can also be used as a synonym for either Kullback–Leibler divergence or mutual information, but the focus of this article is on the more narrow meaning below.) Explicitly, the information gain of a random variable X {\displaystyle X} obtained from an observation of a random variable A {\displaystyle A} taking value a {\displaystyle a} is defined as: I G ( X , a ) = D KL ( P X ∣ a ∥ P X ) {\displaystyle {\mathit {IG}}(X,a)=D_{\text{KL}}{\bigl (}P_{X\mid a}\parallel P_{X}{\bigr )}} In other words, it is the Kullback–Leibler divergence of P X ( x ) {\displaystyle P_{X}(x)} (the prior distribution for X {\displaystyle X} ) from P X ∣ a ( x ) {\displaystyle P_{X\mid a}(x)} (the posterior distribution for X {\displaystyle X} given A = a {\displaystyle A=a} ).

Source: Wikipedia — Information gain (decision tree) (CC BY-SA 4.0)

Information gain (decision tree)

In the context of decision trees in information theory and machine learning, information gain refers to the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. (In broader contexts, information gain can also be used as a synonym for either Kullback–Leibler divergence or mutual information, but the focus of this article is on the more narrow meaning below.) Explicitly, the information gain of a random variable X {\displaystyle X} obtained from an observation of a random variable A {\displaystyle A} taking value a {\displaystyle a} is defined as: I G ( X , a ) = D KL ( P X ∣ a ∥ P X ) {\displaystyle {\mathit {IG}}(X,a)=D_{\text{KL}}{\bigl (}P_{X\mid a}\parallel P_{X}{\bigr )}} In other words, it is the Kullback–Leibler divergence of P X ( x ) {\displaystyle P_{X}(x)} (the prior distribution for X {\displaystyle X} ) from P X ∣ a ( x ) {\displaystyle P_{X\mid a}(x)} (the posterior distribution for X {\displaystyle X} given A = a {\displaystyle A=a} ).

Source: Wikipedia "Information gain (decision tree)" · CC BY-SA 4.0

Share this article: X · Bluesky
Privacy Policy