# What is Entropy and why does Information gain matter in Decision Trees?

## What is Information gain and why does it matter in Decision Trees?

Definition: Information gain (IG) measures how much "information" a feature gives us about the class.

## Why does it matter?

• Information gain is the main criterion that Decision Tree algorithms use to construct a Decision Tree.
• At each split, a Decision Tree algorithm tries to maximize Information gain.
• The attribute with the highest Information gain is tested/split first.

The equation of Information gain:

Information gain = entropy(parent) – [weighted avg.]entropy(children)

## To understand Entropy and Information gain, let's draw a simple table with some features and labels.

Here in this table,

• Grade, Bumpiness and Speed Limit are the features and Speed is the label.
• There are four observations in total.

First, let's work with the Grade feature.

In the Grade column there are four values, and corresponding to those values there are four labels.

Let's consider all the labels as a parent node.

SSFF => parent node

So, what is the entropy of this parent node?

Let's find out.

First, we need to find the fraction of each class present in the parent node. There are 2 types of examples (slow and fast) in the parent node, and the parent node contains 4 examples in total.

1. P(slow) => fraction of slow examples in parent node
2. P(fast) => fraction of fast examples in parent node

Let's find P(slow):

p(slow) = no. of slow examples in parent node / total number of examples = 2/4 = 0.5

Similarly, the fraction of fast examples P(fast) will be:

p(fast) = no. of fast examples in parent node / total number of examples = 2/4 = 0.5

So, the entropy of parent node:

Entropy(parent) = – {0.5 log2(0.5) + 0.5 log2(0.5)} = – {-0.5 + (-0.5)} = 1

So the entropy of parent node is 1.
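The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `entropy` and the list encoding of the SSFF parent node are illustrative choices, not something the article prescribes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Base-2 entropy of a list of class labels: -sum(p * log2(p))."""
    total = len(labels)
    counts = Counter(labels)  # number of examples per class
    return -sum((n / total) * log2(n / total) for n in counts.values())


# The SSFF parent node: two slow and two fast examples.
parent = ["slow", "slow", "fast", "fast"]
parent_entropy = entropy(parent)
print(parent_entropy)  # 1.0
```

A node that contains only one class gives an entropy of 0, which is the other extreme of the scale.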

Now, let's explore how a Decision Tree algorithm constructs a Decision Tree based on Information gain.

First, let's check whether the parent node should be split by Grade or not.

If the Information gain from the Grade feature is greater than that of all other features, then the parent node is split by Grade.

To find the Information gain of the Grade feature, we need to virtually split the parent node by Grade.

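Grade puts three examples in one child (two of one class and one of the other) and the single remaining example in the other child. The single-example child is pure, so its entropy is 0. The three-example child has entropy – {2/3 log2(2/3) + 1/3 log2(1/3)} ≈ 0.918.

[weighted avg.]entropy(children) = 3/4 * 0.918 + 1/4 * 0 ≈ 0.689

So,

Information gain(Grade) = 1 – 0.689 ≈ 0.311

Information gain from the Grade feature is about 0.311.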

A Decision Tree algorithm chooses the feature with the highest Information gain to split/construct the tree, so we need to compute the gain for every feature before splitting.
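The "check every feature, keep the best one" step can be sketched as below. The helper names (`information_gain`, `best_feature`) and the tiny dataset are hypothetical and chosen only for illustration; the rows are not the article's table, and the feature names `a` and `b` are placeholders.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """entropy(parent) minus the weighted average entropy of the children."""
    groups = {}
    for row, label in zip(rows, labels):
        # One child node per distinct value of the feature.
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_feature(rows, labels):
    """Return the feature with the highest information gain, plus all gains."""
    gains = {f: information_gain(rows, labels, f) for f in rows[0]}
    return max(gains, key=gains.get), gains


# Hypothetical toy data (NOT the article's table): "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["slow", "slow", "fast", "fast"]

chosen, gains = best_feature(rows, labels)
print(chosen, gains)
```

On this toy data the split on `a` yields pure children, so it is the feature selected first.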

Information gain from Bumpiness

The entropies of the left and right child nodes are the same because they contain the same mix of classes.

entropy(bumpy) and entropy(smooth) both equal 1.

So, entropy (children) with weighted avg. for Bumpiness:

[weighted avg.]entropy(children) = 2/4 * 1 + 2/4 * 1 = 1

Hence,

Information gain(Bumpiness) = 1 – 1 = 0
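The Bumpiness result can be reproduced with a short sketch. It assumes, as stated above, that each child holds 2 of the 4 examples with one slow and one fast label; the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """entropy(parent) minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["slow", "slow", "fast", "fast"]  # SSFF
bumpy = ["slow", "fast"]    # one slow, one fast -> entropy 1
smooth = ["slow", "fast"]   # one slow, one fast -> entropy 1

bumpiness_gain = information_gain(parent, [bumpy, smooth])
print(bumpiness_gain)  # 0.0
```

A gain of 0 means that splitting on Bumpiness leaves the children exactly as mixed as the parent.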

So far we have the Information gain for two features, Grade and Bumpiness.

Information gain from SpeedLimit
