4. How to calculate entropy and information gain?
A dataset contains data in table form (rows and columns). Each column is an attribute, except the last column, which is the class (classification variable). Entropy is a measure of randomness: here, how mixed the class values are within a set of rows. Information gain is the change (difference) in entropy after the dataset is split on an attribute.
To calculate entropy and information gain, first calculate the entropy of the last column (the classification variable). Start by checking how many categorical values it has; in my case there are 2 values, yes and no. As you can see in the picture above, the dataset and the formula are given. The total number of rows is 14, and the last column contains 9 yes and 5 no. Apply the formula to both categorical values (yes, no) and sum the results; that sum is the entropy of the weather dataset.
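The step above can be sketched in Java as follows. This is only a minimal illustration, not the downloadable source code: the class name EntropyExample and the method name entropy are made up for this example, and it assumes the formula in the picture is the usual entropy formula, the negative sum of p * log2(p) over the class values.

```java
// Minimal sketch: entropy of the class column from its value counts.
// EntropyExample and entropy are illustrative names, not from the original code.
class EntropyExample {

    // Entropy in bits: -sum(p * log2(p)) over all class values.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // a value that never occurs contributes nothing
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Weather dataset: 9 "yes" and 5 "no" out of 14 rows.
        System.out.println(entropy(9, 5)); // roughly 0.940
    }
}
```

With 9 yes and 5 no, the result is roughly 0.940, which is the entropy of the weather dataset.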
Now calculate the entropy of each attribute. As you can see in the picture above, it calculates the entropy of each value of the outlook attribute and then the average entropy information for outlook, where each value's entropy is weighted by its share of the rows. The information gain of outlook is equal to the difference between the entropy of the weather dataset and the average entropy information of outlook. So first find the entropy of the weather dataset and the average entropy information of the attribute; then you can find the information gain.
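Here is a small Java sketch of the outlook calculation described above. The names (InformationGainExample, averageEntropy, informationGain) are hypothetical, and the yes/no counts per outlook value (sunny 2/3, overcast 4/0, rainy 3/2) are an assumption: they are the counts of the standard 14-row weather dataset, which should match the picture.

```java
// Minimal sketch: information gain of one attribute from class counts.
// All names here are illustrative, not taken from the original source code.
class InformationGainExample {

    // Entropy in bits of a set of rows, given the count of each class value.
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // a value that never occurs contributes nothing
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Average entropy information: each subset's entropy weighted by its size.
    // Each row of subsets holds the {yes, no} counts for one attribute value.
    static double averageEntropy(int[][] subsets) {
        int total = 0;
        for (int[] s : subsets) {
            for (int c : s) {
                total += c;
            }
        }
        double avg = 0.0;
        for (int[] s : subsets) {
            int size = 0;
            for (int c : s) {
                size += c;
            }
            avg += ((double) size / total) * entropy(s);
        }
        return avg;
    }

    // Information gain = entropy of the dataset - average entropy of the attribute.
    static double informationGain(int[] classCounts, int[][] subsets) {
        return entropy(classCounts) - averageEntropy(subsets);
    }

    public static void main(String[] args) {
        int[] weather = {9, 5};                     // 9 yes, 5 no
        int[][] outlook = {{2, 3}, {4, 0}, {3, 2}}; // sunny, overcast, rainy (assumed counts)
        System.out.println(informationGain(weather, outlook)); // roughly 0.247
    }
}
```

Under these counts the average entropy information of outlook is about 0.694, so the gain is about 0.940 - 0.694 = 0.247. The overcast subset contains only yes, so its entropy is zero.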
After getting the information gain of every attribute, how do you create the decision tree? First you have to know:
1. How many nodes will be in the decision tree?
2. What will be the root of the decision tree?
3. How do you add attributes to the decision tree?
In this approach, the total number of attribute nodes in the decision tree is equal to the total number of attributes minus 1 (skip the attribute with the lowest information gain). The attribute with the highest information gain will be the root of the decision tree. The remaining attributes are added to the tree in descending order of information gain.
As you can see in the picture above, outlook has the highest information gain, 0.247, so it is set as the root of the tree. Outlook has 3 categorical values (sunny, overcast, rainy). The entropy of overcast is zero because it contains only yes, so I set that branch to yes. The second and third highest information gains are humidity with 0.152 and windy with 0.048. I skip the temperature attribute because it is the last attribute, with the lowest information gain.
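The ordering rule above can be sketched in Java like this. It is only an illustration of the ranking step, not a full tree builder. The names AttributeRankingExample and rankByGain are hypothetical, and the gain of 0.029 for temperature is an assumption (the usual value for the standard weather dataset), since the text above only says it is the lowest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: order attributes by information gain, pick the root,
// and skip the attribute with the lowest gain. Names are illustrative only.
class AttributeRankingExample {

    // Returns the attribute names sorted by information gain, highest first.
    static List<String> rankByGain(Map<String, Double> gains) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(gains.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries) {
            ranked.add(e.getKey());
        }
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Double> gains = new LinkedHashMap<>();
        gains.put("temperature", 0.029); // assumed value, not stated in the text
        gains.put("windy", 0.048);
        gains.put("outlook", 0.247);
        gains.put("humidity", 0.152);

        List<String> ranked = rankByGain(gains);
        String root = ranked.get(0);                     // highest gain
        String skipped = ranked.get(ranked.size() - 1);  // lowest gain
        List<String> used = ranked.subList(0, ranked.size() - 1);

        System.out.println("root = " + root);
        System.out.println("order = " + used);
        System.out.println("skipped = " + skipped);
    }
}
```

For the weather dataset this gives outlook as the root, then humidity and windy, with temperature skipped.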
You can download the source code in Java with 3 dataset examples.