# Hierarchical Clustering

Clustering can be used to group customers or markets based on similarities. Once customers are segmented, an appropriate marketing strategy can be designed for each segment. In this blog, we will look at customer segmentation using a beer data set.

Agglomerative hierarchical clustering builds a hierarchy of clusters from the bottom up. It uses the following steps to develop clusters:

1. Start with each data point as its own cluster.
2. Find the two data points (or clusters) with the shortest distance between them, using an appropriate distance measure, and merge them to form a cluster.
3. Repeat step 2 until all data points are merged into a single cluster.
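The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: clusters are compared by the distance between their centroids, the points are made-up 2-D values rather than the beer data, and the names `centroid` and `agglomerate` are hypothetical helpers, not part of any library.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are indices into `points`.
    """
    # Step 1: every point starts as its own cluster.
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Step 2: find the pair of clusters whose centroids are closest.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = centroid([points[i] for i in clusters[a]])
            cb = centroid([points[i] for i in clusters[b]])
            d = math.dist(ca, cb)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        # Step 3: replace the two clusters with the merged one and repeat.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


# Toy 2-D points (made up for illustration, not the beer data).
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
history = agglomerate(toy)
for left, right, d in history:
    print(left, "+", right, f"at distance {d:.3f}")
```

The search over all pairs is repeated on every pass, which is fine for 20 records but slow for large data sets. Library implementations update the distances incrementally instead.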

## Beer data set

The beer data set contains 20 records, one per beer brand, with information about the calories, sodium content, alcohol content and cost. It is taken from Machine Learning Using Python by Manaranjan Pradhan and U Dinesh Kumar.

Beer dataset

| name | calories | sodium | alcohol | cost |
|---|---|---|---|---|
| Budweiser | 144 | 15 | 4.7 | 0.43 |
| Schlitz | 151 | 19 | 4.9 | 0.43 |
| Lowenbrau | 157 | 15 | 0.9 | 0.48 |
| Kronenbourg | 170 | 7 | 5.2 | 0.73 |
| Heineken | 152 | 11 | 5.0 | 0.77 |
| Old_Milwaukee | 145 | 23 | 4.6 | 0.28 |
| Augsberger | 175 | 24 | 5.5 | 0.40 |
| Srohs_Bohemian_Style | 149 | 27 | 4.7 | 0.42 |
| Miller_Lite | 99 | 10 | 4.3 | 0.43 |
| Budweiser_Light | 113 | 8 | 3.7 | 0.40 |
| Coors | 140 | 18 | 4.6 | 0.44 |
| Coors_Light | 102 | 15 | 4.1 | 0.46 |
| Michelob_Light | 135 | 11 | 4.2 | 0.50 |
| Becks | 150 | 19 | 4.7 | 0.76 |
| Kirin | 149 | 6 | 5.0 | 0.79 |
| Pabst_Extra_Light | 68 | 15 | 2.3 | 0.38 |
| Hamms | 139 | 19 | 4.4 | 0.43 |
| Heilemans_Old_Style | 144 | 24 | 4.9 | 0.43 |
| Olympia_Goled_Light | 72 | 6 | 2.9 | 0.46 |
| Schlitz_Light | 97 | 7 | 4.2 | 0.47 |

## Find distances between all points

As the features are on different scales, they should be normalized before any distances are computed. The distance metric should be selected based on the type of features. In this case, Euclidean distance is appropriate because all the variables are continuous. The distance between every pair of points after normalizing is shown in the matrix below.

``````
##                      Budweiser   Schlitz Lowenbrau Kronenbourg  Heineken
## Schlitz              0.6757423
## Lowenbrau            3.5360570 3.7478149
## Kronenbourg          2.5913185 2.8431126 4.5013847
## Heineken             2.4544248 2.6450186 4.3136034   0.9125425
## Old_Milwaukee        1.5998120 1.2477763 3.8868316   4.0677186 3.8672105
## Augsberger           1.8712535 1.2459169 4.5173402   3.4590922 3.3487149
## Srohs_Bohemian_Style 1.8321163 1.2330993 3.9706743   3.8087854 3.4400718
## Miller_Lite          1.7089224 2.2633337 3.7591742   3.2676911 3.0014940
## Budweiser_Light      1.7512699 2.3722713 3.1892420   3.2644236 3.1333976
## Coors                0.4883132 0.4856244 3.4879470   2.8437559 2.5716098
## Coors_Light          1.5068182 1.8897230 3.4596559   3.3190336 2.8912688
## Michelob_Light       0.9499789 1.5505681 3.1807422   2.2518883 2.0808508
## Becks                2.3660808 2.2857313 4.0446639   2.0037202 1.2501115
## Kirin                2.8547419 3.1765983 4.5521713   0.8422027 0.7785034
## Pabst_Extra_Light    3.3591412 3.7029356 3.2816965   5.0759627 4.6336682
## Hamms                0.6875341 0.6068278 3.3454136   3.0335154 2.7340487
## Heilemans_Old_Style  1.3798178 0.7941165 3.9612923   3.4313954 3.0804259
## Olympia_Goled_Light  3.2098342 3.7589113 3.6258537   4.2940397 3.9826335
## Schlitz_Light        2.0429773 2.6447024 3.8221315   3.1427848 2.9150574
##                      Old_Milwaukee Augsberger Srohs_Bohemian_Style Miller_Lite
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger               1.5411174
## Srohs_Bohemian_Style     1.1529724  1.2266573
## Miller_Lite              2.7124475  3.4760348            3.0884077
## Budweiser_Light          2.7716223  3.5832060            3.2575686   0.8081580
## Coors                    1.3507153  1.7109934            1.4092318   1.8415655
## Coors_Light              2.2910705  3.0835607            2.4725914   0.8146724
## Michelob_Light           2.4239167  2.7478844            2.5768934   1.2954515
## Becks                    3.3741567  2.8241049            2.6434210   3.1671867
## Kirin                    4.3840798  3.9594358            4.0965521   3.1121580
## Pabst_Extra_Light        3.5900677  4.7984133            3.9270245   2.2635759
## Hamms                    1.2307318  1.7480134            1.2913002   1.9034640
## Heilemans_Old_Style      1.0828051  1.1810662            0.5230776   2.6528073
## Olympia_Goled_Light      4.0581785  4.9931362            4.4113827   1.6920939
## Schlitz_Light            3.2059710  3.8688057            3.5374917   0.5448379
##                      Budweiser_Light     Coors Coors_Light Michelob_Light
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors                      1.9657761
## Coors_Light                1.2529869 1.4186610
## Michelob_Light             1.1930283 1.2104952   1.2812243
## Becks                      3.3626474 2.2406464   2.7340113      2.2706129
## Kirin                      3.1908882 3.0636458   3.1863491      2.3107286
## Pabst_Extra_Light          2.2392841 3.2405907   2.0743545      3.0000793
## Hamms                      1.9968978 0.2504784   1.4075077      1.3275409
## Heilemans_Old_Style        2.8666790 0.9640584   2.0921695      2.1535201
## Olympia_Goled_Light        1.6240658 3.2905014   2.0169536      2.5316147
## Schlitz_Light              0.8642703 2.2333413   1.2321060      1.4095448
##                          Becks     Kirin Pabst_Extra_Light     Hamms
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors
## Coors_Light
## Michelob_Light
## Becks
## Kirin                2.0054519
## Pabst_Extra_Light    4.4101287 4.8160486
## Hamms                2.3232863 3.2390072         3.1162778
## Heilemans_Old_Style  2.4165933 3.7003060         3.7415005 0.9031475
## Olympia_Goled_Light  4.1907285 3.9218100         1.5800965 3.2772677
## Schlitz_Light        3.2567746 2.8969220         2.4146863 2.3147609
##                      Heilemans_Old_Style Olympia_Goled_Light
## Schlitz
## Lowenbrau
## Kronenbourg
## Heineken
## Old_Milwaukee
## Augsberger
## Srohs_Bohemian_Style
## Miller_Lite
## Budweiser_Light
## Coors
## Coors_Light
## Michelob_Light
## Becks
## Kirin
## Pabst_Extra_Light
## Hamms
## Heilemans_Old_Style
## Olympia_Goled_Light            4.0688402
## Schlitz_Light                  3.0937449           1.4619234
``````
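The post does not say which normalization was used. The sketch below assumes z-score scaling (subtract the column mean, divide by the sample standard deviation with an n - 1 denominator) followed by plain Euclidean distance. The helper names `zscore_columns` and `distance` are hypothetical. Under that assumption, the two printed distances should come out close to the 0.6757 (Budweiser and Schlitz) and 0.2505 (Coors and Hamms) shown in the matrix above.

```python
import math
import statistics

# Rows copied from the beer table: (calories, sodium, alcohol, cost).
beers = {
    "Budweiser": (144, 15, 4.7, 0.43),
    "Schlitz": (151, 19, 4.9, 0.43),
    "Lowenbrau": (157, 15, 0.9, 0.48),
    "Kronenbourg": (170, 7, 5.2, 0.73),
    "Heineken": (152, 11, 5.0, 0.77),
    "Old_Milwaukee": (145, 23, 4.6, 0.28),
    "Augsberger": (175, 24, 5.5, 0.40),
    "Srohs_Bohemian_Style": (149, 27, 4.7, 0.42),
    "Miller_Lite": (99, 10, 4.3, 0.43),
    "Budweiser_Light": (113, 8, 3.7, 0.40),
    "Coors": (140, 18, 4.6, 0.44),
    "Coors_Light": (102, 15, 4.1, 0.46),
    "Michelob_Light": (135, 11, 4.2, 0.50),
    "Becks": (150, 19, 4.7, 0.76),
    "Kirin": (149, 6, 5.0, 0.79),
    "Pabst_Extra_Light": (68, 15, 2.3, 0.38),
    "Hamms": (139, 19, 4.4, 0.43),
    "Heilemans_Old_Style": (144, 24, 4.9, 0.43),
    "Olympia_Goled_Light": (72, 6, 2.9, 0.46),
    "Schlitz_Light": (97, 7, 4.2, 0.47),
}


def zscore_columns(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]  # n - 1 denominator (assumed)
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]


scaled = dict(zip(beers, zscore_columns(list(beers.values()))))


def distance(a, b):
    """Euclidean distance between two beers in the normalized feature space."""
    return math.dist(scaled[a], scaled[b])


print(round(distance("Budweiser", "Schlitz"), 4))
print(round(distance("Coors", "Hamms"), 4))
```

Without the scaling step, calories (values around 70 to 175) would dominate the distance and cost (values below 1) would barely count.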

The minimum distance, 0.2505, is between records 11 and 17, which are Coors and Hamms. These two beers are combined into one cluster, and the centroid of that cluster is treated as a single point in the next step. The next two closest points or clusters are then combined to form a bigger cluster, and this continues until all the points are merged into one big cluster.
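As a small worked example of this first merge, the sketch below takes a four-beer excerpt of the distance matrix, picks the closest pair, and then computes how far the new centroid is from the remaining beers. It relies on the median-length identity for Euclidean distances, so no coordinates are needed. The function name `centroid_distance` is hypothetical, and the excerpt is only meant to illustrate the update, not to reproduce the full clustering.

```python
import math

# Excerpt of the distance matrix above (four beers only, to keep it short).
dist = {
    frozenset(("Budweiser", "Schlitz")): 0.6757423,
    frozenset(("Budweiser", "Coors")): 0.4883132,
    frozenset(("Budweiser", "Hamms")): 0.6875341,
    frozenset(("Schlitz", "Coors")): 0.4856244,
    frozenset(("Schlitz", "Hamms")): 0.6068278,
    frozenset(("Coors", "Hamms")): 0.2504784,
}

# The closest pair is the entry with the smallest distance.
closest = min(dist, key=dist.get)
a, b = sorted(closest)
print("merge:", a, "+", b, "at", dist[closest])


def centroid_distance(other):
    """Distance from `other` to the centroid c of the merged pair (a, b).

    Median-length identity for Euclidean distances:
    d(c, x)^2 = (d(a, x)^2 + d(b, x)^2) / 2 - d(a, b)^2 / 4
    """
    dax = dist[frozenset((a, other))]
    dbx = dist[frozenset((b, other))]
    return math.sqrt((dax ** 2 + dbx ** 2) / 2 - dist[closest] ** 2 / 4)


for other in ("Budweiser", "Schlitz"):
    print(other, round(centroid_distance(other), 3))
```

The Coors-Hamms centroid ends up about 0.535 from Schlitz. That is slightly more than the 0.523 between Srohs_Bohemian_Style and Heilemans_Old_Style in the matrix, so under centroid-based merging the latter pair would be joined before Schlitz is added.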

## Dendrogram

A dendrogram is a pictorial representation of how the cases are merged as the Euclidean distance is increased. The distance is rescaled to a scale between 0 and 4. By drawing a vertical line at different values of the rescaled distance, one can identify the clusters. The dendrogram for the beer data set is shown below.

From the above plot, we can observe the following merge order:

1. Coors and Hamms were the closest and thus were clustered first.
2. Then Srohs_Bohemian_Style and Heilemans_Old_Style were merged into one cluster.
3. Subsequently, the centroid of the Coors-Hamms cluster was close to Schlitz, so the three beers were clustered together.

This continues until all the beers are finally merged into one cluster.

Using the above dendrogram, I want to segment customers for an effective marketing strategy. How many clusters are ideal?

If I take a cut-off distance of 2.5 in the dendrogram, we have 4 clusters, but if I take a smaller cut-off of 1.5, the number of clusters increases to 12. Twelve clusters for 20 beers would be too fine-grained for a marketing segmentation, so 4 (or 5) clusters seem to be an appropriate number.
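The relationship between the cut-off and the number of clusters can be illustrated with a short sketch. The merge heights below are hypothetical values for six points, not read from the beer dendrogram, and `count_clusters` is a made-up helper. It assumes the merge heights never decrease from one merge to the next.

```python
def count_clusters(merge_heights, n_points, cutoff):
    """Number of clusters left when the dendrogram is cut at `cutoff`.

    Every merge at or below the cut-off joins two clusters into one,
    so each such merge reduces the cluster count by one.
    """
    merges_below = sum(1 for h in merge_heights if h <= cutoff)
    return n_points - merges_below


# Hypothetical merge heights for 6 points (5 merges in total).
heights = [0.3, 0.5, 0.9, 1.8, 3.2]
for cutoff in (0.4, 1.0, 2.5):
    print(cutoff, count_clusters(heights, 6, cutoff))
```

A lower cut-off leaves more merges undone and therefore more clusters, which is the same effect seen when moving from 2.5 to 1.5 on the beer dendrogram.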

Let us look at the clusters.

### Cluster 1

Cluster 1 contains Becks, Kronenbourg, Heineken and Kirin. These brands are imported into the US. They have high alcohol content, low sodium content and high cost. The target customers are brand sensitive, and the brands are promoted as premium brands.

### Cluster 2

Cluster 2 contains beers such as Budweiser, Schlitz, Coors, Hamms and Augsberger. They have medium alcohol content and medium cost. This cluster represents the largest customer segment.

### Cluster 3

Cluster 3 contains light beers such as Coors_Light, Budweiser_Light and Miller_Lite. These beers are low in calories, sodium and alcohol content. The target customers are those who want to drink but are also health conscious.
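One way to check these descriptions against the data is to average each feature over the beers named in each cluster. The sketch below is limited to the brands explicitly listed above, with values copied from the beer table. Clusters 2 and 3 have further members that the post does not enumerate, so their averages are only indicative. The `profile` helper is hypothetical.

```python
from statistics import mean

features = ("calories", "sodium", "alcohol", "cost")

# Values copied from the beer table, only for the beers named in the text.
clusters = {
    "Cluster 1": {
        "Becks": (150, 19, 4.7, 0.76),
        "Kronenbourg": (170, 7, 5.2, 0.73),
        "Heineken": (152, 11, 5.0, 0.77),
        "Kirin": (149, 6, 5.0, 0.79),
    },
    "Cluster 2 (named members)": {
        "Budweiser": (144, 15, 4.7, 0.43),
        "Schlitz": (151, 19, 4.9, 0.43),
        "Coors": (140, 18, 4.6, 0.44),
        "Hamms": (139, 19, 4.4, 0.43),
        "Augsberger": (175, 24, 5.5, 0.40),
    },
    "Cluster 3 (named members)": {
        "Coors_Light": (102, 15, 4.1, 0.46),
        "Budweiser_Light": (113, 8, 3.7, 0.40),
        "Miller_Lite": (99, 10, 4.3, 0.43),
    },
}


def profile(members):
    """Mean of each feature over the beers in one cluster."""
    return {f: mean(col) for f, col in zip(features, zip(*members.values()))}


profiles = {name: profile(members) for name, members in clusters.items()}
for name, p in profiles.items():
    print(name, {f: round(v, 2) for f, v in p.items()})
```

For the named members, the Cluster 1 average cost is clearly higher than that of the other two groups, and the light beers average the fewest calories, which matches the descriptions above.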

## References

1. Business Analytics: The Science of Data-Driven Decision Making - U Dinesh Kumar (textbook)
2. Machine Learning Using Python - Manaranjan Pradhan and U Dinesh Kumar (textbook)
3. Exploratory Data Analysis with R - Roger D. Peng (online)
4. UC Business Analytics R Programming Guide - University of Cincinnati (online)