Stacked bar chart in Python
Bar charts are among the most frequently used charts because they are easy to design and simple to read. Visualizing how data is distributed across groups gives a clear picture before any further analysis. A stacked bar chart is one variation of the bar chart. It is useful when there are multiple categories and each category has the same set of subcategories. A percent stacked bar chart, in which all bars have equal length, is discussed at the end of this tutorial.
Simple bar chart
Let’s take data from a survey of 200 people about their favourite food. We will write the data in a Python dictionary and plot a bar chart.
food = {"pizza": 50, "burger": 70, "pasta": 30, "noodles": 20, "curry": 30}
Next, import the necessary libraries and read the labels and values from the dictionary defined above. For example, “pizza”, “burger” and “noodles” are keys, and 50, 70 and 20 are their values. An index is used on the x-axis to define the position of the bars. Finally, plot the bar chart using the plt.bar function from matplotlib.pyplot. The height of each bar is given by numbers, and plt.xticks() writes the string label under each bar.
import numpy as np
import matplotlib.pyplot as plt

# Split the dictionary into bar labels and bar heights
labels = []
numbers = []
total = 0
for key, value in food.items():
    labels.append(key)
    numbers.append(value)
    total += value

# Positions of the bars on the x-axis
index = np.arange(len(labels))

plt.figure(figsize=(7,4))
plt.bar(index, numbers)
plt.xticks(index, labels, fontsize=10, rotation=0)
plt.ylabel('count', fontsize=10)
plt.title("total data: {}".format(total))
plt.show()
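The loop above builds the labels, values and total one item at a time. As a minimal sketch, assuming Python 3.7 or later (where dictionaries keep their insertion order), the same three variables can also be read straight from the dictionary:

```python
food = {"pizza": 50, "burger": 70, "pasta": 30, "noodles": 20, "curry": 30}

# Keys become the bar labels, values become the bar heights
labels = list(food.keys())
numbers = list(food.values())

# The total is the number of people surveyed
total = sum(numbers)

print(labels)
print(numbers)
print(total)
```

The resulting labels, numbers and total can be passed to plt.bar, plt.xticks and plt.title exactly as before.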
Horizontal bar chart with label
Now imagine the label for each bar is very long and difficult to fit on the x-axis. In this case, a horizontal bar chart is very useful. It also helps to show each category's count on the chart. The following code creates a horizontal bar chart with long labels and the count printed next to each bar.
import numpy as np
import matplotlib.pyplot as plt

food = {"Napoletana pizza": 50, "Chargrilled burger": 70, "Macaroni pasta": 30, "Capellini noodles": 20, "Macanese curry": 30}

# Split the dictionary into bar labels and bar lengths
labels = []
numbers = []
total = 0
for key, value in food.items():
    labels.append(key)
    numbers.append(value)
    total += value

# Positions of the bars on the y-axis
index = np.arange(len(labels))

plt.figure(figsize=(10,5))
plt.barh(index, numbers)

# Write the count just past the end of each bar
for i, v in enumerate(numbers):
    plt.text(v + 0.2, i, str(v), color='black', fontweight='bold')

plt.yticks(index, labels, fontsize=10, rotation=0)
plt.xlabel('No of examples', fontsize=10)
plt.title("total data: {}".format(total))
plt.show()
Stacked bar chart
As the name suggests, a stacked bar chart is plotted by stacking the groups on top of one another. This type of chart is useful when we have multiple values for each category.
Let’s consider an example where the quarterly sales of three products are given for four quarters. The three products are jeans, t-shirts and formal shirts. Here the categories are Q1, Q2, Q3 and Q4, and the groups are jeans, t-shirts and formal shirts.
Advantage: it is easy to read the total for each category.
Disadvantage: it is difficult to compare a group across categories, because every segment except the bottom one starts at a different height.
# stacked bar plot
import numpy as np
import matplotlib.pyplot as plt

# Get values from the groups and categories
quarter = ["Q1", "Q2", "Q3", "Q4"]
jeans = [100, 75, 50, 133]
tshirt = [44, 120, 150, 33]
formal_shirt = [70, 90, 111, 80]

# add colors
colors = ['#FF9999', '#00BFFF', '#C1FFC1', '#CAE1FF', '#FFDEAD']

# The position of the bars on the x-axis
r = range(len(quarter))
barWidth = 1

# plot bars; each group starts where the groups below it end
plt.figure(figsize=(10,7))
plt.bar(r, jeans, color=colors[0], edgecolor='white', width=barWidth, label="jeans")
plt.bar(r, tshirt, bottom=np.array(jeans), color=colors[1], edgecolor='white', width=barWidth, label='tshirt')
plt.bar(r, formal_shirt, bottom=np.array(jeans)+np.array(tshirt), color=colors[2], edgecolor='white', width=barWidth, label='formal shirt')
plt.legend()

# Custom X axis
plt.xticks(r, quarter, fontweight='bold')
plt.ylabel("sales")
plt.savefig("stacked1.png")
plt.show()
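In the code above, the bottom of each group is written out by hand as the sum of the groups below it. With more groups this gets repetitive, so here is a minimal sketch of how the same offsets could be computed with a cumulative sum. The groups dictionary and the names heights, tops and bottoms are illustrative and not part of the original code.

```python
import numpy as np

# Same sales figures as above, one entry per group (illustrative layout)
groups = {
    "jeans": [100, 75, 50, 133],
    "tshirt": [44, 120, 150, 33],
    "formal shirt": [70, 90, 111, 80],
}

# One row per group, one column per quarter
heights = np.array(list(groups.values()))

# Running total down the rows gives the top edge of every segment
tops = np.cumsum(heights, axis=0)

# Each segment starts where the previous one ended
bottoms = tops - heights

for name, b in zip(groups, bottoms):
    print(name, "starts at", b.tolist())
print("category totals:", tops[-1].tolist())
```

Row i of heights and bottoms can then be passed to plt.bar as the bar heights and the bottom argument, and the last row of tops is the total for each quarter.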
Now add the count of each group inside its segment to make the chart easier to read.
# stacked bar plot
import numpy as np
import matplotlib.pyplot as plt

# Get values from the groups and categories
quarter = ["Q1", "Q2", "Q3", "Q4"]
jeans = [100, 75, 50, 133]
tshirt = [44, 120, 150, 33]
formal_shirt = [70, 90, 111, 80]

# add colors
colors = ['#FF9999', '#00BFFF', '#C1FFC1', '#CAE1FF', '#FFDEAD']

# The position of the bars on the x-axis
r = range(len(quarter))
barWidth = 1

# plot bars; plt.bar returns a container holding one rectangle per quarter
plt.figure(figsize=(10,7))
bars1 = plt.bar(r, jeans, color=colors[0], edgecolor='white', width=barWidth, label="jeans")
bars2 = plt.bar(r, tshirt, bottom=np.array(jeans), color=colors[1], edgecolor='white', width=barWidth, label='tshirt')
bars3 = plt.bar(r, formal_shirt, bottom=np.array(jeans)+np.array(tshirt), color=colors[2], edgecolor='white', width=barWidth, label='formal shirt')
plt.legend()

# Custom X axis
plt.xticks(r, quarter, fontweight='bold')
plt.ylabel("sales")

# Write each count at the vertical centre of its segment
for r1, r2, r3 in zip(bars1, bars2, bars3):
    h1 = r1.get_height()
    h2 = r2.get_height()
    h3 = r3.get_height()
    plt.text(r1.get_x() + r1.get_width() / 2., h1 / 2., "%d" % h1, ha="center", va="center", color="white", fontsize=16, fontweight="bold")
    plt.text(r2.get_x() + r2.get_width() / 2., h1 + h2 / 2., "%d" % h2, ha="center", va="center", color="white", fontsize=16, fontweight="bold")
    plt.text(r3.get_x() + r3.get_width() / 2., h1 + h2 + h3 / 2., "%d" % h3, ha="center", va="center", color="white", fontsize=16, fontweight="bold")

plt.savefig("stacked2.png")
plt.show()
# You can replace "%d" % h1 with "{}".format(h1)
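The manual text placement above works with any Matplotlib version. As an alternative sketch, assuming Matplotlib 3.4 or later, the Axes.bar_label method can place the counts for us. The loop over a groups dictionary and the running bottom array are illustrative choices, not part of the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

quarter = ["Q1", "Q2", "Q3", "Q4"]
groups = {
    "jeans": [100, 75, 50, 133],
    "tshirt": [44, 120, 150, 33],
    "formal shirt": [70, 90, 111, 80],
}

fig, ax = plt.subplots(figsize=(10, 7))

# Running height of the stack in each quarter
bottom = np.zeros(len(quarter))

for name, values in groups.items():
    bars = ax.bar(quarter, values, bottom=bottom, edgecolor="white", label=name)
    # label_type="center" writes each value in the middle of its segment
    ax.bar_label(bars, label_type="center", color="white", fontweight="bold")
    bottom += np.array(values)

ax.set_ylabel("sales")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. Here the colors come from Matplotlib's default cycle instead of the custom list.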
Percent stacked bar chart
It is used to compare the relative share of each group between categories. Every bar is scaled to 100%, so all bars have the same height. In general, it works best with 2 or 3 groups per category.
Advantage: shows the relative difference.
Disadvantage: with many groups in each category the chart becomes visually noisy, so it is best avoided in that case.
# percent stacked bar plot
import numpy as np
import matplotlib.pyplot as plt

# Get values from the groups and categories
quarter = np.array(["Q1", "Q2", "Q3", "Q4"])
jeans = np.array([100, 75, 50, 133])
tshirt = np.array([44, 120, 150, 33])
formal_shirt = np.array([70, 90, 111, 80])

# Convert each group to its percentage of the quarter's total
total = jeans + tshirt + formal_shirt
proportion_jeans = np.true_divide(jeans, total) * 100
proportion_tshirts = np.true_divide(tshirt, total) * 100
proportion_formal = np.true_divide(formal_shirt, total) * 100

# add colors
colors = ['#FF9999', '#00BFFF', '#C1FFC1', '#CAE1FF', '#FFDEAD']

# The position of the bars on the x-axis
r = range(len(quarter))
barWidth = 1

# plot bars; formal shirt is at the bottom, then tshirt, then jeans on top
plt.figure(figsize=(10,7))
plt.bar(r, proportion_jeans, bottom=proportion_tshirts+proportion_formal, color=colors[0], edgecolor='white', width=barWidth, label="jeans")
plt.bar(r, proportion_tshirts, bottom=proportion_formal, color=colors[1], edgecolor='white', width=barWidth, label='tshirt')
plt.bar(r, proportion_formal, color=colors[2], edgecolor='white', width=barWidth, label='formal shirt')
plt.legend()
plt.xticks(r, quarter, fontweight='bold')
plt.ylabel("share of sales (%)")
plt.savefig("percentileStacked.png")
plt.show()
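To see why every bar in this chart has the same height, here is a small sketch that normalizes all groups at once and checks that each quarter adds up to 100. Stacking the groups into a 2D array named sales is an illustrative choice and not part of the original code.

```python
import numpy as np

# One row per group, one column per quarter (same figures as above)
sales = np.array([
    [100, 75, 50, 133],   # jeans
    [44, 120, 150, 33],   # tshirt
    [70, 90, 111, 80],    # formal shirt
])

# Divide every column by its own total and scale to percent
percent = sales / sales.sum(axis=0) * 100

print(np.round(percent, 1))

# Each column (quarter) now sums to 100, so all bars are equally tall
print(percent.sum(axis=0))
```

Each row of percent corresponds to one of the proportion arrays computed in the plotting code.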
Take home message
So we have seen different ways to visualize data using bar charts.
- Use a simple bar chart if the labels are short
- Use a horizontal bar chart when the labels are long
- Use a stacked bar chart to see the total of each category along with the contribution of each group
- Use a percent stacked bar chart to compare the relative share of groups between categories