01 Apr 2019 in python matplotlib tutorial

Adding Annotations to Bar and Scatter Plots + Basic Styling with matplotlib

Today we will go through the process of creating a scatter plot and a bar plot with matplotlib and pandas, and see what we can do along the way to make our plots more readable.

# import some useful Python data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# declare this to be a matplotlib notebook for better rendering
%matplotlib notebook
# Create a dataframe for us to play around with
df = pd.DataFrame(data={"Language": ["Python", "Java", "C++", "Haskell"], "Readability Score":[95, 85, 75, 45]})
df.index.name = "Hardness"
df
          Language  Readability Score
Hardness
0           Python                 95
1             Java                 85
2              C++                 75
3          Haskell                 45
# Let's create a scatter plot of the Readability Score against the Hardness index.
# pandas' plot.scatter takes column names for x and y, so it cannot use the index directly;
# we reset the index to move Hardness into a column we can plot
df.reset_index().plot.scatter(x="Hardness", y="Readability Score")
# Get the current Axes with plt.gca() so we can set our own labels and apply other styling
ax = plt.gca()
ax.set_xlabel("Learning Curve")
plt.show()
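
If you would rather not reset the index, one option is to hand the index straight to matplotlib's own `Axes.scatter`. The sketch below is not part of the original notebook; the helper name `scatter_from_index` is hypothetical, and it assumes the index holds numeric values.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_from_index(frame, y_column):
    """Scatter the values of y_column against the frame's index.

    Hypothetical helper: it calls matplotlib directly, so the index
    does not have to be moved into a column first.
    """
    fig, ax = plt.subplots()
    ax.scatter(frame.index, frame[y_column])
    # Reuse the index name (if any) and the column name as axis labels
    ax.set_xlabel(frame.index.name or "")
    ax.set_ylabel(y_column)
    return ax


df = pd.DataFrame(
    {"Language": ["Python", "Java", "C++", "Haskell"],
     "Readability Score": [95, 85, 75, 45]}
)
df.index.name = "Hardness"
ax = scatter_from_index(df, "Readability Score")
```

Call `plt.show()` afterwards to display the figure, as in the rest of this post.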

# We can make this plot better by sizing the circles and annotating each one with its language
# First we create random colors with the numpy package; the c argument takes one color per element
# as (R, G, B) values, which is why we use np.random.sample([len(df), 3]) to get one color per row of our df
colors = np.random.sample([len(df), 3])
df.reset_index().plot.scatter(x="Hardness", y="Readability Score", s=df["Readability Score"].values * 5, c=colors)
ax = plt.gca()
ax.set_xlabel("Hardness")
# We can loop over our index to produce annotations
# ax.annotate(text, (x_coord, y_coord)) takes the label text and the point to label as its main arguments
# This works because each row's x position is its index value
for i in df.index:
    ax.annotate(df["Language"][i], (i, df["Readability Score"][i]), ha="center", fontsize=9)
plt.show()
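
Two small variations on the annotated scatter plot may be worth trying. This is a sketch with plain lists standing in for the DataFrame, not the notebook's own code: a seeded generator makes the random colors repeatable, and `xytext` with `textcoords="offset points"` nudges each label above its marker instead of drawing it on top. The seed value and the 12-point offset are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

languages = ["Python", "Java", "C++", "Haskell"]
scores = [95, 85, 75, 45]

# A seeded generator gives the same "random" colors on every run
rng = np.random.default_rng(seed=0)
colors = rng.random((len(scores), 3))  # one (R, G, B) row per point

fig, ax = plt.subplots()
x_positions = range(len(scores))
ax.scatter(x_positions, scores, s=[score * 5 for score in scores], c=colors)

# xytext with textcoords="offset points" shifts each label by a fixed
# number of points relative to its data position, here 12 points upward
for x, (name, score) in enumerate(zip(languages, scores)):
    ax.annotate(name, (x, score), xytext=(0, 12),
                textcoords="offset points", ha="center", fontsize=9)
```

Because the offset is measured in points rather than data units, the labels should keep the same distance from their markers when the axes are rescaled.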

# Now let us look at a similar example with a bar graph
# If we do not provide an x value to df.plot.bar it will use the index by default
# So we can set our index to Language and have the plots come out great
df.set_index("Language").plot.bar(y="Readability Score")
ax = plt.gca()
# Rotate the x tick labels to horizontal because the default vertical labels do not look as nice
plt.xticks(rotation="horizontal")
plt.show()
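
The rotation can also be requested from pandas directly. As a small sketch (not the notebook's own code), the `rot` argument of the pandas plotting call sets the tick label rotation in degrees, and the call returns the Axes, so `plt.gca()` is not needed here:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"Language": ["Python", "Java", "C++", "Haskell"],
     "Readability Score": [95, 85, 75, 45]}
)

# rot=0 keeps the x tick labels horizontal; plot.bar returns the Axes
ax = df.set_index("Language").plot.bar(y="Readability Score", rot=0)
```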

# As above, we set the index to Language so the xtick labels show our languages
# legend=False hides the legend, since there is only one series
df.set_index("Language").plot.bar(y="Readability Score", legend=False)
plt.xticks(rotation="horizontal")
plt.title("Readability Score of Programming Languages out of 100")
ax = plt.gca()
# This is where the magic happens
# We can get the position of every bar by iterating through ax.patches
# We then use ax.annotate(text, (x_coord, y_coord)), with bar.get_height() as the text
# bar.get_x() is the bar's left edge, so adding half of bar.get_width() gives its center
# Subtracting 5 from the height places the text just inside the top of the bar
for bar in ax.patches:
    ax.annotate(f"{bar.get_height():.0f}", (bar.get_x() + bar.get_width() / 2, bar.get_height() - 5), ha="center", color="w", fontsize=11)
plt.show()
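
Newer matplotlib releases (3.4 and later) also ship `Axes.bar_label`, which handles this positioning for you. The sketch below is an alternative to the loop over `ax.patches`, not the approach used in this post; it uses plain lists and `ax.bar` instead of the DataFrame, and the padding of -15 points is an arbitrary value picked to pull the label inside the bar.

```python
import matplotlib.pyplot as plt

languages = ["Python", "Java", "C++", "Haskell"]
scores = [95, 85, 75, 45]

fig, ax = plt.subplots()
bars = ax.bar(languages, scores)  # ax.bar returns a BarContainer

# label_type="edge" anchors each label at the top of its bar; the
# negative padding (in points) moves it down into the bar
labels = ax.bar_label(bars, fmt="%d", label_type="edge",
                      padding=-15, color="w", fontsize=11)
ax.set_title("Readability Score of Programming Languages out of 100")
```

`bar_label` returns the text objects it created, so they can still be restyled afterwards if the defaults do not fit.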

# Now we highlight the first bar in blue and draw the remaining bars in grey
colors = ["blue"] + ["grey"] * (len(df) - 1)
df.set_index("Language").plot.bar(y="Readability Score", legend=False, color=colors)
plt.xticks(rotation="horizontal")
plt.title("Readability Score of Programming Languages out of 100")
ax = plt.gca()
# Annotate each bar with its height, as before
for p in ax.patches:
    ax.annotate(f"{p.get_height():.0f}", (p.get_x() + p.get_width() / 2, p.get_height() - 5), ha="center", color="w", fontsize=11)

# Remove the tick marks on all four sides and the y-axis tick labels for a cleaner plot; keep the x-axis labels
plt.tick_params(top=False, bottom=False, left=False, right=False, labelleft=False, labelbottom=True)

# Hide the spines (the border lines around the plot)
for spine in ax.spines.values():
    spine.set_visible(False)


plt.show()
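
The color list above highlights whichever bar comes first. If the rows were in a different order, you might prefer to highlight the tallest bar instead. Here is a minimal sketch of that idea; `highlight_colors` is a hypothetical helper, not a pandas or matplotlib function, and it assumes a non-empty list of values.

```python
def highlight_colors(values, highlight="blue", base="grey"):
    """Return one color per value, with the largest value highlighted.

    Hypothetical helper; if several values tie for the maximum,
    only the first one is highlighted.
    """
    values = list(values)
    top = values.index(max(values))
    return [highlight if i == top else base for i in range(len(values))]


colors = highlight_colors([95, 85, 75, 45])
```

The result can be passed as `color=colors` to the bar plot call, for example with `highlight_colors(df["Readability Score"])`.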

Thank You For Reading
Marcus Crowder

I have fun solving problems and breaking things
