Creating Stacked Waterfall Chart in Python using Matplotlib

What is Waterfall Chart?

Wikipedia says “a waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values”. It shows a running total as values are added or subtracted.

In this article, we will show how to create two types of waterfall chart - a simple one and a stacked one.

Creating Simple Waterfall Chart in Matplotlib

Consider the following table:

Particulars	Company A
Revenue	15203
Cost of Goods	-4220
Employee Expenses	-1002
Income from Interest	52
Loan Payment	-13
Net Income	10020

For simplicity, we only have 1 column in this table.

The waterfall chart for this table would be:

Code

First, let’s import the necessary dependencies:

import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

Then, read the CSV file:

df = pd.read_csv('./Waterfall-Chart-Data-Single.csv')

The CSV file used in this code can be found here.

Now, in order to render the waterfall chart we will be using matplotlib’s stacked bar chart. We will have an invisible base bar. The function get_data will be used to calculate the values for the stacked bar chart.

The data (used for rendering the stacked bar chart) for the above table would be:

{'_Base': [0, 10983, 9981, 9981, 10020, 0],
 'Company A Up': [15203, 0, 0, 52, 0, 10020],
 'Company A Down': [0, -4220, -1002, 0, -13, 0]}

def get_data(df):
    row_count = len(df.index)
    # Underscore is pre-pended to Base to ensure that it doesn't show up in the
    # lengend box
    data = {
        '_Base': [] * row_count,
    }
    # Since we have only 1 column, we are extracting the 2nd element
    column = df.columns[1]
    for change in ['Up', 'Down']:
        data[column + ' ' + change] = [] * row_count

    for key in data:
        data[key] = [0] * row_count
    
    for idx, value in enumerate(df[column]):
        if value >= 0:
            data[column + ' Up'][idx] = df.loc[idx, column]
        else:
            data[column + ' Down'][idx] = df.loc[idx, column]

    # calculate base
    prev = df.loc[0, column]
    # Base for last will always be 0
    for i in range(row_count - 1):
        delta = 0
        cur = df.loc[i, column]
        if prev <= 0 and cur < 0:
            delta = cur
        elif prev < 0:
            delta = 0
        elif prev > 0 and cur < 0:
            delta = cur + prev
        else:
            delta = prev
        # ignore the calculation for first as it will always be 0
        data['_Base'][i] = 0 if i == 0 else data['_Base'][i - 1] + delta
        prev = cur
    return data

Finally, we will render the stacked bar chart:

def render_chart(data, df):
    # get the first column
    particular_names = df.iloc[:,0]
    X_AXIS = [particular for particular in particular_names]

    index = pd.Index(X_AXIS, name='Particulars')
    colors = ['#ffffff00', '#00ff00', '#ff0000']
    df_for_plotting = pd.DataFrame(data, index=index).abs()
    ax = df_for_plotting.plot(
        kind='bar',
        stacked=True,
        figsize=(15, 6),
        color=colors
    )
    ax.set_ylabel('Amount')

    row_count = len(X_AXIS)
    # ignore the base texts
    for p in ax.patches[row_count:]:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy()
        if height:
            ax.text(
                x + width / 2, 
                y + height / 2, 
                int(height),
                horizontalalignment='center', 
                verticalalignment='center',
                color='black',
            )

    plt.xticks(rotation=30)
    plt.legend(title='Labels', bbox_to_anchor=(1.0, 1), loc='best')
    plt.tight_layout()
    plt.show()

Let’s call the functions to render the chart:

data = get_data(df)
render_chart(data, df)

If we observe the image, we can see that the labels are overlapping with the bars if the height of the bars are less. Let’s fix this by positioning the labels above the bars if their height is less than some threshold.

def render_chart(data, df):
    # get the first column
    particular_names = df.iloc[:,0]
    X_AXIS = [particular for particular in particular_names]

    index = pd.Index(X_AXIS, name='Particulars')
    colors = ['#ffffff00', '#00ff00', '#ff0000']
    df_for_plotting = pd.DataFrame(data, index=index).abs()
    ax = df_for_plotting.plot(
        kind='bar',
        stacked=True,
        figsize=(15, 6),
        color=colors
    )
    ax.set_ylabel('Amount')

    row_count = len(X_AXIS)
    y_min, y_max = ax.get_ylim()
    max_height = y_max - y_min
    height_limit = max_height * 0.1
    # ignore the base texts
    for p in ax.patches[row_count:]:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy()
        if height:
            height_offset = 0
            if height < height_limit:
                height_offset = max(max_height / 30, height * 5 / 6)
            ax.text(
                x + width / 2, 
                y + height / 2 + height_offset, 
                int(height),
                horizontalalignment='center', 
                verticalalignment='center',
                color='black',
            )

    plt.xticks(rotation=30)
    plt.legend(title='Labels', bbox_to_anchor=(1.0, 1), loc='best')
    plt.tight_layout()
    plt.show()

Full code for this illustration can be found here.

Note: If you need simple waterfall chart, you don’t need to follow this method since there are libraries which already do that (Plotly, waterfall_ax). This technique will be useful for the next chart type - stacked waterfall chart.

Creating Stacked Waterfall Chart in Matplotlib

Consider the following data:

Particulars	Company A	Company B
Revenue	15203	12023
Cost of Goods	-4220	-2500
Employee Expenses	-1002	-560
Income from Interest	52	58
Loan Payment	-13	-11
Net Income	10020	9010

The stacked waterfall chart looks like:

Code

Let’s start by importing the required libraries:

import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

Then, read the CSV file:

df = pd.read_csv('./Waterfall-Chart-Data.csv')

Data for this illustration can be found here.

Like the previous illustration, we will be using matplotlib’s stacked bar chart. For the above table data (used for rendering the stacked bar chart) would be:

{'_Base': [0, 20506, 18944, 18944, 19030, 0],
 'Company A Up': [15203, 0, 0, 52, 0, 10020],
 'Company B Up': [12023, 0, 0, 58, 0, 9010],
 'Company A Down': [0, -4220, -1002, 0, -13, 0],
 'Company B Down': [0, -2500, -560, 0, -11, 0]}

The function get_data returns this data dictionary.

def get_data(df):
    row_count = len(df.index)
    # Underscore is pre-pended to Base to ensure that it doesn't show up in the
    # lengend box
    data = {
        '_Base': [0] * row_count,
    }

    columns = df.columns[1: ]
    for change in ['Up', 'Down']:
        for column in columns:
            data[column + ' ' + change] = [0] * row_count
    
    for column in columns:
        for idx, value in enumerate(df[column]):
            if value >= 0:
                data[column + ' Up'][idx] = df.loc[idx, column]
            else:
                data[column + ' Down'][idx] = df.loc[idx, column]

    # calculate base
    prev = 0
    # Base for last will always be 0
    for i in range(row_count - 1):
        delta = 0
        cur = df.loc[i, columns].sum()
        if prev <= 0 and cur < 0:
            delta = cur
        elif prev < 0:
            delta = 0
        elif prev > 0 and cur < 0:
            delta = cur + prev
        else:
            delta = prev
        # ignore the calculation for first as it will always be 0
        data['_Base'][i] = 0 if i == 0 else data['_Base'][i - 1] + delta
        prev = cur
    return data

Finally, we will use the render_chart function to render the stacked bar chart.

def render_chart(data, df):
    # get the first column
    particular_names = df.iloc[:,0]
    X_AXIS = [particular for particular in particular_names]

    index = pd.Index(X_AXIS, name='Particulars')
    colors = ['#ffffff00', '#00ff00', '#93f693', '#ff0000', '#f79696']
    df_for_plotting = pd.DataFrame(data, index=index).abs()
    ax = df_for_plotting.plot(
        kind='bar',
        stacked=True,
        figsize=(15, 6),
        color=colors
    )
    ax.set_ylabel('Amount')

    row_count = len(X_AXIS)
    # ignore the base texts
    for p in ax.patches[row_count:]:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy()
        if height:
            ax.text(
                x + width / 2, 
                y + height / 2, 
                int(height),
                horizontalalignment='center', 
                verticalalignment='center',
                color='black',
            )

    plt.xticks(rotation=30)
    plt.legend(title='Labels', bbox_to_anchor=(1.0, 1), loc='best')
    plt.tight_layout()
    plt.show()

The rendered chart will look like this:

As we can observe, for smaller values the bar chart and the labels are overlapping (see Income from Interest and Loan Payment). Let’s fix this by positioning the labels above and below the bars for smaller values.

def render_chart(data, df):
    # get the first column
    particular_names = df.iloc[:,0]
    X_AXIS = [particular for particular in particular_names]

    index = pd.Index(X_AXIS, name='Particulars')
    colors = ['#ffffff00', '#00ff00', '#ff0000']
    df_for_plotting = pd.DataFrame(data, index=index).abs()
    ax = df_for_plotting.plot(
        kind='bar',
        stacked=True,
        figsize=(15, 6),
        color=colors
    )
    ax.set_ylabel('Amount')

    row_count = len(X_AXIS)
    y_min, y_max = ax.get_ylim()
    max_height = y_max - y_min
    height_limit = max_height * 0.1
    # ignore the base texts
    for p in ax.patches[row_count:]:
        width, height = p.get_width(), p.get_height()
        x, y = p.get_xy()
        if height:
            height_offset = 0
            if height < height_limit:
                height_offset = max(max_height / 30, height * 5 / 6)
            ax.text(
                x + width / 2, 
                y + height / 2 + height_offset, 
                int(height),
                horizontalalignment='center', 
                verticalalignment='center',
                color='black',
            )

    plt.xticks(rotation=30)
    plt.legend(title='Labels', bbox_to_anchor=(1.0, 1), loc='best')
    plt.tight_layout()
    plt.show()

Let’s call the functions to render the chart:

data = get_data(df)
render_chart(data, df)

Full code can for this illustration can be found here.

Usage in Power BI

Power BI doesn’t offer a built-in chart for stacked waterfall chart. This method can be used in Power BI for rendering stacked waterfall chart using its ability to render Python visuals.

Published 18 Jul 2021

CS Posts on Twitter