Creating a quick Gantt chart for drilling schedules
A while ago I was writing reports on drill programs, and came across the need to represent the executed drill schedule in a way that was slightly better than a table of dates. I personally find staring at a table of values to be useless, and tend to at the very least conditionally colour cells, if not design a (sometimes elaborate) data visualisation. I have a soft-spot for Gantt charts, and figured the date-organised cascade of actions to be a great candidate for presenting drill scheduling data.
At the time, I was getting started with matplotlib
so the code below makes use of that, and of course pandas
- though if I had to do this again, I may choose to make use of a slightly higher-level library such as plotly
Import libraries
Starting with importing all the libraries we’re gonna need:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch, Rectangle
import matplotlib.dates as mdates
Define set up variables
Next, we’ll set the input data source, and the columns we’re interested in. I used a standard .csv collar output from our drillhole database (acQuire), so the column names used here are fairly typical in my experience. Of course, if you’re following along, these may vary.
# columns of interest
importcolumns = ['HOLEID', 'DEPTH', 'Planned_HoleID', 'STARTDATE', 'ENDDATE', 'Col_DrillType_D']
# path to the .csv file
file = 'path/to/collars.csv'
# conversion froom inches to cm
cm = 1/2.54
Now, initially I had set this up to grab all .csv files in a given directory and loop through them all, producing a Gantt for each, but for simplicity I’ll be using just one .csv file here.
I’ve also included a quick cm to inches conversion, as I live in a metric country, and matplotlib insists on inches…
Define colour dictionary
Next, I’m gonna define a dictionary to map the drill type (RC or DD in this case) to a colour, which will be useful later
cdict = {
'RC': 'cornflowerblue',
'DD': 'darkorange',
}
Read data and convert dates
And now for the main event, actually making a Gantt chart. Firstly we’re gonna need to read in the data and do some date conversions to set it all up.
We’ll read in the data using our previously defined columns.
df = pd.read_csv(file, usecols=importcolumns)
Next we’ll need to convert the dates to python datetime
objects so we can actually work with them.
I’ve made a small function that does this, which make it look a little neater.
# convert columns to datetime
def convertdate(df):
dates = ['STARTDATE', 'ENDDATE']
for date in dates:
df[date] = pd.to_datetime(df[date])
return df
Next, we’ll convert the dates and set up the Gantt variables.
# convert dates
df = convertdate(df)
# set gantt variables
project_start = df.STARTDATE.min() # project start date
df['start_num'] = (df.STARTDATE - project_start).dt.days # number of days from project start to task start
df['end_num'] = (df.ENDDATE - project_start).dt.days # number of days from project start to task end
df['duration'] = df['end_num'] - df['start_num'] + 1 # duration of task in days
Gantt variable additional details
Variable | Description |
---|---|
project_start | The earliest start date of the drill program, taken as the minimum date in the STARTDATE column |
start_num | The number of days from the program start to the task (drillhole) start |
end_num | The number of days from the program start to the task end |
duration | The duration of the task in days. +1 day is added as the date range given is inclusive. Eg. 01-Feb to 05-Feb is 5 days of drilling, but this will otherwise calculate the task length as 4 days. |
Change labels
The collar table I’m working with has a Col_DrillType_D
column, which has some lengthy default values. To make it slighlty easier to read, I’m going to change these to just ‘RC’ and ‘DD’.
# convert 'REVERSE CIRCULATION' to 'RC' and 'Diamond' to 'DD' in df['Col_DrillType_D']
df['Col_DrillType_D'] = df['Col_DrillType_D'].str.replace('REVERSE CIRCULATION', 'RC')
df['Col_DrillType_D'] = df['Col_DrillType_D'].str.replace('DIAMOND', 'DD')
Sort dataframe values
Now we’ll sort the values by STARTDATE
if they’re not already, so that everything shows up in order from top to bottom when we plot the chart.
# sort values
df.sort_values(by = ['STARTDATE'], ascending = False, inplace = True)
Define colour column
Next, we’ll define a new column in the dataframe that will be used to map the drill type to a colour.
First I’ll define a function return the colour corresponding to each drilltype in the dictionary cdict
we defined earlier.
uing df.apply
we can then run this function on each row of the dataframe, and create a new column color
with the result.
# function to set color for each row
def colorrow(row):
return cdict[row['Col_DrillType_D']]
# create colors column
df['color'] = df.apply(colorrow, axis = 1)
Plot the Gantt chart
Finally, we can plot the Gantt chart. Here I’ve set a specific figure size in cm so as to be used in a report, but you can adjust this to your needs.
The chart used here is a standard matplotlib
horizontal bar chart (hbar
), with the left
parameter set to the start_num
column, and the width
parameter set to the duration
column. The color
parameter is set to the color
column we just created.
# plot horizontal bar chart
fig, ax = plt.subplots(1, figsize = (16*cm, 8*cm))
gantt = ax.barh(
df['HOLEID'],
df['duration'],
left = df['start_num'],
color = df['color']
)
Then right at the end, we just need to show or save our Gantt chart, and we’re most of the way there!
# show
plt.show()
# or save
fig.savefig('drillgantt')
Further formatting
That gets you most of the way, but there’s tonnes more you can do with formatting the axes, adding labels, adding break periods, and so on. Those with keen eyes will notice I’ve imported a few as yet used libraries and classes, so I’ll have to leave those for another episode to finish off this thing.
Until next time 👋