What I Learned Yesterday #19 (pandas Time Series)

Recently I learned that the best way to digest and assimilate information is a two-step algorithm:

  • Share new information with 2 different people within 12-24 hours.
  • Apply new knowledge – practice it.

And that’s it. Nothing complicated. So I decided to use my blog for the first part.

I constantly learn new data science techniques, so here I want to share the most recent one.

(technical text begins here) In the last lesson I learned more about pandas time series and how to work with indexes that contain this type of data.

The first and very awesome characteristic of a time series index is partial datetime string selection:

# Select sales data for the 5th of February, 2015
sales.loc['2015-02-05']
sales.loc['February 5, 2015']
sales.loc['2015-Feb-5']
# Whole month
sales.loc['2015-2']
# Whole year
sales.loc['2015']
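To try this without a real dataset, here is a minimal sketch. The hourly `sales` frame and its `units` column are made up for illustration. I used hourly data on purpose, so that selecting a single day returns several rows.

```python
import pandas as pd

# Hypothetical hourly sales for all of 2015 (made-up numbers, illustration only).
index = pd.date_range("2015-01-01", periods=365 * 24, freq="h")
sales = pd.DataFrame({"units": range(len(index))}, index=index)

one_day = sales.loc["2015-02-05"]    # the 24 hourly rows of 5 February 2015
one_month = sales.loc["2015-02"]     # all of February 2015
one_year = sales.loc["2015"]         # all of 2015
# Slices work too, and both endpoints are included.
six_days = sales.loc["2015-02-05":"2015-02-10"]

print(len(one_day), len(one_month), len(one_year), len(six_days))
```

The less precise the string, the wider the selection: a day string picks a day, a month string picks the whole month.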

Be careful with data types. If your index consists of strings, the tricks above won’t work. To convert strings to datetimes we can use the pandas to_datetime() function. The format parameter tells pandas how the strings are laid out. Without it pandas tries to infer the layout, and ISO 8601 strings (‘yyyy-mm-dd hh:mm:ss’) are understood out of the box.

pd.to_datetime(['2015-2-16', '2015-2-20'], format='%Y-%m-%d')
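Here is a small sketch of the whole round trip: a string index, the conversion, and then a partial string selection that works afterwards. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index holds dates as plain strings.
sales = pd.DataFrame(
    {"units": [3, 5, 8]},
    index=["2015-02-16", "2015-02-20", "2015-03-01"],
)

# Convert the string labels to datetimes; format spells out the layout.
sales.index = pd.to_datetime(sales.index, format="%Y-%m-%d")

# With a DatetimeIndex in place, partial string selection works.
february = sales.loc["2015-02"]

print(type(sales.index).__name__)
print(february)
```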

Another cool feature of a time series index is resampling. There are two types of it: downsampling and upsampling. Downsampling lowers the frequency. Say we have 9 rows of data covering 9 hours, one row per hour. We can downsample it and get a summary for 3-hour groups. Example:

>>> index = pd.date_range('1/1/2000', periods=9, freq='H')
>>> series = pd.Series(range(9), index=index)
>>> series
2000-01-01 00:00:00    0
2000-01-01 01:00:00    1
2000-01-01 02:00:00    2
2000-01-01 03:00:00    3
2000-01-01 04:00:00    4
2000-01-01 05:00:00    5
2000-01-01 06:00:00    6
2000-01-01 07:00:00    7
2000-01-01 08:00:00    8
Freq: H, dtype: int64

>>> series.resample('3H').sum()
2000-01-01 00:00:00     3
2000-01-01 03:00:00    12
2000-01-01 06:00:00    21
Freq: 3H, dtype: int64

Upsampling is the operation in the opposite direction: it raises the frequency. For example, upsample the hourly series into 30-minute bins. The new rows have no data yet, so they come out as NaN.

>>> series.resample('30T').asfreq()[0:5]  # select first 5 rows
2000-01-01 00:00:00    0.0
2000-01-01 00:30:00    NaN
2000-01-01 01:00:00    1.0
2000-01-01 01:30:00    NaN
2000-01-01 02:00:00    2.0
Freq: 30T, dtype: float64

Upsample the series into 30-minute bins and fill the NaN values using the pad method, which carries the last known value forward. Newer pandas versions spell the same method ffill().

>>> series.resample('30T').pad()[0:5]
2000-01-01 00:00:00    0
2000-01-01 00:30:00    0
2000-01-01 01:00:00    1
2000-01-01 01:30:00    1
2000-01-01 02:00:00    2
Freq: 30T, dtype: int64

And a cheat sheet for frequencies
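As a rough companion, here is a sketch with a handful of frequency aliases I find handy. It is a small selection, not the full list. One assumption to flag: the spelling of the hour, minute and second aliases depends on the pandas version (older releases use 'H', 'T', 'S', newer ones prefer 'h', 'min', 's').

```python
import pandas as pd

start = "2000-01-01"  # a Saturday

# Each call builds three timestamps at the given frequency.
daily = pd.date_range(start, periods=3, freq="D")          # calendar day
business = pd.date_range(start, periods=3, freq="B")       # business day
weekly = pd.date_range(start, periods=3, freq="W")         # weekly, on Sundays
month_start = pd.date_range(start, periods=3, freq="MS")   # first day of month
quarter_hour = pd.date_range(start, periods=3, freq="15min")  # multiples work too

for name, rng in [("D", daily), ("B", business), ("W", weekly),
                  ("MS", month_start), ("15min", quarter_hour)]:
    print(name, list(rng.strftime("%Y-%m-%d %H:%M")))
```

The same aliases go into resample(), so '3H' or '30T' in the examples above are just a multiple plus an alias.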

Data Visualization

We have a variety of options to customize our plots in Python. We can change the color, marker and line type. Below is a little summary of available options.
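A minimal sketch of those three knobs, assuming matplotlib is the plotting library (the post does not name one). It reuses the 9-hour series from the resampling example, and the Agg backend is my choice so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no window needed
import matplotlib.pyplot as plt
import pandas as pd

index = pd.date_range("2000-01-01", periods=9, freq="h")
series = pd.Series(range(9), index=index)

fig, ax = plt.subplots()
# Color, marker and line type as separate keyword arguments.
ax.plot(series.index, series.values, color="green", marker="o", linestyle="--")
# The same three choices packed into one format string:
# r = red, s = square markers, : = dotted line.
ax.plot(series.index, series.values[::-1], "rs:")
ax.set_xlabel("time")
ax.set_ylabel("value")
```

The keyword form is easier to read, the format string is quicker to type. Both end up setting the same line properties.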

That’s all for today. I will learn something new and share it here soon. Have an awesome day!
