2019: The high and low temperatures

January 2020

Last winter I hooked up a DS18B20 thermometer to a Raspberry Pi. I set up a Python library to read the temperature and a cron job to save a reading to my Postgres database every minute. Then I built a web app to display the last 24 hours of data. The first stable readings were stored on Jan 5, 2019, so we have arrived at one full year of readings from my thermometer project 🎉!

The data are probably uninteresting to anyone who does not live in my apartment. Actually, they're not even interesting to all the people who do live here 🤔.

But I think the data can tell a story, or at least some of them can. This post tells the stories of unusual days in my apartment from the perspective of temperature measurements.

Describing a day's temperatures

Early on, I wrote a SQL view that computes summary statistics on each day's temperature readings. It gives me one row per day, like:

dt          readings   min   max  range   avg  stddev  avg_minute_delta  day_delta
2019-01-05      1440  67.9  74.9      7  71.5     1.7             0.002       -3.4
2019-01-06      1440  71.2  74.8    3.6    73     0.7            -0.001        1.2
2019-01-07      1440  68.9  73.5    4.6  70.7       1            -0.003        3.7
2019-01-08      1440  67.8  72.5    4.7  70.3     1.1             0.002       -2.4
2019-01-09      1440  69.5    73    3.5  71.6     0.8                 0        0.3

Most of those columns have sensible names. avg_minute_delta describes the average difference in temperature per minute across adjacent readings. day_delta is the difference between the first and last reading in the day.
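The view itself is SQL, but the same per-day summary can be sketched in pandas. This is a minimal sketch and not the view from this post: the column names `ts` and `temp`, the helper `summarize_days`, and the sign conventions (current minus previous reading for avg_minute_delta, last minus first reading for day_delta) are all assumptions made for illustration.

```python
import pandas as pd


def summarize_days(readings: pd.DataFrame) -> pd.DataFrame:
    """Per-day summary stats for a frame with 'ts' (timestamp) and 'temp' columns.

    Hypothetical helper: column names and sign conventions are assumptions.
    """
    readings = readings.sort_values("ts")
    by_day = readings.groupby(readings["ts"].dt.date)["temp"]

    stats = (
        by_day.agg(["count", "min", "max", "mean", "std"])
        .rename(columns={"count": "readings", "mean": "avg", "std": "stddev"})
        .rename_axis("dt")
    )
    stats["range"] = stats["max"] - stats["min"]
    # Mean of (current - previous) over adjacent readings within the day
    stats["avg_minute_delta"] = by_day.agg(lambda s: s.diff().mean())
    # Last reading of the day minus the first one
    stats["day_delta"] = by_day.last() - by_day.first()

    columns = ["readings", "min", "max", "range", "avg",
               "stddev", "avg_minute_delta", "day_delta"]
    return stats[columns]


# Tiny made-up example: two days with three readings each
readings = pd.DataFrame({
    "ts": pd.to_datetime([
        "2019-01-05 00:00", "2019-01-05 00:01", "2019-01-05 00:02",
        "2019-01-06 00:00", "2019-01-06 00:01", "2019-01-06 00:02",
    ]),
    "temp": [70.0, 71.0, 73.0, 68.0, 66.0, 65.0],
})
daily = summarize_days(readings)
print(daily)
```

With a full day of minute-level data, the readings column would come out at 1440, as in the table above.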

I looked into these summary stats for days which stand out, days like:

🔥 the warmest day

The day with the highest average temperature was July 5. We were in Baltimore to celebrate July 4 with my family, so we closed up the windows and shut off the AC and the apartment absolutely roasted.

❄ the coldest day

Per NYC regulation, landlords are required to maintain a temperature of 68℉ during the day in the winter, but on Feb 1 the temperature hung below 66℉ all day long.

The most variable day

On December 16 we were burgled (everyone was ok!). The burglar entered through the fire escape window, which is near where the Raspberry Pi thermometer is tucked away. That huge dip in the temperature is exactly the time the burglar entered the apartment. The detective that caught our case thought this was awesome!
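Finding days like these is a one-line lookup per category once the daily summary is in a DataFrame. Here is a minimal sketch that uses only the five sample rows shown earlier in this post, so the answers are not the year's real extremes. The warmest day follows the definition above (highest avg); picking the coldest day by lowest avg and the most variable day by highest stddev are assumptions made for this sketch.

```python
import pandas as pd

# avg and stddev for the five sample rows from the summary table (Jan 5-9, 2019)
daily_stats = pd.DataFrame(
    {
        "avg": [71.5, 73.0, 70.7, 70.3, 71.6],
        "stddev": [1.7, 0.7, 1.0, 1.1, 0.8],
    },
    index=pd.to_datetime(
        ["2019-01-05", "2019-01-06", "2019-01-07", "2019-01-08", "2019-01-09"]
    ),
)
daily_stats.index.name = "dt"

warmest = daily_stats["avg"].idxmax()           # highest average temperature
coldest = daily_stats["avg"].idxmin()           # assumption: lowest average temperature
most_variable = daily_stats["stddev"].idxmax()  # assumption: highest standard deviation

print(warmest.date(), coldest.date(), most_variable.date())
# 2019-01-06 2019-01-08 2019-01-05
```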

Anomaly Detection

We can also use statistics to identify days with unusual properties. I queried my database with Python like:

import os
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(os.environ['SQLALCHEMY_DATABASE_URI'])
daily_stats = (
  pd.read_sql_table("daily_stats", engine)
  .assign(dt=lambda x: pd.to_datetime(x.dt))
  .set_index('dt')
)

print(daily_stats.round(2).head())
#             readings    min    max  range    avg  stddev  avg_minute_delta  day_delta
# dt                                                                                   
# 2019-01-05      1440  67.89  74.86   6.98  71.54    1.65               0.0      -3.38
# 2019-01-06      1440  71.15  74.75   3.60  72.96    0.75              -0.0       1.24
# 2019-01-07      1440  68.90  73.51   4.61  70.65    0.95              -0.0       3.71
# 2019-01-08      1440  67.78  72.50   4.72  70.28    1.12               0.0      -2.36
# 2019-01-09      1440  69.46  72.95   3.49  71.62    0.79              -0.0       0.34

If we assume all days are drawn from a multivariate normal distribution across the summary stat metrics, \(x \sim \mathcal{N}\left(\mu, \Sigma\right) \), then the outliers are the dates with a very low density \( p\left(x \vert \mathcal{N}\left(\mu, \Sigma\right)\right) \).
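For reference, this is the standard multivariate normal density, written here for a day described by \(k\) summary stats. It assumes \(\Sigma\) is invertible:

\[ p\left(x \vert \mathcal{N}\left(\mu, \Sigma\right)\right) = \frac{1}{\sqrt{(2\pi)^{k} \lvert \Sigma \rvert}} \exp\left(-\frac{1}{2} \left(x - \mu\right)^{\top} \Sigma^{-1} \left(x - \mu\right)\right) \]

The exponent is the squared Mahalanobis distance from \(\mu\), so a low density means a day that sits far from the typical day once the correlations between stats are accounted for.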

I normalized the stats and calculated the empirical parameters for the multivariate Normal distribution. Then I used scipy.stats to do the heavy lifting.

import numpy as np
import scipy.stats

zscores = (
    daily_stats
    .drop(columns=['readings'])  # I don't care about reading count
    .apply(lambda col: (col - col.mean()) / col.std())
)

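# Empirical mean and covariance of the z-scored stats (one row per day)
mu = zscores.values.mean(axis=0)
sigma = np.cov(zscores.values, rowvar=False)
# Density of each day under the fitted multivariate normal
density = scipy.stats.multivariate_normal.pdf(zscores.values, mean=mu, cov=sigma)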

Then I picked out the days with the lowest densities! Here are those dates:

dt          readings     min     max   range     avg  stddev  avg_minute_delta  day_delta
2019-12-16      1440  50.787   71.15  20.363  65.003   6.334            -0.003        3.6
2019-09-12      1440  74.862    86.9  12.038  83.073   3.715            -0.008     11.925
2019-09-19      1440  65.187    77.9  12.713  73.059   4.709             0.005     -7.987
2019-09-11      1440    77.9  87.125   9.225  82.538   3.736             0.006       -8.1
2019-10-01      1440  73.287  82.737    9.45   78.11   3.643             0.005     -7.313
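Picking out rows like these comes down to sorting the days by density. Here is a minimal, self-contained sketch of that step. It runs on synthetic z-scores rather than the real readings: the three columns, the random data, and the planted outlier are all made up for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Synthetic stand-in for the z-scored daily stats: 201 days, 3 made-up columns
rng = np.random.default_rng(0)
dates = pd.date_range("2019-01-05", periods=201, freq="D")
zscores = pd.DataFrame(
    rng.normal(size=(201, 3)),
    index=dates,
    columns=["avg", "stddev", "day_delta"],
)
zscores.iloc[100] = [6.0, -6.0, 6.0]  # plant one obviously unusual day

# Fit the empirical multivariate normal and score every day
mu = zscores.values.mean(axis=0)
sigma = np.cov(zscores.values, rowvar=False)
density = pd.Series(
    scipy.stats.multivariate_normal.pdf(zscores.values, mean=mu, cov=sigma),
    index=zscores.index,
    name="density",
)

most_anomalous = density.nsmallest(5)  # five lowest-density days, lowest first
least_anomalous = density.idxmax()     # the single most "typical" day

print(most_anomalous)
print(least_anomalous.date())
```

The planted day comes out first in `most_anomalous`. Indexing the density by date keeps the lookup back into the daily stats simple, for example `daily_stats.loc[most_anomalous.index]` when a `daily_stats` frame with the same index is available.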

The first anomaly is an obvious one: the most variable day of the year, which I wrote about above (the burglary on December 16). See below for plots of the other four:

And for good measure, here is the least anomalous date according to my very naive model.