Machine learning & Kafka KSQL stream processing — bug me when I’ve left the heater on

Continuous household power monitoring with a trained model to alert on unexpected power usage for time of day. Let my Google Home and Phone notify me when I’ve used more electricity than expected.

What on earth is this?

Ready for the buzz words? Data capture via a Raspberry Pi streamed via Apache Kafka. Real-time processing using KSQL streaming engine enriched with a trained model. Unexpected time of day power usage events get sent via mobile notifications to my phone along with a spoken message via our home speaker.

Inspiration for this project comes from Kai Waehner and his project Deep Learning UDF for KSQL. The notification system was inspired by Robin Moffatt and his blog on Event-Driven Alerting with Slack.

The entire project is easy to run and simply requires docker to demonstrate locally.

I’m bored — can I get a video?

Real-time power usage monitoring

Training and Data Capture

Power consumption is measured in “W/hr — Watt hours”. The power consumption range for a 24 hour period in August is shown below. The area shaded in green shows the range from the minimum value for that hour to the highest consumption. In short, values in the green area are normal, and the red area is unexpected.

Expected range (green) of power consumption by time of day for the month of August

Using H2O.ai (the open source machine learning package) and a very helpful getting started guide I was able to produce a function that returns an anomaly score based on day, hour and power usage. That is, passed a day of month, hour-of-day and power-usage reading the function returns an anomaly score

AnomalyScore = AnomalyFunction (day, hour, power-usage)

Consuming 1500 W/hr at 9 am is pretty normal, however 1500 W/hr power draw at 4 am is a cause for alarm as that’s 500% more than expected. In short, an anomaly score above 1.0 signifies power usage beyond what has previously been historically measured at that time and day.

Anomaly score based on hour and power usage. Values above 1.0 are unusual

User Defined Function — and KSQL

TL;DR summary — compile some Java and place in the right directory, start ksql server and verify the function is there …

ksql> list functions; Function Name           | Type
-------------------------------------
. . .
ANOMOLY_POWER | SCALAR <--- I need this one
ANOMOLY_WATER | SCALAR

Firstly a stream (“raw_power_stream”) is created to expose the real-time power-consumption from the kafka topic with real-time power consumption.

The scripts below show the steps to create the final “anomaly_power” kafka topic which will be a stream of events where the anomaly function (“anomaly_power”) has found a significantly unusual value. That is, the “anomaly_power” topic should be silent unless an unusual event has occurred

create stream raw_power_stream with (kafka_topic='raw_power', value_format='avro');

create stream power_stream_rekeyed as \
select rowtime, hour, kwh, anomoly_power(hour, kwh) as fn \
from raw_power_stream partition by rowtime;

create stream anomoly_power with (value_format='JSON') as \
select rowtime as event_ts, hour, kwh, fn \
from power_stream_rekeyed where fn>1.0;

Ringing the Alarm

Mobile Device notification via Pushbullet

An iOS push-notification via push-bullet

A bit of python code consumes the ANOMOLY_POWER topic and calls pushbullet. A consumer is established, and an event handler calls the notification service on receipt of a new Kafka events. Each message generates a new push notification.

Google Home Text-to-Speech (TTS) via Home Assistant

What did I learn?

Ready To Try

Credits & References

Day job: data steaming & system architecture. Night gig: IoT and random project hacking