bermuda Using a Random Forest classifier for presence detection

First of all, thanks for the great HA integration. I had some trouble getting it to accurately determine which room my device is in within my multistory home; the device was often detected on the wrong floor.

I decided to prototype a Python project using NodeRed and a RandomForestClassifier from scikit-learn, using the distances from each Bluetooth sensor as inputs and manually labeling each room. With a dataset of about 600 measurements, I'm able to reliably determine which room my phone is in. Better yet, this can also be used to create additional zones, as you're no longer simply identifying the shortest distance to a receiver, but rather using the whole collection of distances between the device and the different receivers.
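To illustrate the difference, here is a minimal sketch with made-up numbers. The receiver names, the distances and the "Kitchen" zone are all hypothetical, and a plain nearest-fingerprint lookup stands in for the Random Forest so the example needs nothing beyond the standard library. It only shows why a full distance vector can identify a room that has no receiver of its own.

```python
import math

# Hypothetical receivers and the area each one is installed in.
RECEIVER_AREA = {
    "living_room_rx": "Living Room",
    "bedroom_rx": "Bedroom",
    "attic_rx": "Attic",
}

# Synthetic labeled samples: distances in metres, plus the room the phone was really in.
# "Kitchen" has no receiver of its own, so min-distance can never report it.
LABELED = [
    ({"living_room_rx": 1.5, "bedroom_rx": 4.0, "attic_rx": 7.0}, "Living Room"),
    ({"living_room_rx": 3.0, "bedroom_rx": 3.5, "attic_rx": 6.5}, "Kitchen"),
    ({"living_room_rx": 4.5, "bedroom_rx": 1.0, "attic_rx": 3.5}, "Bedroom"),
]


def area_by_min_distance(distances):
    """Pick the area of whichever receiver reports the shortest distance."""
    nearest = min(distances, key=distances.get)
    return RECEIVER_AREA[nearest]


def area_by_fingerprint(distances):
    """Pick the label of the stored sample whose distance vector is closest.

    This is a 1-nearest-neighbour stand-in, not the Random Forest itself.
    """
    def gap(sample):
        return math.sqrt(sum((distances[k] - sample[k]) ** 2 for k in distances))

    _, label = min(LABELED, key=lambda pair: gap(pair[0]))
    return label


reading = {"living_room_rx": 2.8, "bedroom_rx": 3.6, "attic_rx": 6.4}
print(area_by_min_distance(reading))
print(area_by_fingerprint(reading))
```

With this reading, the min-distance rule can only answer with a room that contains a receiver, while the fingerprint lookup is free to return any labeled zone.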

I'd like to package this up into a change and contribute it to this project, but I want to make sure that you're open to the idea before I put in the effort. From what I can see, roughly the following would need to be added:

A BermudaSelector, used specifically for labeling, which would display all the available areas in a home plus a disabled state. During training, you walk around your house selecting the room you're in while it gathers the RSSI values or proximities between each BLE sensor and your device.
A BermudaSensor for each device to show the ML predicted room that the device is currently in.
A dependency on scikit-learn for RandomForestClassifier and a SQLite database to keep a table of RSSI values/distances to labeled areas.
Modify coordinator.py#_refresh_area_by_min_distance to pass the full list of distances to RandomForestClassifier
A service to empty the sqlite database.
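As a rough sketch of the storage side of the list above, the labeled samples could live in a single SQLite table. The table name area_distances and the detected/actual columns come from the prototype below; the helper function names, the three-receiver subset and storing a missing receiver as NULL are assumptions made for illustration only.

```python
import sqlite3

# Subset of the receiver names used in the prototype, kept short for the example.
RECEIVERS = ["attic_hvac", "basement_hvac", "front_room_hvac"]


def create_table(conn):
    """Create the labeled-sample table: detected area, one column per receiver, actual area."""
    cols = ", ".join(f"{name} REAL" for name in RECEIVERS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS area_distances (detected TEXT, {cols}, actual TEXT)"
    )


def add_sample(conn, detected, distances, actual):
    """Store one labeled measurement; receivers missing from `distances` become NULL."""
    placeholders = ", ".join("?" for _ in range(len(RECEIVERS) + 2))
    row = [detected] + [distances.get(name) for name in RECEIVERS] + [actual]
    conn.execute(f"INSERT INTO area_distances VALUES ({placeholders})", row)


def clear_samples(conn):
    """What a 'empty the database' service could do: drop every labeled row."""
    conn.execute("DELETE FROM area_distances")


conn = sqlite3.connect(":memory:")
create_table(conn)
add_sample(conn, "Attic", {"attic_hvac": 1.2, "basement_hvac": 9.8}, "Attic")
count_before = conn.execute("SELECT COUNT(*) FROM area_distances").fetchone()[0]
clear_samples(conn)
count_after = conn.execute("SELECT COUNT(*) FROM area_distances").fetchone()[0]
print(count_before, count_after)
```

Keeping the column order as detected, receivers, actual means that dropping the actual column leaves the features in the same order the prototype uses when it builds a new sample.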

The ML prototype code is:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Every area label the encoder needs to know, including the "unknown" placeholder.
AREAS = ["unknown", "Living Room", "Bedroom", "Attic", "Basement", "Kitchen",
         "Basement Bathroom", "Main Bathroom", "Laundry"]


def replaceUnknown(value):
    # A receiver that cannot see the device reports 'unknown'; substitute a large sentinel distance.
    if value == 'unknown':
        return 9999
    return value


class AreaPredictor:  # wrapper class name is assumed; the original excerpt only showed the methods
    def __init__(self, db):
        self.db = db  # object exposing getConnection(), as used in the prototype

    def predictDistances(self, data):  # data is a dictionary of bluetooth receiver -> proximity
        # The database stores the previously labeled data.
        df = pd.read_sql_query("SELECT * FROM area_distances", self.db.getConnection())
        # ML algorithms don't do well with strings; LabelEncoder maps each string to an integer index.
        le = LabelEncoder().fit(AREAS)
        df['detected'] = le.transform(df['detected'])
        X = df.drop(['actual'], axis=1)
        y = le.transform(df['actual'])
        # 30% of the rows are held out; X_test and y_test are not used further in this prototype.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

        # Note: the model is retrained from the full table on every call.
        rf_model = RandomForestClassifier(bootstrap=True, max_depth=30, min_samples_leaf=1,
                                          min_samples_split=2, n_estimators=100)
        rf_model.fit(X_train, y_train)
        # The values must be listed in the same order as the columns of X.
        X_new = [
            [le.transform([data['detected_area'].strip()])[0],
             replaceUnknown(data['attic_hvac']),
             replaceUnknown(data['basement_hvac']),
             replaceUnknown(data['cd_office_hvac']),
             replaceUnknown(data['front_room_hvac']),
             replaceUnknown(data['guest_room_hvac']),
             replaceUnknown(data['paul_office_hvac'])
            ]
        ]
        y_pred = rf_model.predict(X_new)
        predicted_value = le.inverse_transform(y_pred)[0]
        return predicted_value
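The prototype retrains on every prediction and never looks at the held-out rows. A possible refinement, sketched here on synthetic data rather than the real table, is to train once, score the model on the held-out split, and reuse the fitted model for each new reading. The three rooms, the cluster centres and the predict_room helper are invented for the example; the scikit-learn calls themselves are standard.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
ROOMS = ["Attic", "Basement", "Kitchen"]

# Synthetic stand-in for the labeled table: each room has a typical distance
# (in metres) to three receivers, with some measurement noise added.
centres = np.array([[1.0, 9.0, 5.0], [9.0, 1.0, 5.0], [5.0, 5.0, 1.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(50, 3)) for c in centres])
y = np.repeat(ROOMS, 50)  # 50 samples per room, in the same order as the rows of X

# Hold out 30% of the samples so the accuracy is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train once; the fitted model is then reused for every prediction.
# The targets here are plain strings, which scikit-learn classifiers accept directly.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)


def predict_room(distances):
    """Predict the room for one reading without retraining the model."""
    return model.predict([distances])[0]


print(f"held-out accuracy: {accuracy:.2f}")
print(predict_room([1.1, 8.8, 5.2]))
```

On real measurements the held-out accuracy would give an early signal of whether the roughly 600 labeled samples are enough, or whether some rooms need more labeling.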

Let me know if you'd like to chat further and perhaps work on this together. I have plenty of Python experience, but I haven't spent any time developing Home Assistant components, and I'm already horrified at the idea of having to fully restart Home Assistant before testing out changes.

Apr 16 '25 02:04 donutsoft

It seems that you can't rely on scikit-learn directly within a Home Assistant component, so this would require us to go down the route of building an add-on to handle the machine learning side of things.

Apr 16 '25 04:04 donutsoft

I'm really looking forward to it

May 14 '25 09:05 oulianxian

I've been working on it for the last few weeks, but I've decided to stick with using MQTT to keep this flexible for other sensor types. I'd appreciate it if you could try it out and give me any feedback. https://github.com/donutsoft/ml2mqtt

I want to advertise it on Reddit, but I want to ensure that this is in decent enough shape so people don't lose interest before it has the opportunity to take off.

May 15 '25 06:05 donutsoft

@donutsoft not sure if you've seen my FR in your repo - would it be possible to enable MQTTS (MQTT over TLS) for your add-on?

Jul 23 '25 06:07 TheMagnificent7