homeassistant-powercalc

Add an approximation method for non-scanned configurations

dsolva opened this issue 1 year ago · 7 comments

I just added a few new lights and now everything works wonderfully. However, I noticed that many light configurations in between the scanned steps (for the bri and mired values) are off by up to 5% or more. Currently I'm using the Wiz automatic circadian cycle and looking to use adaptive lighting, so my lights are changing color temperature all the time.

In an attempt to fill the gaps between the bri and mired steps I tested some machine-learning algorithms, and unsurprisingly the best one was K-nearest neighbors with 4 neighbors. Without getting too complicated about the final model, this simple algorithm just approximates the untested configuration from the 4 closest configurations that were actually measured. I even generated the full 77010 possible configurations using that criterion and it's always performing better.
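For illustration, here is a minimal sketch of that lookup in plain Python/NumPy. The sample data, normalization constants and unweighted averaging are my assumptions, not the final model:

```python
import numpy as np

# Hypothetical measured rows: (brightness, mired, watts).
# The real values come from the per-model .csv.gz LUT files.
lut = np.array([
    (1,   153, 1.9),
    (1,   500, 2.1),
    (128, 153, 4.8),
    (128, 500, 5.3),
    (255, 153, 8.6),
    (255, 500, 9.4),
])

def approximate_power(brightness: float, mired: float, k: int = 4) -> float:
    """Average the power of the k nearest measured configurations.

    Brightness (1-255) and mired (~153-500) live on different scales,
    so both axes are normalized before computing distances.
    """
    features = lut[:, :2] / np.array([255.0, 500.0])
    query = np.array([brightness / 255.0, mired / 500.0])
    distances = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(distances)[:k]
    return float(lut[nearest, 2].mean())

print(approximate_power(64, 300))  # estimate for an unmeasured configuration
```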

The idea is not to fill the gaps in the existing .gz files, just to add this logic when doing the lookup. I'm guessing it can be done without too much computational effort, and it's not even an interpolation or anything complex.

I am not sure if this has been proposed/attempted before, but if you see value in doing it I can do my best to help implement it: either provide a detailed explanation of the theoretical part (no big deal) or dig into the code and do it myself.

In the image you can see (plug cyan, proposal blue, .gz purple) that they start perfectly aligned on a known configuration in the .gz; then I move to three configurations in between .gz measurements and the proposal is almost always spot on (actually the "error" is just the plug rounding to one decimal, otherwise they would be the same value). In this example the purple line's error is around 5%.

(Screenshot: power over time for the plug reading (cyan), the proposed approximation (blue), and the current .gz lookup (purple).)

dsolva avatar Jul 23 '22 01:07 dsolva

As measuring is... very slow, it'd be interesting to be able to take a very loose sampling and use a technique like this to provide approximated calculations.

CloCkWeRX avatar Jul 23 '22 03:07 CloCkWeRX

@dsolva I read that you have a lot of experience with machine-learning algorithms, data science and math in general. I don't, so I'm probably not up to the task ;-). Sounds very promising. I did try my best to do some simple interpolation, but I only did that across one dimension (brightness). You can give it a shot and try to implement it if you want. I can give you some pointers to the relevant code:

The initial lookup dictionary is built here, only once per startup of Powercalc. https://github.com/bramstroker/homeassistant-powercalc/blob/master/custom_components/powercalc/strategy/lut.py#L47

Interpolation between the lower and higher brightness is done here. It runs each time the calculation is called (on state changes of the bound light). https://github.com/bramstroker/homeassistant-powercalc/blob/master/custom_components/powercalc/strategy/lut.py#L194
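For orientation, here is a rough sketch of where such an approximation could sit behind the existing lookup. The nested dict shape and function name are assumptions for illustration only, not the actual structures in lut.py:

```python
from typing import Dict

# Assumed shape: {mired: {brightness: watts}}, built once at startup from the .csv.gz
LookupTable = Dict[int, Dict[int, float]]

def lookup_power(table: LookupTable, mired: int, brightness: int, k: int = 4) -> float:
    """Return the measured value when present, otherwise fall back to a
    k-nearest-neighbors average over all measured (mired, brightness) points."""
    if mired in table and brightness in table[mired]:
        return table[mired][brightness]
    points = [(m, b, w) for m, row in table.items() for b, w in row.items()]
    # Normalize both axes so neither dominates the distance metric.
    points.sort(key=lambda p: ((p[0] - mired) / 500) ** 2 + ((p[1] - brightness) / 255) ** 2)
    nearest = points[:k]
    return sum(w for _, _, w in nearest) / len(nearest)
```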

Would this k-nearest-neighbors algorithm only apply to color temp mode, or also to HS mode? Because when you want to pre-generate all possible configurations it would get massive.

Also, it's important to do some quick checks on the computational overhead when generating the lookup tables, to make sure it doesn't delay startup too much. This problem will probably be bigger on low-end devices (like Raspberry Pis).

@CloCkWeRX Even approximation algorithms need enough data points to make any realistic approximations. Previously the default measure-tool settings took a lot more data points; HS mode took more than a day to complete. We have lowered the requirements so each measure session takes at most ~2 hours, which seems fine to me as it is a one-time job per light. Maybe @dsolva can advise whether the number of data points can be lowered further? But personally I'd like to keep up the quality of the LUT files by having a high number of real measured data points.

bramstroker avatar Jul 23 '22 05:07 bramstroker

@bramstroker I only tested the model for color temp, but I will certainly try HS and let you know. Given that the best-performing algorithm was KNN, and its greatest weakness is carrying its entire training set around (which we already have in the LUT file), I'm expecting no huge increase. I pre-generated the lookup table only as an alternative to implementing this properly; once programmed, no massive tables will have to be generated beforehand. In terms of computational effort, it would be the same as looking for the 4 closest values in the original LUT file and that's it, the rest is simple math. Maybe as I understand the code I will see how the model can be adapted. Still, I totally agree on testing it first to see the impact. I know my NUC might be doing well right now, but if we see even a slightly questionable spike in loading time, then the "approximation" could be made an option for the user.

About @CloCkWeRX's comment on minimum data points: what we can do is that I finish the champion model, throw all the existing LUT files at it, and start dropping data points and testing. It would be a sort of lossy compression. With that we can see the overall metrics (highest error and root mean squared error) and decide how much we are willing to give up. The model would be challenged on its capacity to recover the lost granularity, and we will see how low we can go and still reconstruct. From what I saw with my own measurement I wouldn't go below that for now; it feels like a sweet spot of granularity. Also, I need to see how manufacturers map the bri, mired, etc. values to the real world. For my Wiz light bulbs some colors are very similar in consumption, but other manufacturers might distribute their spectrum quite differently: a 10-step increment in one value could mean something very different for each model, and even change exponentially within the same bulb! Too many things; in the end, leveraging the entire model base we have will give us an answer.
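A possible shape for that evaluation, assuming the LUT is flattened to a list of (brightness, mired, watts) measurements and some estimator function is plugged in; the drop fraction, seed and error metrics here are placeholders:

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[int, int, float]  # (brightness, mired, watts)

def evaluate_reduction(
    points: List[Point],
    drop_fraction: float,
    estimate: Callable[[List[Point], int, int], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Hold out a fraction of the measured points, reconstruct them with the
    approximation, and report (max error, RMSE) against the held-out truth."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    n_drop = int(len(shuffled) * drop_fraction)
    held_out, kept = shuffled[:n_drop], shuffled[n_drop:]

    errors = [
        abs(estimate(kept, brightness, mired) - watts)
        for brightness, mired, watts in held_out
    ]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return max(errors), rmse
```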

dsolva avatar Jul 23 '22 06:07 dsolva

Sounds promising. I also have a NUC btw. We can of course make the approximation an option to activate, but if it does not have any significant overhead I would opt to make it the default.

bramstroker avatar Jul 23 '22 06:07 bramstroker

@dsolva Do you have any update? Still have plans to work on this in the near future?

bramstroker avatar Aug 28 '22 10:08 bramstroker

> @dsolva Do you have any update? Still have plans to work on this in the near future?

I finished some first tests on the data to understand the accuracy of the approach and the topic of granularity reduction. Sorry for taking so long, but I moved recently and I've been setting up a ton of things in my house and at work. I hope to resume working on this next week. I want to have a thorough analysis of the method's performance so it's safe to use and people can judge whether to enable it or not.

dsolva avatar Sep 08 '22 01:09 dsolva

@dsolva No worries, thanks for the update.

bramstroker avatar Sep 18 '22 20:09 bramstroker

@dsolva I assume you are not working on this anymore? Do you still plan to continue? I'm cleaning up the issue list, so if this is stale I will close the issue.

bramstroker avatar Mar 11 '23 09:03 bramstroker

Closed due to inactivity. We can reopen when you want to work on this again.

bramstroker avatar Mar 17 '23 17:03 bramstroker