
Running into errors when running algorithm

Open szsb26 opened this issue 3 years ago • 3 comments

Hi Aria,

First, I want to thank you immensely for implementing this algorithm, as I could not find an implementation anywhere to test the claims in the original fast robustSTL paper.

However, I'm running into some issues when trying to run the algorithm.

When running your example with N = 1231 (the size of the data vector y), the algorithm runs fine, as shown below:

[Screenshot: Screen Shot 2021-06-04 at 2 00 30 PM]

[Screenshot: Screen Shot 2021-06-04 at 2 06 20 PM]

However, when replacing "y" with a same-size vector called "data", which looks like the following:

[Screenshot: Screen Shot 2021-06-04 at 2 09 26 PM]

I get the following error when running robustSTL:

[Screenshot: Screen Shot 2021-06-04 at 2 10 29 PM]

where the error is:

[Screenshot: Screen Shot 2021-06-04 at 2 11 18 PM]

Since "y" and "data" are exactly the same size except with different values, and all parameters are fixed, im a bit confused as to what is going on and would really appreciate it if you can provide some help/insight.

If it helps, I've also attached the data used in the "data" vector. It is in the attached dataframe under the column "values":

internet_traffic_stl_results_complete.csv

Thanks! Sichen

szsb26 avatar Jun 04 '21 21:06 szsb26

@szsb26 Hello! I ran into the same problem when running it with my own data. Did you manage to solve it? Also, when running the original example from the paper, I found that the decomposition results were particularly poor and completely different from the results in the paper. Did you notice this as well?

xhd1203 avatar Jun 16 '21 13:06 xhd1203

@xhd1203 I solved the issue by scaling the data by its mean (i.e., data / data.mean()). After running the algorithm, I then rescale the trend and seasonal components back by multiplying by data.mean(). I think the issue above comes from the cvx package and not from the algorithm or implementation itself. My best guess as to the core of the problem is that, since the time series I was dealing with had pretty large values, the cvx solver ran into numerical instability when handling them. Funnily enough, in the original RobustSTL paper, the authors also scaled their data to be in [0, 1] or used a log transform. It would be useful to know if they ran into the same issues, since they also used cvx...
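For anyone hitting the same error, here is a minimal sketch of the workaround described above. The decomposition entry point is passed in as a generic callable, since the exact function name and signature exposed by this repo are not assumed here; the only point is the mean scaling before the call and the rescaling of the additive components afterwards.

```python
import numpy as np

def decompose_with_mean_scaling(data, decompose_fn):
    """Mean-scale a series before decomposition and rescale the results.

    `data` is a 1-D numpy array; `decompose_fn` is whatever entry point
    this repo exposes (hypothetical here) and is assumed to return
    (trend, seasonal, remainder) arrays for the scaled series.
    """
    # Scaling by the mean keeps the values the cvx solver sees near 1,
    # which avoids the numerical instability described above.
    scale = data.mean()
    scaled = data / scale

    trend, seasonal, remainder = decompose_fn(scaled)

    # Multiplying back by the mean restores the original units.
    return trend * scale, seasonal * scale, remainder * scale
```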

I also ran into issues with the algorithm performing poorly. You can see this not only from the experiment in the paper but also from the Colab notebook linked in this repo. The magnitudes of the estimated seasonal components do not match the magnitudes of the synthetically generated true seasonal components at all. Granted, this is probably due to the paper not explaining things very well. For example, the fast RobustSTL paper does not explain how any of the denoising and regularization parameters were chosen, and, as the repo author mentioned, it does not provide enough detail for the GADMM portion to be replicated...

szsb26 avatar Jun 16 '21 18:06 szsb26

@szsb26 Thank you very much for your reply and help. The problem I had before is solved after dividing the data by its mean. I have tried adjusting the parameters many times, but the decomposition results are still not good. Anyway, thank you for your help!

xhd1203 avatar Jun 17 '21 13:06 xhd1203