Powerwall-Dashboard
User Token Not Found Error
Problem: Every few hours I get a Grafana log error and an alert on the Dashboard:
logger=context traceID=00000000000000000000000000000000 t=2023-07-03T17:01:14.560304748Z level=error msg="Failed to look up user based on cookie" error="user token not found"
On the Dashboard, an alert popup says "Unknown Error" but it clears after a few seconds.
Any thoughts on what might be causing this? I looked at the other containers for the dashboard and did not notice any other errors at the same times.
Update: I just saw this error on the % Solar Powered meter panel, which is currently displaying N/A:
Unexpected token 'L', "Logged in" is not valid JSON
Not sure if this is related to #302, but solar was producing 4 kW at the time.
It cleared when I set the timer to 4s.
Unexpected token 'L', "Logged in" is not valid JSON
I see this exact same error as well occasionally.
It happens intermittently/randomly and not on any specific Grafana panel/element - it could be any of them.
Usually just hitting refresh will clear it, or it will clear on the next auto-refresh (default 5m period).
I would love to know what is causing this... if anyone has any ideas or thoughts on how to investigate the cause, that would be great. Due to the intermittent nature of the problem I haven't been able to work it out, or, to be honest, bothered to invest much time in it.
I do wonder if it is some sort of issue with Grafana sometimes sending more simultaneous query requests than the InfluxDB host can handle?
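If that theory is worth testing, InfluxDB 1.8 can cap simultaneous queries via the [coordinator] section of influxdb.conf (0 means unlimited). A minimal sketch, assuming the influxdb.conf that ships with the dashboard and the default container name; the value 20 is only an illustration:
# check whether a [coordinator] section is already set in influxdb.conf
grep -A 3 "\[coordinator\]" influxdb.conf
# if Grafana query bursts are the suspect, a cap could be added, e.g.:
#   [coordinator]
#   max-concurrent-queries = 20
# then restart InfluxDB to pick up the change
docker restart influxdb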
Thanks for confirming it's not just my system. I was worried it was a portent of something worse. It does clear quickly and does not seem to have any lasting impact.
Interesting. I haven't seen this. A few questions to see if we can find a connection to the cause...
- Are you using the latest dashboard.json and have you seen this before the recent update? I want to rule out any recent changes.
- What is the Dashboard stack (grafana, influx) running on (e.g. I have it on an RPi4 and an Ubuntu 20 host)?
- What browser are you using (e.g. I'm using Chrome)?
It may be interesting to see if there are any errors in the grafana or influxdb logs or what the dev console says in the browser.
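For example (a sketch, assuming the default container names from the compose stack):
# look for recent errors in the Grafana and InfluxDB container logs
docker logs grafana --since 1h 2>&1 | grep -i error
docker logs influxdb --since 1h 2>&1 | grep -i error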
I am still using the previous dashboard.json
My stack is the infamous QNAP NAS with everything on the same instance
Using Chrome Version 114.0.5735.199 (Official Build) (64-bit) on Windows 10
I wiped my system and went to a Git install vs doing it by hand.
I am getting these messages in pypowerwall every so often. Maybe this gives more depth? The same message in Grafana as above seems to correlate with it, but it's hard to be sure as there are no timestamps in the pypowerwall log.
Exception occurred during processing of request from ('172.29.0.4', 36808)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pypowerwall/__init__.py", line 155, in _get_session
    self.auth = {'AuthCookie': r.cookies['AuthCookie'], 'UserRecord': r.cookies['UserRecord']}
  File "/usr/local/lib/python3.10/site-packages/requests/cookies.py", line 334, in __getitem__
    return self._find_no_duplicates(name)
  File "/usr/local/lib/python3.10/site-packages/requests/cookies.py", line 413, in _find_no_duplicates
    raise KeyError(f"name={name!r}, domain={domain!r}, path={path!r}")
KeyError: "name='AuthCookie', domain=None, path=None"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/app/server.py", line 232, in do_GET
    vitals = pw.vitals()
  File "/usr/local/lib/python3.10/site-packages/pypowerwall/__init__.py", line 308, in vitals
    stream = self.poll('/api/devices/vitals')
  File "/usr/local/lib/python3.10/site-packages/pypowerwall/__init__.py", line 233, in poll
    self._get_session()
  File "/usr/local/lib/python3.10/site-packages/pypowerwall/__init__.py", line 164, in _get_session
    raise LoginError("Invalid Powerwall Login")
pypowerwall.LoginError: Invalid Powerwall Login
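As a side note on the missing timestamps: docker can add its own when reading the proxy log. A sketch, assuming the default container name:
# show recent pypowerwall proxy log lines with docker-added timestamps
docker logs -t --tail 100 pypowerwall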
pypowerwall.LoginError: Invalid Powerwall Login
This is the cause of the stack trace. The pypowerwall library raises this exception to indicate that login to the Powerwall failed, but the proxy will continue (retrying and eventually reconnecting). I sometimes get network-related errors, but they look like this:
07/07/2023 03:21:04 PM [proxy] [ERROR] Socket broken sending response [doGET]
07/07/2023 03:21:04 PM [proxy] [ERROR] Socket broken sending response [doGET]
07/07/2023 03:21:04 PM [proxy] [ERROR] Socket broken sending response [doGET]
However, I haven't seen it impact login to the Powerwall. Is your Powerwall connected via WiFi or hardwired (ethernet)? I recall early on that I would see long periods of network errors and data gaps (up to 10m at a time) when the Powerwall was connected via WiFi. I hardwired it into my network switch and haven't seen that since.
The "login" errors could be due to too many connections to the Powerwall. One option to try would be to decrease the pypowerwall PW_POOL_MAXSIZE value from 15 to 10. This is done by appending the setting to the pypowerwall.env file:
# append setting to pypowerwall env file
echo "PW_POOL_MAXSIZE=10" >> pypowerwall.env
# verify the setting was added
cat pypowerwall.env
# restart pypowerwall to pick up the new setting
docker stop pypowerwall
docker rm pypowerwall
./compose-dash.sh up -d
You can tweak that number to see what works best; watch the pypowerwall logs to see what happens. Other settings like PW_CACHE_EXPIRE (defaults to 5, i.e. five seconds) could also be increased to reduce load on the Powerwall.
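For example, the same append-and-restart pattern as above could be used (the value of 10 here is just an illustration, not a recommendation):
# append setting to pypowerwall.env file (10s cache instead of the 5s default)
echo "PW_CACHE_EXPIRE=10" >> pypowerwall.env
# restart pypowerwall to apply
docker stop pypowerwall
docker rm pypowerwall
./compose-dash.sh up -d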
EDIT
I had another thought: check the telegraf logs to see if there are any errors there as well. Telegraf is what polls pypowerwall (which fetches from the Powerwall) and writes the data to InfluxDB.
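A quick way to check both sides of that path (a sketch; assumes the default container names and that port 8675 is published on the host):
# look for recent telegraf warnings/errors
docker logs telegraf --since 1h 2>&1 | grep -E "E!|W!"
# poll the pypowerwall proxy directly, the same endpoints telegraf uses
curl -s http://localhost:8675/aggregates
curl -s http://localhost:8675/soe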
My PW was connected via WiFi and that was a complete disaster. But I cannot connect it directly to ethernet, so I had to settle for a wireless extender to ethernet. It works very well now, but this could have exposed the fact that timing could be an issue. I'll try changing the PW_POOL_MAXSIZE parm.
Truthfully it's NBD as it does self-correct very quickly. I really just wanted you to be aware of it.
I'll check the Telegraf logs next time I see it, but looking at the log now there was nothing since the last startup 24 hours ago.
I noticed today that my NAS was basically locked up for a few minutes after one of these errors. I could not get into the Admin panel. After a few minutes it cleared and all was ok.
In the Telegraf logs I saw these:
2023-07-22T19:03:25Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:29Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2023-07-22T19:03:48Z W! [inputs.kernel] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/freq]: Get "http://pypowerwall:8675/freq": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/aggregates]: Get "http://pypowerwall:8675/aggregates": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/temps/pw]: Get "http://pypowerwall:8675/temps/pw": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/strings]: Get "http://pypowerwall:8675/strings": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/pod]: Get "http://pypowerwall:8675/pod": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/alerts/pw]: Get "http://pypowerwall:8675/alerts/pw": dial tcp 172.29.4.2:8675: i/o timeout (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z E! [inputs.http] Error in plugin: [url=http://pypowerwall:8675/soe]: Get "http://pypowerwall:8675/soe": dial tcp 172.29.4.2:8675: i/o timeout (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:03:48Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z W! [inputs.system] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z W! [inputs.mem] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z W! [inputs.disk] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:03:48Z E! [outputs.influxdb] When writing to [http://influxdb:8086]: failed doing req: Post "http://influxdb:8086/write?db=powerwall&rp=raw": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-07-22T19:05:02Z W! [inputs.swap] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.mem] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.processes] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2023-07-22T19:05:02Z W! [agent] ["outputs.influxdb"] did not complete within its flush interval
2023-07-22T19:05:02Z W! [inputs.diskio] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.diskio] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.disk] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.disk] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.cpu] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.system] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.system] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z W! [inputs.http] Collection took longer than expected; not complete after interval of 5s
2023-07-22T19:05:02Z E! [agent] Error writing to outputs.influxdb: could not write any address
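For reference, a quick way to see whether the containers themselves are being starved of CPU or memory while this is happening (a sketch; container names assumed from the default stack):
# one-shot snapshot of per-container CPU / memory / network / IO usage
docker stats --no-stream pypowerwall telegraf influxdb grafana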