
Enhancement: add support for W600

Open iprak opened this issue 2 years ago • 34 comments

W600 is the older and slower brother of W800 from the same company, Winner Micro. It resembles the W800 but has its own SDK. I have it somewhat working but still have to resolve some crashes, probably related to the older SDK.

@openshwprojects would this be something which could be merged into this repo? Have you received any requests for W600? Would it complicate the overall build?

iprak avatar Sep 28 '22 13:09 iprak

I have a W600 board, can you share your progress so I can look into it?

EDIT: W600 board from Aliexpress, the dev one.

Does W600 also have an IDE with a built-in editor and compiler like W800, or how does one compile for W600?

openshwprojects avatar Sep 28 '22 13:09 openshwprojects

Well, since we now have 2 potential users, my effort might be of use. :-)

I dismantled a switch hoping to find BK but found W600. There is a separate, older toolchain which I have been using primarily from Cygwin.

I based my effort on the W800 repo. I am not certain, but I think that the heap size increase and configUSE_HEAP2 usage are not playing well; this is a much older SDK.

I can submit a PR this week once I have verified the issues. I will also double-check the Linux toolchain.

iprak avatar Sep 28 '22 13:09 iprak

Your efforts are of course not in vain. You have already contributed a lot to OpenBeken, you're one of our best contributors.

First of all, what do you mean by "older SDK"? Do you mean that there is a newer version for W600, or are you calling W600 "older" and W800 "newer"?

What exactly is crashing? I had some crash issues on the BL602 platform and I suspect that all those chip toolchains might have an LWIP library not prepared for threading, thus causing random crashes, but I am not sure right now. Remember that there is also a BK7231N stability issue reported by some users.

The HEAP2 should not be problematic if we are not doing a malloc too often.

One of the things I've considered doing is removing mallocs per HTTP request from here: https://github.com/openshwprojects/OpenBK7231T_App/blob/main/src/httpserver/http_tcp_server.c

reply = (char*) os_malloc( replyBufferSize );
buf = (char*) os_malloc( INCOMING_BUFFER_SIZE );

The thing is, if I try to process one client at a time (without creating threads at all, in a blocking manner), the BK web page is not responsive and very sluggish.

Still, there is another potential approach to this problem.

Instead of doing a malloc and a free for each HTTP request, one could do something like this:

struct buffers { char *a, *b; bool bInUse; struct buffers *next; };

  1. when an HTTP request is added: buffers *HTTP_RegisterBuffers( ) ; this would find empty buffers or alloc new ones
  2. when the HTTP request has been processed: HTTP_MarkBufferesAsFree(buffers*) ; this would just set bInUse = false so they are reused

This will remove the per-request malloc/free calls. Of course, one would have to limit the total number of buffers allocated at once.
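The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: the names pool_acquire and pool_release, the buffer sizes, and the cap of 4 are hypothetical and not taken from the OpenBeken source, plain malloc stands in for os_malloc, and a real multi-threaded HTTP server would also need a mutex around the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MAX_BUFFERS   4     /* hypothetical cap on buffer pairs */
#define POOL_REPLY_SIZE    1024  /* hypothetical sizes, for illustration only */
#define POOL_INCOMING_SIZE 512

typedef struct http_buffers {
    char *reply;
    char *incoming;
    bool in_use;
    struct http_buffers *next;
} http_buffers_t;

static http_buffers_t *g_pool = NULL;
static int g_pool_count = 0;

/* Step 1: reuse a free pair if one exists, otherwise allocate a new pair
 * until the cap is reached. Returns NULL when nothing is available. */
http_buffers_t *pool_acquire(void)
{
    http_buffers_t *b;

    for (b = g_pool; b != NULL; b = b->next) {
        if (!b->in_use) {
            b->in_use = true;
            return b;
        }
    }
    if (g_pool_count >= POOL_MAX_BUFFERS)
        return NULL;

    b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->reply = malloc(POOL_REPLY_SIZE);
    b->incoming = malloc(POOL_INCOMING_SIZE);
    if (b->reply == NULL || b->incoming == NULL) {
        free(b->reply);
        free(b->incoming);
        free(b);
        return NULL;
    }
    b->in_use = true;
    b->next = g_pool;
    g_pool = b;
    g_pool_count++;
    return b;
}

/* Step 2: mark the pair as free; the memory itself is kept for reuse. */
void pool_release(http_buffers_t *b)
{
    if (b != NULL)
        b->in_use = false;
}
```

In this sketch the allocator is only touched while the pool grows to its cap; after that, every request reuses an existing pair, so the heap sees no further malloc/free traffic from this path.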

I might do this change tomorrow, it's very simple to do and implement, and this might help systems that don't like frequent malloc/free

openshwprojects avatar Sep 28 '22 15:09 openshwprojects

Frequent use of malloc/free operations is pretty safe, unless bugs in the code write data outside of the allocated space and overwrite the control data area of allocated memory blocks. That is the cause of the crash in 99% of cases when a system is using malloc/free.

This applies especially to an HTTP server that constructs its response as a text string, where some reserve must be allowed for: a printed number can vary in text length, especially a float or a 32-bit decimal. A new approach may result in the same issues unless the bugs are completely fixed.
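To make the point about variable-length numbers concrete, here is a minimal sketch of a bounded append that refuses to write past the end of a fixed buffer. The type reply_buf_t and the function reply_appendf are hypothetical names used for illustration; they are not the helpers used in the OpenBeken HTTP server. The caller is assumed to keep len smaller than cap.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    char *data;   /* storage owned by the caller */
    size_t cap;   /* total size of data, including room for the '\0' */
    size_t len;   /* current string length; invariant: len < cap */
} reply_buf_t;

/* Append formatted text. Returns false and leaves the existing text
 * intact when the new text (plus terminator) would not fit. */
bool reply_appendf(reply_buf_t *rb, const char *fmt, ...)
{
    va_list ap;
    int n;
    size_t room = rb->cap - rb->len;

    va_start(ap, fmt);
    n = vsnprintf(rb->data + rb->len, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        rb->data[rb->len] = '\0';  /* drop the truncated tail */
        return false;
    }
    rb->len += (size_t)n;
    return true;
}
```

Because vsnprintf reports how many characters the full text would have needed, the caller learns about an oversized float or integer before any memory outside the buffer is touched.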

valeklubomir avatar Sep 28 '22 16:09 valeklubomir

@valeklubomir are you aware of the W800 (and possibly W600) limitation, where the older RTOS uses an older heap management (malloc/free) algorithm?

https://cdmana.com/2022/03/202203310754244532.html

The W801 SDK uses the heap_2 algorithm by default. This algorithm has no defragmentation function, so it is not suitable for applications that need to frequently allocate and release memory of different sizes. For example, the player made by the author selects different decoding algorithms for different formats, and each requests memory of a different size; in actual use it becomes unable to run because of serious memory fragmentation and reduced utilization.
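The effect described in the quote can be illustrated with a toy model of an allocator that splits blocks on allocation but never merges free neighbours, which is the behaviour of FreeRTOS heap_2. This is a simplified simulation, not the FreeRTOS code: the sim_* names are hypothetical, block headers and alignment are ignored, and only block sizes are tracked.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_BLOCKS 32

typedef struct {
    size_t size;
    bool used;
} sim_block_t;

typedef struct {
    sim_block_t blocks[SIM_MAX_BLOCKS];
    int count;
} sim_heap_t;

void sim_init(sim_heap_t *h, size_t total)
{
    h->blocks[0].size = total;
    h->blocks[0].used = false;
    h->count = 1;
}

/* Best fit: pick the smallest free block that is large enough and
 * split off the remainder. Returns a block index, or -1 on failure. */
int sim_alloc(sim_heap_t *h, size_t want)
{
    int best = -1;
    int i;

    for (i = 0; i < h->count; i++) {
        if (!h->blocks[i].used && h->blocks[i].size >= want &&
            (best < 0 || h->blocks[i].size < h->blocks[best].size))
            best = i;
    }
    if (best < 0)
        return -1;
    if (h->blocks[best].size > want && h->count < SIM_MAX_BLOCKS) {
        h->blocks[h->count].size = h->blocks[best].size - want;
        h->blocks[h->count].used = false;
        h->count++;
        h->blocks[best].size = want;
    }
    h->blocks[best].used = true;
    return best;
}

/* Freeing only marks the block as unused; neighbours are never merged. */
void sim_free(sim_heap_t *h, int idx)
{
    h->blocks[idx].used = false;
}

size_t sim_free_total(const sim_heap_t *h)
{
    size_t total = 0;
    int i;

    for (i = 0; i < h->count; i++)
        if (!h->blocks[i].used)
            total += h->blocks[i].size;
    return total;
}
```

With a 1024-byte model heap, four 256-byte allocations followed by four frees leave 1024 bytes free in total, yet a single 512-byte request fails because no individual free block is larger than 256 bytes.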

openshwprojects avatar Sep 28 '22 16:09 openshwprojects

@openshwprojects I apologize, I may have overstepped a little. Sorry for that, I am really not familiar with W600 or W800, and I was not aware of this limitation.

My concerns were about my current work on a device with BK7231N and how these changes would impact it. Because I am trying to resolve issues with freezing on my device. But your idea of buffers for HTTP put me on a track: I experienced similar behavior on an ESP32 device with the ESP-IDF SDK, which also uses FreeRTOS and LWIP, where the HTTP server caused crashes that I later traced to buffer capacity problems. I resolved it with a larger buffer, and especially with functionality that increased the size of the buffer when a risk of overflow was detected. I rejected splitting the response into multiple transmissions at the HTTP level, because then the web page was not responsive and very sluggish. The option where the buffer was sent at once and split at the TCP level kept HTTP response rates good.
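The "grow the buffer when overflow is about to happen" approach mentioned here could look roughly like the sketch below. It is not the ESP32 code referred to in the comment; grow_buf_t, grow_appendf, and the initial capacity of 64 are assumptions made for illustration, and it relies on the C99 behaviour of vsnprintf(NULL, 0, ...) returning the required length.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char *data;  /* heap storage, NULL until first append */
    size_t cap;  /* allocated size in bytes */
    size_t len;  /* current string length, excluding '\0' */
} grow_buf_t;

/* Append formatted text, doubling the capacity first if the text would
 * not fit. Returns false if formatting or reallocation fails. */
bool grow_appendf(grow_buf_t *gb, const char *fmt, ...)
{
    va_list ap, ap2;
    int n;
    size_t needed;

    va_start(ap, fmt);
    va_copy(ap2, ap);
    n = vsnprintf(NULL, 0, fmt, ap);  /* measure only, writes nothing */
    va_end(ap);
    if (n < 0) {
        va_end(ap2);
        return false;
    }

    needed = gb->len + (size_t)n + 1;  /* +1 for the terminator */
    if (needed > gb->cap) {
        size_t new_cap = gb->cap ? gb->cap : 64;
        char *p;

        while (new_cap < needed)
            new_cap *= 2;
        p = realloc(gb->data, new_cap);
        if (p == NULL) {
            va_end(ap2);
            return false;
        }
        gb->data = p;
        gb->cap = new_cap;
    }

    vsnprintf(gb->data + gb->len, gb->cap - gb->len, fmt, ap2);
    va_end(ap2);
    gb->len += (size_t)n;
    return true;
}
```

On a heap without defragmentation, growing by realloc has its own cost, so a sketch like this trades overflow safety against the fragmentation concern raised earlier in the thread.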

valeklubomir avatar Sep 28 '22 17:09 valeklubomir

Because I am trying to resolve issues with freezing on my device.

does it happen with MQTT disabled?

I have a slight suspicion that it may be related to multithreading, but I am not sure, because it happens on the N SDK for people and not on T. Anyway, here are my two ideas for what can be done to fix N platform stability:

  1. compare the BK7231N SDK step by step to the BK7231T SDK (#define etc.). I tried to do that in the past, but no luck so far
  2. maybe, if it's really a threading issue, one could try updating LWIP to a safer version and add mutexes there? See: https://www.nongnu.org/lwip/2_1_x/multithreading.html Our LWIP is 2.0.2 if I remember correctly, so it has no LWIP_ASSERT_CORE_LOCKED. Maybe one could try updating LWIP.
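For reference, a hedged sketch of what the lwipopts.h side of the second idea might look like, assuming an lwIP 2.1.x tree. LWIP_TCPIP_CORE_LOCKING, LWIP_ASSERT_CORE_LOCKED and LWIP_MARK_TCPIP_THREAD are lwIP option names; the two sys_* hook functions are names the platform port would have to implement itself and are assumptions here, not part of the Beken SDK.

```c
/* lwipopts.h fragment (assumes lwIP 2.1.x; not applicable to 2.0.2) */

/* Let application threads take the core lock instead of posting
 * every call to the tcpip thread as a message. */
#define LWIP_TCPIP_CORE_LOCKING 1

/* Port-provided hooks (assumed names; the port must implement them).
 * The first asserts that the caller holds the core lock or is the
 * tcpip thread; the second records which thread is the tcpip thread. */
void sys_check_core_locking(void);
void sys_mark_tcpip_thread(void);

#define LWIP_ASSERT_CORE_LOCKED() sys_check_core_locking()
#define LWIP_MARK_TCPIP_THREAD()  sys_mark_tcpip_thread()
```

With hooks like these in place, a raw API call made from the wrong thread without the lock would trip an assertion instead of silently corrupting state, which might help confirm or rule out the threading suspicion.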

@valeklubomir do you know C, would you be able to help?

openshwprojects avatar Sep 28 '22 17:09 openshwprojects

@openshwprojects I did not try it with MQTT disabled. I will let it run overnight to check. I have been working on it for 5 days, as a private project. At the beginning I experienced freezing directly after reboot, due to a wrong configuration setup. Since only safe mode did not freeze, I deleted the configuration and restarted the process without any more issues.

I know C and I am trying to fix it.

I can try to compare the SDKs, but it would be helpful to know the differences between BK7231N and T. Is there any datasheet available? I have had no luck finding one so far. I could compare LWIP and try upgrading to the latest version.

At the moment I am working on MQTT to improve stability and the handling of error states.

valeklubomir avatar Sep 28 '22 18:09 valeklubomir

@valeklubomir I remember that doing a MQTT publish every second or several publishes of MQTT every second tends to trigger the issue more often. I've tried to debug it, but no luck. I also tried increasing the size of buffers in LWIP for TCP sockets, the PBuf size, etc, but the issue still persists. I don't think it is related to the actual size of LWIP buffers, I'd rather say it's related to threading or something. Or maybe I just missed something.

No datasheet as far as I know.

The strange thing is that it happens on the N platform and not on T, while LWIP is the same on both platforms, so maybe my suspicion about threading and LWIP is wrong.

openshwprojects avatar Sep 29 '22 10:09 openshwprojects

I think the discussion about MQTT and buffer might belong to a different thread. :-)

Anyway, here is my status update on W600 -> I am currently preparing a PR.

At one point I was getting errors like this on startup, but not any more. I did a full erase of the device using a tool which came with the SDK, and maybe that cleared some settings from older firmware. I was unable to read back the stock firmware before I started this experimentation.

Current Stack [0x2002a718, 0x2002aa38) is NOT in VALID STACK range [0x20000000,0x20028000)
Please refer to APIs' manul and modify task stack position!!!

Current Stack [0x2002b710, 0x2002bbc0) is NOT in VALID STACK range [0x20000000,0x20028000)
Please refer to APIs' manul and modify task stack position!!!
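The message reports half-open address ranges. Below is a minimal sketch of the containment test such a check presumably performs; the function name is hypothetical and the SDK's actual check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the half-open range [lo, hi) lies entirely inside
 * the half-open range [outer_lo, outer_hi). */
bool range_within(uint32_t outer_lo, uint32_t outer_hi,
                  uint32_t lo, uint32_t hi)
{
    return lo <= hi && lo >= outer_lo && hi <= outer_hi;
}
```

Both reported stacks start above 0x20028000 (the first one begins 0x2718 bytes past it), so a test of this kind fails for them regardless of their size.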
  • I have the firmware built on latest OpenBK7231T_App.
  • WiFi has been stable, it ran overnight. MQTT has been stable too with Flag 2 enabled (to send updates every second).
  • What I have is a de-soldered W600 module and so I haven't figured out what IO pins to use so I have not tested MQTT message reception.
  • The latest log indicates a far smaller free heap, but that might just be what the RTOS reports. My BK7231T reports a free value of 103256 with a similar setup. I do have the recent heap-related changes in FreeRTOSConfig.h
Info:MAIN:Time 12844, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12845, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12846, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8
Info:MAIN:Time 12847, free 30296, MQTT 1, bWifi 1, secondsWithNoPing -1, socks 2/8

Issues:

  • I did notice some code issues in how WiFi was being set up: mainly, the callback is set after making the connection, but the demo does it the other way around, and that makes sense to me. I will adjust that for W600 and, if that is okay, then do the same for W800 separately.
  • WiFi callback processing is also slightly incorrect and generates an incorrect log.
  • The demo was also setting some extra flags which seemed crucial, because I noticed my device starting in both STA and AP modes. Following the demo pattern resolved that too.
please wait connect net......
Info:MAIN:Time 5, free 5912, MQTT 0, bWifi 0, secondsWithNoPing 0, socks 2/8
Debug:MAIN:Registered for wifi changes<CR>

apsta_demo_net_status: sta ip: 192.168.1.111

apsta_demo_net_status: softap ip: 192.168.4.1
Info:MAIN:Time 6, free 3344, MQTT 0, bWifi 0, secondsWithNoPing 0, socks 3/8
Info:MAIN:Time 7, free 3344, MQTT 0, bWifi 0, secondsWithNoPing 0, socks 3/8
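The ordering issue from the first bullet can be shown with a small stand-alone model. Everything here is hypothetical: fake_wifi_connect and fake_register_status_cb are stand-ins, not W600 SDK functions, and the model fires the "joined" event immediately, which is the worst case for a callback that is registered late.

```c
#include <stddef.h>

typedef void (*net_status_cb)(int status);

#define FAKE_NET_JOINED 1  /* hypothetical status code */

static net_status_cb g_status_cb = NULL;
int g_join_events_seen = 0;

/* Stand-in for registering a network status callback. */
void fake_register_status_cb(net_status_cb cb)
{
    g_status_cb = cb;
}

/* Stand-in for a connect call; the event is delivered only to a
 * callback that is already registered at this moment. */
void fake_wifi_connect(void)
{
    if (g_status_cb != NULL)
        g_status_cb(FAKE_NET_JOINED);
}

/* Application handler that counts "joined" notifications. */
void on_net_status(int status)
{
    if (status == FAKE_NET_JOINED)
        g_join_events_seen++;
}
```

Calling fake_wifi_connect before fake_register_status_cb leaves g_join_events_seen at 0, while registering first and then connecting delivers the event. On real hardware the event arrives asynchronously, so the late registration may work most of the time and still lose an event occasionally.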

iprak avatar Sep 29 '22 10:09 iprak

I will post my finding here. https://github.com/openshwprojects/OpenBK7231T_App/issues/204

valeklubomir avatar Sep 29 '22 10:09 valeklubomir

@iprak great progress. I will try to find where I have put my W600 dev board in the meantime, and I will help with testing when your port is released. Can you also do a detailed write-up on how to compile? Also, is there an IDE for W600, or just a Cygwin prompt?

@iprak I'm ready

[image: 20220929_164301518_iOS]

openshwprojects avatar Sep 29 '22 15:09 openshwprojects

That is a nice dongle and it has a reset button.

I pushed my changes in https://github.com/openshwprojects/OpenBK7231T_App/pull/229

I was unable to get it compiling all the way on Linux. The very last step, generating the fls file, is done through wm_tool, and the SDK only contains wm-tool.exe.

I did find another SDK at https://docs.wiznet.io/Product/Wi-Fi-Module/WizFi360/Other-Resource/w600_sdk which contains Python-based image generation.

iprak avatar Sep 30 '22 01:09 iprak

@iprak regarding the USER_SW_VER. The default value of USER_SW_VER is set in code, right, but the correct value for online builds should be set in build scripts. Refer to already supported platforms like BK7231 to see how it's set. It also seems that the setting of USER_SW_VER is missing for BL602 as well. I will try to build for W600 tomorrow.

openshwprojects avatar Oct 01 '22 19:10 openshwprojects

You are absolutely right. I adjusted the W600 SDK to accept the version, something like this:

make -C OpenBK7231T_App/sdk/OpenW600 TOOL_CHAIN_PATH=/workspaces/OpenBK7231T_Dev/w600-gcc-arm/bin/ APP_VERSION=1.2.3

I poked at W800 SDK and could not figure out if/how it passed down the version. I saw this in the action log but that's how far I got. I don't have a W800 device to experiment with.

2022-10-01T18:16:42.5557137Z ##[group]Run make APP_VERSION=1.12.67 APP_NAME=OpenW800 OpenW800
2022-10-01T18:16:42.5557474Z make APP_VERSION=1.12.67 APP_NAME=OpenW800 OpenW800
2022-10-01T18:16:42.5609260Z shell: /usr/bin/bash -e {0}

iprak avatar Oct 01 '22 20:10 iprak

W800 is just $4. I can buy you one from eBay if you want; you'd just need to message me on Elektroda. Of course, eBay would need to ship to your country. [image]

I will look into W600 today, remind me if I forget. Right now I am adding an option to cancel repeating events; I will add this to the main source tree in a few hours.

openshwprojects avatar Oct 02 '22 06:10 openshwprojects

Thanks :-) I got one.

iprak avatar Oct 02 '22 11:10 iprak

@iprak can you temporarily look into HA discovery for RGBCW lights? I have added a valid config generation: https://github.com/openshwprojects/OpenBK7231T_App/commit/2284d0ec97c428eb5422b627211c680f81ff8392 but discovery still discovers my RGBCW bulb as 5 PWMs

openshwprojects avatar Oct 02 '22 15:10 openshwprojects

That would be expected, there is currently no support for color lights. I was working on adding support for voltage and can look into that next.

iprak avatar Oct 02 '22 19:10 iprak

I haven't had time to play with W800/W600 yet. What is the current state of things, @iprak? Is there anything you need help with? In the meantime, we're fixing the N stability and some LED stuff.

openshwprojects avatar Oct 04 '22 04:10 openshwprojects

I have made good progress. Through trial and error and logging, I was able to isolate the potential root causes: a storage overflow and a missing NULL check. This was in JSON generation/cleanup and was not associated with MQTT publish.

I did not use the latest lwip but did increase MQTT_OUTPUT_RINGBUF_SIZE, etc. which would have eventually caused MQTT publish to fail. I am also switching to fixed size storage array since I am suspicious about the memory business in W600. I am going to let my device run today with status broadcast every minute and then push out changes.

Also testing the HASS changes in a T device.

iprak avatar Oct 04 '22 11:10 iprak

Good job. @iprak, is W600 OTA ready? I have a W600 RGBCW bulb! W600 OTA would help a lot and I'd be able to test more, especially since I need to get that one running.

openshwprojects avatar Oct 05 '22 14:10 openshwprojects

Oh yes, that is what I have been using. The OTA however is only implemented for W600/W800 in the app served by the device and not by OpenBekenIOT/webapp.

iprak avatar Oct 05 '22 14:10 iprak

@iprak wow, that's great. I will really try to set up the SDK for compilation and OTA of W600. If there is anything else I need to know, please tell me now; I have just a single W600 bulb and I don't want to brick it. I will test with the W600 dev board first.

openshwprojects avatar Oct 05 '22 14:10 openshwprojects

How will you flash the bulb?

I have been working with this chip. I wired the serial pins and was able to flash it. I did have to use a full erase the very first time, with the UART (fls) image.

[image]

For OTA updates, I have been using the gz.img.

iprak avatar Oct 05 '22 14:10 iprak

I will most likely use this method for flashing the bulb: https://www.youtube.com/watch?v=7MyfSgxLAOo&ab_channel=elektroda.pl Btw, are the English subtitles working for my video?

Very nice images, remember to post this as a teardown to Elektroda!

Look: [image] testing begins

openshwprojects avatar Oct 05 '22 16:10 openshwprojects

Yes, the subtitles are excellent.

iprak avatar Oct 05 '22 16:10 iprak

I connected to this AP some time and after opening the main page I had: [image] and it froze when I clicked "Config". But now I see that this small WiFi dongle I am using (see previous post for image) is hot, so maybe it's overheating... I will give it 15 minutes to cool down and try again.

EDIT: huh, it didn't disconnect the socket at first? [image] It disconnected the socket after only a few seconds... [image]

buuut now it works ok... so maybe it was a fluke.... this W600 USB dongle, a Chinese module, seems very cheap, and it gets untouchably hot in my notebook USB port after a minute of running.

openshwprojects avatar Oct 05 '22 17:10 openshwprojects

I saw something similar at the beginning when I was playing with the SDK demo app: the t-scan test would return nothing. There were no errors, but the access-point test would pass and I was able to connect to it, so I couldn't blame it on a bad radio on the chip.

Anyway, I then used the full erase option with wm_tools and everything started working.

I had the chip connected to 2 separate dongles, one for 3.3 V and a 2nd for serial. I am going to let it run and then check the temperature.

I have a suspicion that something is wrong but can't find evidence. I tend to make a note of how long the chip has been running and don't think the "Online for ..." value is correct. That timing is based on an every-second tick and it seems to drift over time. I have not enabled NTP yet and plan to use that to get some data. If the chip gets busy doing something and gets hot, then that would explain the drifted "Online for ...".
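One way to make an uptime display robust against a late or skipped once-per-second callback is to derive it from a free-running tick counter instead of incrementing a seconds variable. The sketch below assumes that missed ticks are the cause of the drift, which the thread has not confirmed; the function name is hypothetical, and on FreeRTOS the inputs would typically come from xTaskGetTickCount() and configTICK_RATE_HZ.

```c
#include <stdint.h>

/* Whole seconds elapsed between boot and now, computed from a
 * free-running 32-bit tick counter. Unsigned subtraction keeps the
 * result correct across a single counter wraparound. */
uint32_t uptime_seconds(uint32_t ticks_now, uint32_t ticks_at_boot,
                        uint32_t ticks_per_second)
{
    uint32_t elapsed;

    if (ticks_per_second == 0)
        return 0;  /* avoid division by zero on a bad configuration */
    elapsed = ticks_now - ticks_at_boot;
    return elapsed / ticks_per_second;
}
```

A value computed this way depends only on the hardware tick source, so comparing it with the incremented "Online for ..." counter (or later with NTP) would show whether the per-second handler is losing time.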

iprak avatar Oct 05 '22 17:10 iprak

@iprak I think we might have a long-ongoing issue with LWIP. It was fixed on N SDK by one contributor, but the same problem was in BL602 and still is not fixed. I don't know about W600 yet.

This little dongle gets really, really hot. I will try to test more, but maybe I will really find some kind of radiator and just stick it with thermal paste to it so it can run longer.

Why is the total number of sockets trimmed? [image] EDIT: changed to WiFi client from OTA and now it's 2/8 displayed correctly? or wait, no, look: [image] buffer len trimming? [image]

Naming to fix

openshwprojects avatar Oct 05 '22 17:10 openshwprojects