leafminer
ESP8266 Crashes at seemingly random intervals after mining starts
After a short period of time the ESP8266 crashes and restarts. Mine on a pool and wait a few minutes to reproduce the issue.
I see that shares are being submitted and can verify that those shares are making it through my local stratum proxy to the pool.
Here is 10 minutes of run time from PlatformIO serial monitor with decoder. I am running a local stratum proxy and I definitely see valid shares being submitted to the pool.
Could this be an issue with parsing a network response? Each time it fails it seems to be on network_listen() at src\network/network.cpp:416
I have experienced this with v11 and v12.
@matteocrippa Is there anything you can do about this?
nope, will try during a night this week
Nothing urgent for me... I would like to be able to help... but my field is more Python...
I know why the crash is occurring.
I ran the miner with a stratum proxy and was watching the screen. Whenever the proxy notified about a new job available:
2024-03-20 14:03:41,301 INFO proxy client_service.handle_event # New job 201819d for prevhash 78501535, clean_jobs=False
The miner crashes and restarts.
This is why it seems random, new notifications come all the time.
@wmikrut how do you think we could resolve this?
Good catch, so it is probably running out of memory when it calculates the coinbase.
#ifndef UNIT_TEST
#include <Arduino.h>
#if defined(ESP32)
#include "freertos/task.h"
#endif // ESP32
#include "leafminer.h"
#include "utils/log.h"
#include "model/configuration.h"
#include "network/network.h"
#include "network/accesspoint.h"
#include "utils/blink.h"
#include "miner/miner.h"
#include "current.h"
#include "utils/button.h"
#include "storage/storage.h"
#include "network/autoupdate.h"
#include "massdeploy.h"
#if defined(HAS_LCD)
#include "screen/screen.h"
#endif // HAS_LCD
char TAG_MAIN[] = "Main";
Configuration configuration = Configuration();
bool force_ap = false;
void setup()
{
Serial.begin(115200);
delay(3000); // increased for testing
l_info(TAG_MAIN, "LeafMiner - v.%s - (C: %d)", _VERSION, CORE);
#if defined(ESP8266)
l_info(TAG_MAIN, "ESP8266 - Disable WDT");
ESP.wdtDisable();
*((volatile uint32_t *)0x60000900) &= ~(1);
#else
l_info(TAG_MAIN, "ESP32 - Disable WDT");
disableCore0WDT();
#endif // ESP8266
storage_setup();
force_ap = button_setup();
storage_load(&configuration);
if (configuration.wifi_ssid == "" || force_ap)
{
#if defined(MASS_WIFI_SSID) && defined(MASS_WIFI_PASS) && defined(MASS_POOL_URL) && defined(MASS_POOL_PASSWORD) && defined(MASS_POOL_PORT) && defined(MASS_WALLET)
configuration.wifi_ssid = MASS_WIFI_SSID;
configuration.wifi_password = MASS_WIFI_PASS;
configuration.pool_url = MASS_POOL_URL;
configuration.pool_password = MASS_POOL_PASSWORD;
configuration.pool_port = MASS_POOL_PORT;
configuration.wallet_address = MASS_WALLET;
#else
accesspoint_setup();
return;
#endif // MASS_WIFI_SSID && MASS_WIFI_PASS && MASS_POOL_URL && MASS_POOL_PASSWORD && MASS_POOL_PORT && MASS_WALLET
}
#if !defined(HAS_LCD)
Blink::getInstance().setup();
delay(500);
Blink::getInstance().blink(BLINK_START);
#else
screen_setup();
#endif // HAS_LCD
autoupdate();
if (network_getJob() == -1)
{
l_error(TAG_MAIN, "Failed to connect to network");
l_info(TAG_MAIN, "Fallback to AP mode");
force_ap = true;
accesspoint_setup();
return;
}
#if defined(ESP32)
btStop();
xTaskCreate(currentTaskFunction, "stale", 1024, NULL, 1, NULL);
xTaskCreate(buttonTaskFunction, "button", 1024, NULL, 2, NULL);
xTaskCreate(mineTaskFunction, "miner0", 6000, (void *)0, 10, NULL);
xTaskCreate(networkTaskFunction, "network", 10000, NULL, 3, NULL);
#if CORE == 2
xTaskCreate(mineTaskFunction, "miner1", 6000, (void *)1, 11, NULL);
#endif
#endif
#if defined(ESP8266)
network_listen();
#endif
}
void loop()
{
if (configuration.wifi_ssid == "" || force_ap)
{
accesspoint_loop();
return;
}
#if defined(ESP8266)
miner(0);
#endif // ESP8266
}
#endif
For testing, I increased the delay at line 32 in the main.cpp file.
I'm not sure it's an out-of-memory condition. I loaded your Job context with breakpoints and the whole process completes every time. I don't think it's a WDT issue because WDT was disabled in main... I re-enabled it and saw the same result.
Could it be as simple as a stack overflow? job.cpp.txt esp8266.log
And did version 10 work?
I compiled all the way back to v5. There is no v6 Source.
v5 does not crash on new work notifications. v7 and later do crash on new work notifications.
@wmikrut and it seems that it updates automatically?
I've been running it for 15 minutes now with no crashes. However, there have been important fixes in later releases.
I'll keep digging and see if I can spot where it's binding up.
OK, great. In the meantime, can you send me the working version, without automatic update, in *.bin format?
Just pushed a few changes. I quickly tested them and it seems way more stable for me on a Wemos D1. Give 0.0.13 a try (or just wait for the autoupdate by forcing a reboot).
Greeaaattt
It is definitely more stable and now I think I can see a potential memory issue that would be easy to fix.
Every so often the stratum server comes along and assigns a new job, and param 9 (clean jobs) will be false. From what I gather, this means the server is telling the miner: "Finish your nonces, then start on this new job." Mainly so you're not waiting for work.
Under normal circumstances this is fine because hardware miners are quick and jump from job to job.
Now the ESP8266 is much slower and can't handle a lot of queued work. I've seen that after 3-6 notifications of work being queued the chip freezes up completely.
I've added some quick code to say: when work is already queued up, skip queuing up additional jobs. Until Clean Jobs True comes down, we can keep working on the current job until we run out of nonces, then start on the next job.
When true comes down reset everything and start over.
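The queue-limiting policy described above can be sketched roughly as follows. This is a minimal, portable C++ illustration and not the code from the fork: `Notification`, `JobQueue`, `on_notify`, and `advance` are hypothetical names, and the real firmware stores `Job` objects rather than job-id strings.

```cpp
#include <string>

// Hypothetical, simplified view of a mining.notify message.
struct Notification {
    std::string job_id;
    bool clean_jobs;
};

// Holds the job being mined plus at most one queued follow-up job.
class JobQueue {
public:
    // Returns true if the notification was stored, false if it was skipped.
    bool on_notify(const Notification& n) {
        if (n.clean_jobs || !has_current_) {
            // clean_jobs=true (or nothing to mine yet): reset and start over.
            current_ = n.job_id;
            has_current_ = true;
            next_.clear();
            has_next_ = false;
            return true;
        }
        if (has_next_) {
            // Work is already queued: skip instead of piling up more jobs.
            return false;
        }
        next_ = n.job_id;
        has_next_ = true;
        return true;
    }

    // Call when the current job has run out of nonces.
    // Returns true if there was a queued job to switch to.
    bool advance() {
        if (!has_next_) return false;
        current_ = next_;
        next_.clear();
        has_next_ = false;
        return true;
    }

    const std::string& current() const { return current_; }
    const std::string& next() const { return next_; }
    bool has_next() const { return has_next_; }

private:
    std::string current_;
    std::string next_;
    bool has_current_ = false;
    bool has_next_ = false;
};
```

Capping the queue at one pending job keeps memory use constant no matter how many clean_jobs=false notifications arrive between two clean_jobs=true notifications.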
Definitely much more stable with the limiting of queued jobs. The program is no longer dumping or freezing up.
After an hour it was still running.
It stopped only because my wifi router is garbage and the connection dropped. This is not an item related to this thread, just an FYI.
The program picked it up and reconnected, but it never re-subscribed or authorized the new connection so I kept getting the Connection is not subscribed error.
A future item: call subscribe(), authorize(), difficulty() on a new connection. Or, on second thought, perhaps a full reset of the chip would take care of the issue, i.e. ESP.restart().
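As a rough sketch of that idea, the handshake could be re-run after every reconnect, with a chip restart as the fallback. Everything here is an assumption for illustration: `StratumHooks` and `restore_session` are hypothetical names, and the callbacks stand in for whatever the firmware actually uses to send the subscribe, authorize, and difficulty requests (with `restart` wrapping something like `ESP.restart()`).

```cpp
#include <functional>

// Hypothetical hooks standing in for the firmware's stratum requests.
struct StratumHooks {
    std::function<bool()> subscribe;
    std::function<bool()> authorize;
    std::function<bool()> difficulty;
    std::function<void()> restart;  // e.g. a wrapper around ESP.restart()
};

// Re-run the handshake on a fresh connection, stopping at the first failure.
// Returns true if the session is ready to mine again; otherwise requests a
// restart and returns false.
inline bool restore_session(const StratumHooks& hooks) {
    if (hooks.subscribe() && hooks.authorize() && hooks.difficulty()) {
        return true;
    }
    hooks.restart();
    return false;
}
```

Calling something like `restore_session` right after the WiFi/TCP reconnect succeeds would avoid submitting shares on a connection the pool does not consider subscribed.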
I forked the project so you can look at some of the code I am playing with on my dev branch.
I prepared a branch v/0.0.14 with some changes based on the feedback you shared, but haven't had time to test it yet.
In general, any single-core device will skip enqueuing extra jobs and just replace the current one when clean_jobs is true.
I also added logic to try to force a session reset if the connection is failing.
What caught my attention was line 83 in current.cpp current_job_next = new Job(notification, *current_subscribe, current_difficulty);
This was firing every time new work was sent down from the proxy with clean jobs = false. There are times I see a clean jobs false come down 30-40x before a true comes down.
It was only a guess that perhaps the new Job was somehow allocating memory and/or stack space. I don't know for sure... but I do know that once I stopped the new Job from running on every notification it stopped freezing up.
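To illustrate the guess above (and only the guess: the thread does not establish whether the firmware actually leaked here), the sketch below shows what happens if a pointer is overwritten with `new Job` on every notification without freeing the previous object, compared with holding it in a `std::unique_ptr`. `Job` here is a hypothetical stand-in that only counts live instances, and both helper functions are invented for the demonstration.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for the firmware's Job; counts live instances.
struct Job {
    static int live;
    Job() { ++live; }
    ~Job() { --live; }
    Job(const Job&) = delete;
    Job& operator=(const Job&) = delete;
};
int Job::live = 0;

// Raw pointer overwritten on each notification: earlier Jobs are never freed.
// Returns how many Jobs are alive after `count` notifications.
int live_after_raw(int count) {
    std::vector<Job*> lost;  // kept only so this demo can clean up afterwards
    Job* next = nullptr;
    for (int i = 0; i < count; ++i) {
        next = new Job();    // the previous object is no longer reachable via `next`
        lost.push_back(next);
    }
    int live = Job::live;
    for (Job* j : lost) delete j;
    return live;
}

// unique_ptr::reset deletes the previous Job before taking the new one.
int live_after_unique(int count) {
    std::unique_ptr<Job> next;
    for (int i = 0; i < count; ++i) {
        next.reset(new Job());
    }
    return Job::live;
}
```

With 30-40 clean_jobs=false notifications in a row, the first pattern would keep 30-40 objects alive on a heap of only a few tens of kilobytes, while the second never holds more than one.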
We can close this issue. I have moved on to v14 for testing.
I've been running v14 for 30 minutes now. Not a single error!
21:49:24.498 > [I] Miner: [0] > [247b563] > 0x00188f0f - diff 0.000043553264
21:49:24.498 > [I] Network: >>> {"id":232,"method":"mining.submit","params":["wmikrut.ex1","247b563","745449","65fe4245","00188f0f"]}
21:49:24.514 >
21:49:24.639 > [I] Network: <<< [mining.submit] {"error": null, "id": 232, "result": true}
21:49:24.639 > [I] Network: Share accepted
21:49:24.639 > [I] Current: Hash accepted: 200
That's a lot of work for the 8266! Can't wait to let it run overnight.
Can we close this?
Not yet, please.
I tried flashing an ESP8266MOD with LeafMiner 0.0.13 and it was not stable. It mined for several minutes and then stopped mining. While it was mining I watched the serial console and noticed that the miner often reconnects to WiFi, as if it loses signal or gets disconnected. From my point of view, this issue was not addressed in LeafMiner 0.0.13 and should not be closed. Or there is some other issue with the ESP8266MOD, or it is just not a stable miner...
I flashed firmware in this way:
esptool.py --baud 115200 write_flash 0x0000 firmware_esp8266.bin
Hello, I have just flashed an ESP8266 v3 (NodeMCU) and I am not sure whether what I am seeing is this issue. If it is not related I can open a separate issue. After flashing it with 0.0.15 and configuring it with the SSID, password and wallet address, the one thing I changed was to not show anything on the LCD, as this unit has none.
After rebooting it connects and starts mining. I can see shares being submitted to vkbit.com. However, if I simply ping its IP address it responds intermittently; I am not sure whether it is disconnecting from and reconnecting to the network. I am also unable to access the web interface by pointing the browser to its IP address.
@ffrediani that's correct: the web GUI is available only on first boot or after you erase the flash. Keeping a web UI up and running would kill the already limited performance of an ESP8266. Not being pingable is also somewhat expected, because we hold a connection open, but it's more of an open TCP+stratum connection to the server. As you mentioned, you can see it running on vkbit.
If you want there's a branch v0.0.16 with some patches, but it's still a work in progress
@matteocrippa thanks. Isn't there any way to prioritize the processes (the one that runs the web interface over the mining one) when required? It could then decrease the hashrate as necessary in order to reply to the web client for a short period while it is being accessed.
In any case, even when I am not pinging it or trying to access the web interface, I can see that it disconnects from and reconnects to the WiFi access point every once in a while. Is this expected? If this happens while it is solving a block, it may not be able to submit a share.
Yeah, I can try 0.0.16, but I was not able to find the .bin on GitHub and can't compile it myself. Is there a particular URL I can download it from?
Yeah, it seems that after mining for a while the ESP8266 disconnects or crashes and doesn't recover until rebooted. Do you think 0.0.16 would have any chance of avoiding this behavior?
0.0.16 has some patches in that direction, but I didn't have time to fully test it. It's certainly more stable than 0.0.15 on the ESP8266. For now you can install it only by building manually, until it's released via CI.