Problem: Controller crash, self-resets and then skips config file
Controller Board
Root Controller ISO Rev 3 (https://www.rootcnc.com/product/root-controller-rev-3/) running FluidNC v3.7.1.
Machine Description
Gantry diode laser with external DM556 stepper drivers, dual Y motors, endstop switches on all axes, 80W laser module.
Input Circuits
No response
Configuration file
board: Root Controler v3.0
name: CNC-1250x1250
stepping:
  engine: I2S_STREAM
  idle_ms: 255
  pulse_us: 4
  dir_delay_us: 1
  disable_delay_us: 0
axes:
  shared_stepper_disable_pin: NO_PIN
  x:
    steps_per_mm: 800.000
    max_rate_mm_per_min: 4000.000
    acceleration_mm_per_sec2: 100.000
    max_travel_mm: 1280.000
    soft_limits: true
    homing:
      cycle: 2
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 3000.000
      settle_ms: 100
      seek_scaler: 1.100
      feed_scaler: 1.100
    motor0:
      limit_neg_pin: gpio.34
      limit_pos_pin: NO_PIN
      limit_all_pin: NO_PIN
      pulloff_mm: 3.000
      standard_stepper:
        step_pin: I2SO.7:low
        direction_pin: I2SO.5:low
        disable_pin: I2SO.3:high
  y:
    steps_per_mm: 800.000
    max_rate_mm_per_min: 4000.000
    acceleration_mm_per_sec2: 100.000
    max_travel_mm: 1280.000
    soft_limits: true
    homing:
      cycle: 2
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 3000.000
      settle_ms: 100
      seek_scaler: 1.100
      feed_scaler: 1.100
    motor0:
      limit_neg_pin: gpio.32
      limit_pos_pin: NO_PIN
      limit_all_pin: NO_PIN
      pulloff_mm: 3.000
      standard_stepper:
        step_pin: I2SO.12:low
        direction_pin: I2SO.10:high
        disable_pin: I2SO.8:high
    motor1:
      # limit_neg_pin: gpio.26
      limit_pos_pin: NO_PIN
      limit_all_pin: NO_PIN
      pulloff_mm: 3.000
      standard_stepper:
        step_pin: I2SO.6:low
        direction_pin: I2SO.4:high
        disable_pin: I2SO.2:high
  z:
    steps_per_mm: 2000.000
    max_rate_mm_per_min: 1000.000
    acceleration_mm_per_sec2: 100.000
    max_travel_mm: 180.000
    soft_limits: true
    homing:
      cycle: 1
      positive_direction: true
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 2000.000
      settle_ms: 100
      seek_scaler: 1.100
      feed_scaler: 1.100
    motor0:
      limit_neg_pin: NO_PIN
      limit_pos_pin: gpio.27
      limit_all_pin: NO_PIN
      pulloff_mm: 3.000
      standard_stepper:
        step_pin: I2SO.18:low
        direction_pin: I2SO.16:high
        disable_pin: I2SO.14:high
i2so:
  bck_pin: gpio.22
  data_pin: gpio.12
  ws_pin: gpio.21
spi:
  miso_pin: gpio.19
  mosi_pin: gpio.23
  sck_pin: gpio.18
sdcard:
  card_detect_pin: NO_PIN
  cs_pin: gpio.5
  # frequency_hz: 1000000
control:
  safety_door_pin: gpio.15:low
  reset_pin: NO_PIN
  # feed_hold_pin: gpio.15:low
  cycle_start_pin: NO_PIN
probe:
  pin: gpio.2
  check_mode_start: false
Laser:
  pwm_hz: 5000
  output_pin: gpio.33
  enable_pin: NO_PIN
  disable_with_s0: false
  s0_with_disable: true
  tool_num: 0
  speed_map: 0=0% 1000=100%
  off_on_alarm: true
macros:
  startup_line0:
  startup_line1:
  #macro0: $SD/Run=lasertest.gcode
  macro1:
  macro2:
  macro3:
user_outputs:
  analog0_pin: NO_PIN
  analog1_pin: NO_PIN
  analog2_pin: NO_PIN
  analog3_pin: NO_PIN
  analog0_hz: 5000
  analog1_hz: 5000
  analog2_hz: 5000
  analog3_hz: 5000
  digital0_pin: NO_PIN
  digital1_pin: NO_PIN
  digital2_pin: NO_PIN
  digital3_pin: NO_PIN
start:
  must_home: true
Startup Messages
Resetting MCU
ets Jul 29 2019 12:21:46
rst:0x1 (POWERON_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1184
load:0x40078000,len:13220
ho 0 tail 12 room 4
load:0x40080400,len:3028
entry 0x400805e4
[MSG:INFO: FluidNC v3.7.1]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.4]
[MSG:INFO: Local filesystem type is spiffs]
[MSG:INFO: Configuration file:con12.yaml]
[MSG:DBG: Running after-parse tasks]
[MSG:DBG: Checking configuration]
[MSG:INFO: Machine CNC-1250x1250]
[MSG:INFO: Board Root Controler v3.0]
[MSG:INFO: I2SO BCK:gpio.22 WS:gpio.21 DATA:gpio.12]
[MSG:INFO: SPI SCK:gpio.18 MOSI:gpio.23 MISO:gpio.19]
[MSG:INFO: SD Card cs_pin:gpio.5 detect:NO_PIN freq:8000000]
[MSG:INFO: Stepping:I2S_stream Pulse:4us Dsbl Delay:0us Dir Delay:1us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (0.000,1280.000)]
[MSG:INFO: Motor0]
[MSG:INFO: standard_stepper Step:I2SO.7:low Dir:I2SO.5:low Disable:I2SO.3]
[MSG:INFO: X Neg Limit gpio.34]
[MSG:DBG: X Neg Limit 0]
[MSG:INFO: Axis Y (0.000,1280.000)]
[MSG:INFO: Motor0]
[MSG:INFO: standard_stepper Step:I2SO.12:low Dir:I2SO.10 Disable:I2SO.8]
[MSG:INFO: Y Neg Limit gpio.32]
[MSG:DBG: Y Neg Limit 0]
[MSG:INFO: Motor1]
[MSG:INFO: standard_stepper Step:I2SO.6:low Dir:I2SO.4 Disable:I2SO.2]
[MSG:INFO: Axis Z (-180.000,0.000)]
[MSG:INFO: Motor0]
[MSG:INFO: standard_stepper Step:I2SO.18:low Dir:I2SO.16 Disable:I2SO.14]
[MSG:INFO: Z Pos Limit gpio.27]
[MSG:DBG: Z Pos Limit 0]
[MSG:INFO: safety_door_pin gpio.15:low]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: Laser Ena:NO_PIN Out:gpio.33 Freq:5000Hz Period:8191]
[MSG:INFO: Using spindle Laser]
[MSG:INFO: Probe Pin: gpio.2]
[MSG:INFO: Connecting to STA SSID:CNC_Router]
[MSG:INFO: Connecting.]
[MSG:INFO: Connecting..]
[MSG:INFO: Connecting...]
[MSG:INFO: Connected - IP is 192.168.0.201]
[MSG:INFO: WiFi on]
[MSG:INFO: Start mDNS with hostname:http://fluidnc.local/]
[MSG:INFO: SSDP Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]
Grbl 3.7 [FluidNC v3.7.1 (wifi) '$' for help]
[MSG:INFO: '$H'|'$X' to unlock]
[MSG:DBG: X Neg Limit 0]
[MSG:DBG: Y Neg Limit 0]
[MSG:DBG: Z Pos Limit 0]
User Interface Software
WebUI
What happened?
I am trying to get my new controller (Root Controller ISO Rev 3) working, but when I use the WebUI the controller very often crashes, and I need to restart it several times before it works again. After many days of trial and error, I now know that the CPU self-resets and then skips the config file. That is why I need to physically reset the controller, often more than once; on the next start it loads the config again without skipping it. That is as far as I can track the issue. After reading many other issues, I connected my controller to USB so I can see in the console what happens during the crash. I don't know why this is happening, but I can almost never finish a G-code job without a crash. When I run the job from the WebUI, it works for a few seconds, then it stutters, continues, comes to a full stop and continues again after a few seconds. Finally, it stops completely and restarts. Can somebody please tell me what I can do to make the controller stable and reliable?
Example G-code file: EngraveTest1.zip
Text from the console: fluidterm_log_2023-06-22.txt
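The self-reset described above leaves recognizable markers in the serial log: the panic banner, the ESP32 reset reason on the `rst:` line, and FluidNC's "Skipping configuration file due to panic" message (all three appear in the logs quoted in this issue). Below is a minimal Python sketch that counts them, so a long console capture such as fluidterm_log_2023-06-22.txt can be summarized quickly. It assumes the console output was saved to a text file. The regular expressions were written only for the message formats quoted in this thread and may miss other variants.

```python
import re
from dataclasses import dataclass, field

# Patterns for the exact message formats quoted in this issue (assumption: other
# firmware versions may word these differently).
PANIC_RE = re.compile(r"Guru Meditation Error: Core\s+(\d+) panic'ed \((\w+)\)")
RESET_RE = re.compile(r"rst:0x([0-9a-fA-F]+) \((\w+)\)")
SKIP_MARKER = "[MSG:ERR: Skipping configuration file due to panic]"


@dataclass
class CrashSummary:
    panics: list = field(default_factory=list)  # (core, exception cause) pairs
    resets: list = field(default_factory=list)  # ESP32 reset-reason names, in order
    config_skipped: int = 0                      # times FluidNC skipped the config


def summarize_log(text: str) -> CrashSummary:
    """Collect panic banners, reset reasons and skipped-config messages from a log."""
    summary = CrashSummary()
    for m in PANIC_RE.finditer(text):
        summary.panics.append((int(m.group(1)), m.group(2)))
    for m in RESET_RE.finditer(text):
        summary.resets.append(m.group(2))
    summary.config_skipped = text.count(SKIP_MARKER)
    return summary
```

Usage: `summarize_log(open("fluidterm_log_2023-06-22.txt", encoding="utf-8", errors="replace").read())`. A `POWERON_RESET` is an ordinary power-up. A `SW_CPU_RESET` that follows a panic banner, together with a skipped-config message, matches the crash pattern reported here.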
Other Information
No response
I think there are memory leaks in the websocket code that is used to send auto-reports to the WebUI. Try turning off auto-reporting and using polling instead. Meanwhile, I will try to resolve the websocket problems.
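With polling, the client requests a status report at its own pace by sending the Grbl real-time `?` query. With auto-reporting, the controller pushes reports on a timer instead. The Python sketch below shows that polling exchange over the telnet port listed in the startup log (port 23 at 192.168.0.201). The 4-second default interval, the poll count and the parsing helper are illustrative assumptions. This is not how the WebUI itself implements polling, and every extra connection adds some load on the ESP32.

```python
import re
import socket
import time

# Assumed from the startup log in this issue: Telnet on port 23 at this IP.
HOST = "192.168.0.201"
PORT = 23

# Grbl-style status report, e.g. "<Idle|MPos:0.000,0.000,0.000|FS:0,0>"
STATUS_RE = re.compile(r"<([^|>]+)(?:\|([^>]*))?>")


def parse_status(line):
    """Return (state, {field: value}) for a status report, or None otherwise."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    fields = {}
    if m.group(2):
        for part in m.group(2).split("|"):
            key, _, value = part.partition(":")
            fields[key] = value
    return m.group(1), fields


def poll_status(host=HOST, port=PORT, interval_s=4.0, polls=5):
    """Send '?' every interval_s seconds and print each status report received."""
    with socket.create_connection((host, port), timeout=5) as sock:
        pending = b""
        for _ in range(polls):
            sock.sendall(b"?")  # real-time query: no newline needed
            deadline = time.monotonic() + interval_s
            # Read (and print) whatever arrives until it is time for the next poll.
            while (remaining := deadline - time.monotonic()) > 0:
                sock.settimeout(remaining)
                try:
                    chunk = sock.recv(1024)
                except socket.timeout:
                    break
                if not chunk:
                    return  # controller closed the connection
                pending += chunk
                *lines, pending = pending.split(b"\n")
                for raw in lines:
                    report = parse_status(raw.decode(errors="replace"))
                    if report is not None:
                        print(report)
```

Calling `poll_status()` prints one tuple per report, for example `('Idle', {'MPos': '0.000,0.000,0.000', 'FS': '0,0'})`. Non-status lines such as `[MSG:...]` messages and the Grbl welcome banner are ignored.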
Thank you for the quick reply. I have tried setting auto-reporting to 50/100/500/5000 ms without any major difference; after a while the controller still resets. If I set reporting to None or Poll 3 sec, it sometimes works. But if I reload the page, it is set back to Auto. How can I set reporting to polling permanently? Also, when I run a job and reload the web page, it causes a small pause in motor movement. Is that normal behaviour?
The pause when reloading the page is unavoidable because of the work the ESP32 does when responding to the reload request. There are unavoidable tradeoffs to put this much functionality in one low-cost chip.
There is no easy way to permanently switch off auto-reporting. I will add that capability for the next release.
That is very unfortunate, because I use a smartphone for the WebUI, and I am sure that when I switch the screen off and on again, Firefox sometimes reloads the page by itself (maybe because of some power saving?). That in turn means I cannot use the WebUI at all, because a reload always resets the auto-reporting and can ruin the job. I guess I have to use the computer for now. I have read somewhere that reloading should be blocked during a job, but that this only applies to a forced reload (Ctrl+F5). I don't know how to do that on a phone. Shouldn't reloading be blocked every time, given its critical impact on the controller?
Recently we added the ability to cache the webui code in the browser, which reduces the amount of work that the ESP32 has to do when the page is reloaded, assuming that the browser does the right thing. Even so, it still has to do some amount of work to reestablish connections and stuff, which takes time that might be needed for high-speed stepping.
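The point about time "needed for high-speed stepping" can be expressed in numbers for this machine. The peak step frequency of one motor is steps_per_mm × max_rate_mm_per_min / 60. The Python sketch below applies that formula to the X, Y and Z values in the configuration file posted in this issue. It is plain arithmetic and says nothing about how FluidNC's I2S_STREAM engine actually schedules pulses.

```python
# Values copied from the configuration file in this issue.
AXES = {
    # axis: (steps_per_mm, max_rate_mm_per_min)
    "x": (800.0, 4000.0),
    "y": (800.0, 4000.0),  # both Y motors run at this rate
    "z": (2000.0, 1000.0),
}
PULSE_US = 4  # stepping: pulse_us


def max_step_rate_hz(steps_per_mm: float, max_rate_mm_per_min: float) -> float:
    """Peak step frequency for one motor at the axis's maximum rate."""
    return steps_per_mm * max_rate_mm_per_min / 60.0


for axis, (spm, rate) in AXES.items():
    hz = max_step_rate_hz(spm, rate)
    period_us = 1e6 / hz  # time between consecutive step pulses
    print(f"{axis}: {hz:,.0f} steps/s, one step every {period_us:.2f} us "
          f"(pulse width {PULSE_US} us)")
```

At full speed, X and Y each need about 53,333 steps/s, which is one step every 18.75 µs. Z needs about 33,333 steps/s, one step every 30 µs.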
The caching works reasonably well with Chrome on a PC. I don't know about Firefox and phone browsers. Testing on everything is very time consuming. This is free software, supported by unpaid volunteers, that runs on an enormous number of different configurations, on very inexpensive hardware that we often have no control over.
You get what you pay for. In this case, the required payment is $0, so set your expectations accordingly. We are trying to make FluidNC work well, but the task is huge and the rewards are few.
I do understand it is not easy, and I appreciate your work on this free software. I will try to run my machine from a computer, even though it is very unhealthy for a computer to be in a dusty woodworking workshop. That is why I use the phone in the first place.
Until there is an update that lets me turn off auto-reporting or fixes the WebUI memory leak, should I try an older version of FluidNC from before auto-reporting was added? When was this feature added?
https://github.com/MitchBradley/ESP3D-WEBUI/blob/revamp/index.html.gz might help. I added the ability to turn off auto-reporting by setting the interval to 0 in preferences. If you set both the auto-report and poll intervals to 0, the status reporting will be set to None.
After you upload a new index.html.gz or change the preferences, reboot the controller to make sure that the browser gets the new versions. There is a bug in the caching code that sometimes fails to recompute the hash value for newly-uploaded files.
Thank you. I will try that as soon as possible and after testing, report my findings.
I think I'm having this issue too, but can someone confirm it's the same thing? My device is unstable; I wonder what can be done. Losing work in progress sucks when the machine just panics. It makes this free software cost a lot. I'll happily donate some money so I stop losing projects mid-job.
Layer Design
Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x00000000  PS      : 0x00060030  A0      : 0x800f33d2  A1      : 0x3ffe7560
A2      : 0x3ffcb628  A3      : 0x3ffcb4d5  A4      : 0x00000000  A5      : 0x00000000
A6      : 0x3ffcab80  A7      : 0x00000000  A8      : 0x800f3270  A9      : 0x3ffe7540
A10     : 0x3ffee2a4  A11     : 0x3ffee2a4  A12     : 0x3ffbc170  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x00060023  SAR     : 0x0000001a  EXCCAUSE: 0x00000014
EXCVADDR: 0x00000000  LBEG    : 0x4008b6c4  LEND    : 0x4008b6da  LCOUNT  : 0xffffffff
Backtrace: 0xfffffffd:0x3ffe7560 0x400f33cf:0x3ffe7590 0x400f0707:0x3ffe75b0
ELF file SHA256: c0232e7857604a85
Rebooting...
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1184
load:0x40078000,len:13220
ho 0 tail 12 room 4
load:0x40080400,len:3028
entry 0x400805e4
[MSG:ERR: Skipping configuration file due to panic]
Grbl 3.7 [FluidNC v3.7.1 (wifi) '$' for help]
@anon65453
Please start your own problem issue, so we can see your setup
Sorry for the delay.
I have uploaded the new index.html.gz and run many tests to try to come up with some useful information.
I had a lot of trouble figuring out why the WebUI sometimes works and sometimes does not. I think it has something to do with multiple connections at the same time. For example, when I connect only one device (the console shows [MSG:DBG: WebSocket 0 from 192.168.0.100 uri /]), everything works reasonably well until the page is refreshed or, worse, another device connects (WebSocket 1). If two devices are connected at the same time, the second one can send commands but does not receive any replies (the "Commands" window is not updated).
On my previous controller I used ESP3D (predecessor to FluidNC), and in the same situation the first device was always disconnected and a message was displayed saying that the page had to be reloaded to regain control.
What is the intended behavior in FluidNC in this situation? Should it be possible to view the WebUI on more than one device at all? Could this somehow cause the memory leak?
I wasn’t able to get v3.7.2-pre1 to work at all. After flashing the firmware, the web page shows only a blank page with a row of words ("Firmware", "Interface", "Help") and a lock icon. I cannot do anything else, and when this almost empty page loads, the console always stops responding. The only thing I can do is press Ctrl+R to reset it.
As of now, I have tried the current version (3.7.1) on three different controllers (Root Controller ISO Rev 3 and two xPro v5), and all of them restart after a while when using the WebUI. Mitch advised me to try the new "index.html.gz" with the report interval set to 0, and it works, except for the issues in my previous post. Maybe it will work for you as well.
Try this prerelease https://github.com/bdring/FluidNC/releases/tag/v3.7.2-pre2
Should I run erase.bat, then install-fs.bat and finally install-wifi.bat? Or just install-wifi.bat?
I'm running a 2-hour job right now; I'll give it a test and report back.
Additionally should I replace what comes in this release with the .html.gz mentioned previously?
https://github.com/MitchBradley/ESP3D-WEBUI/blob/revamp/index.html.gz
Here is the issue I opened; I'm still unsure whether it's related. I originally opened it for bad WiFi performance, but crashing is also a constant issue. The WiFi performance also improved slightly after changing the polling/reporting interval, as seen in many other threads.
Sometimes, with so many wheels spinning, it's hard to tell which one needs the grease. I'm here for the ride, and I can change a tire too!
https://github.com/bdring/FluidNC/issues/930
The best approach would be to use install-wifi, then upload the latest WebUI from https://github.com/bdring/FluidNC/blob/Devt/FluidNC/data/index.html.gz . Use the download button at the right to get that file onto your host computer, then upload it to FluidNC either with FluidTerm Ctrl-U or with the WebUI upload feature.
I was able to run the firmware upgrade but when updating the webpage I got the following error.
Was able to use fluidterm to upload the latest .html.gz. I'll run another job right now, if you don't hear anything back then assume all is well.
How long should I wait for a non-event before assuming all is well?
I honestly hate complaining; I'm just trying to have a stable laser, and I hate failures caused by things outside my control. I will say, though, that FluidNC is light-years ahead of what the manufacturer is doing! The manufacturer's firmware wouldn't run without communication issues.
Just made 6 coasters at 24,000 mm/min with QR codes and logos in around 35 min, not bad! Let's see if it can finish my job of 200 of these bad boys!!! A high failure rate due to things outside my control makes me neither rich nor happy. But for now I'm back in happy land.
Mitch,
Just wanted to report back and say all is well with my setup, over 5 days and 20 hours of testing/production. The webpage is stable and does not cause the machine to crash during jobs. I really appreciate all that you have put into this! Cheers from Mexico!!
Great, I am glad that your production is running now.
@TITAN3737 please try this prerelease to see if it helps with your problem https://github.com/bdring/FluidNC/releases/tag/v3.7.2-pre3 . The status of this ticket is unclear since someone else jumped in and added their own problem.
Sorry for the delay. I tried all the pre-release versions (v3.7.2-pre1 through pre3), v3.7.3 (deleted) and now even v3.7.4. All of them have some major or minor glitches, but when I was writing a list of them all, it became very confusing to tell what causes what. So I will focus only on the latest, v3.7.4.
In the latest build I was not able to home my machine. I found that something has changed, and I now need to change some GPIO pins in the config file to "gpio.##:low", otherwise homing goes crazy. Specifically, this applies to the Y1 and Z limit switches. This is odd, because X and Y2 are OK without any changes. I guess this will cause issues for many people.
After flashing and opening the WebUI for the first time, the first odd thing is that Reports is not set at all. To clarify, it is not set to "None"; none of the buttons is selected, so there is really no reporting. That leads to many different issues, so each time I open the WebUI I manually set it to "Auto: 50ms", which kind of works. When running the test job file, I still get some hiccups in motion and occasional full stops of 1-2 seconds, but the job almost always completes.
One thing that is still pretty dangerous is reloading the page. Doing that from the phone (which sometimes reloads by itself) always causes a 1-2 second pause. Reloading from multiple devices (PC and phone) at once may result in a crash (with or without a low-memory warning), usually followed by this line in FluidTerm: "Guru Meditation Error: Core 0 panic'ed (InstrFetchProhibited). Exception was unhandled."
If there is any more testing I can do, I will do what I can. For now, I am sticking with v3.7.1 with auto-reporting set to 0, which is stable and reliable.
@TITAN3737 When starting for the first time, you have to set the Status Refresh Time to at least 4 seconds in Preferences, at the top right of the WebUI, or set the AutoReport interval to 51 there.
Try going into the Preferences panel and set AutoReport Interval to 0 and Status Refresh Time to 4. Then hit Save at the bottom of the panel.
Your wifi network might be too unreliable for safe use of autoreporting.
There is nothing we can do to make webui reload work well in the middle of jobs. It puts so much load on the processor that it is almost guaranteed to cause motion problems. For milling jobs where there are long segments, it can sometimes work, but for laser jobs with a lot of short motions, the amount of work that the ESP32 has to do to serve the page interferes with the rapid-fire action of handling the short motions quickly.
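A little arithmetic makes the difference between milling and laser jobs concrete. In the Python sketch below, the 0.1 mm segment length is a hypothetical value chosen for illustration, since raster engraving often produces very short moves. The feeds are this machine's 4000 mm/min maximum and the 24,000 mm/min mentioned earlier in the thread. The 1.5 s stall sits in the middle of the 1-2 second pauses reported here. The sketch only counts segments; it does not model FluidNC's planner.

```python
SEGMENT_MM = 0.1  # hypothetical short laser move (assumption for illustration)
STALL_S = 1.5     # a pause in the 1-2 s range reported in this thread


def segments_per_second(feed_mm_per_min: float, segment_mm: float) -> float:
    """Motion segments that must be parsed and planned each second at this feed."""
    return (feed_mm_per_min / 60.0) / segment_mm


def segments_during_stall(feed_mm_per_min: float, segment_mm: float, stall_s: float) -> float:
    """Segments that would have executed during a stall of stall_s seconds."""
    return segments_per_second(feed_mm_per_min, segment_mm) * stall_s


for feed in (4000.0, 24000.0):
    rate = segments_per_second(feed, SEGMENT_MM)
    print(f"{feed:,.0f} mm/min: {rate:,.0f} segments/s, "
          f"{1000.0 / rate:.2f} ms per segment, "
          f"{segments_during_stall(feed, SEGMENT_MM, STALL_S):,.0f} segments per {STALL_S} s stall")
```

Under these assumptions, 4000 mm/min works out to about 667 segments/s (1.50 ms each), and 24,000 mm/min to 4,000 segments/s (0.25 ms each). Long milling segments leave far more time per segment.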
Sorry for the delay. I was not able to resolve the issue with the WebUI, so I switched to using a PC for control and disabled WiFi completely. That solved the crashes for me.