[BUG] Prusa Connect / Link Gcode Corruption on OTA gcode upload
Printer type - [MK4]
Printer firmware version - [5.0 Alpha 4]
Original or Custom firmware - [Original]
USB drive or USB/Octoprint - [USB Drive formatted as FAT32]
Describe the bug - When uploading gcode via PrusaLink or Prusa Connect over the air, the gcode is corrupted, causing the toolhead to move outside its path. Sometimes the print continues and fails; other times the printer gets stuck and is unable to move.
For example, the print head will move to the right or to the "north" of the object and pause/get stuck, and the print cannot be paused or stopped.
How to reproduce - With Alpha 4, upload a gcode file to the printer, open the file that landed on the printer's USB drive in a gcode viewer, and compare it to an export taken directly from PrusaSlicer.
Note: if you download the file directly from the Connect cloud it is correct; the corruption occurs when the file is uploaded to the USB drive over Wi-Fi.
Expected behavior - Gcode uploaded via Wi-Fi should be identical to gcode copied directly to the USB drive via the USB port.
G-code - Working and Problomatic Gcode.zip
Note: the files contain the same gcode; one went directly from PrusaSlicer to USB, the other via PrusaLink. Please compare the tool paths in a gcode viewer to see the issue.
The differences in your files Print Directly from Prusa slicer.gcode and Print Uploaded via prusa link.gcode appear to be in 64-byte chunks. It's as if the transfer corrupted the file by mixing in chunks of some other, unrelated file.
This looks like a pretty nasty bug as mixing in random gcodes may harm your printer.
I noticed that the changes occur in 64-byte chunks by analyzing a few of the difference blocks found by running:
diff -u Print\ Directly\ from\ Prusa\ slicer.gcode Print\ Uploaded\ via\ prusa\ link.gcode
The blocks I looked at all had sizes that were multiples of 64 bytes.
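For anyone who wants to repeat this check without eyeballing the diff output, here's a minimal sketch (my own tooling, not something from the thread) that compares two files in fixed-size chunks and prints each run of mismatches along with its size in bytes:

```python
#!/usr/bin/env python3
# Usage: python3 chunkdiff.py good.gcode bad.gcode
import sys

CHUNK = 64  # chunk size in bytes; try 32 as well

def chunks(path):
    data = open(path, 'rb').read()
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

a, b = chunks(sys.argv[1]), chunks(sys.argv[2])
m = min(len(a), len(b))  # compare up to the shorter file

run_start = None
for i in range(m + 1):  # one extra pass flushes a run ending at EOF
    mismatch = i < m and a[i] != b[i]
    if mismatch and run_start is None:
        run_start = i
    elif not mismatch and run_start is not None:
        n = i - run_start
        print(f"chunks {run_start}-{i - 1}: {n} bad chunk(s), {n * CHUNK} bytes")
        run_start = None
```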
I'd like some confirmation/clarification. By „Upload by wifi“ you mean the PrusaLink interface on the IP address of the printer / through IP address from slicer?
Does it happen every time, or just sometimes? How often?
Can you try with a different USB stick?
It has occurred with uploads through the printer's IP from PrusaSlicer, with direct gcode uploads from PrusaLink, and with file uploads from Prusa Connect to the printer.
Last night I did another upload via Prusa Connect and it was identical, so I don't think it happens on every upload. I have tried with two different USB sticks, the stock one and a new one.
@jvasileff we are trying to reproduce this problem and have a question to improve the reproduction process. Do you use Mac or Windows?
@JohnnyDeer I was only analyzing the files posted by jrgiacone. I have not reproduced this myself. This issue caught my eye because I have seen Connect become out of sync with the files on the printer's USB drive, and Connect seems to trust filenames too much when a new version is uploaded without changing the filename, but so far that seems unrelated.
I have reproduced it on Windows and Linux, primarily using Linux. I don't have access to a Mac. It does not appear to happen every time, and I have found that a power reset sometimes seems to fix it.
gcode 2.zip - it happened again; I labeled the file that came from the PrusaSlicer OTA upload.
Thank you for bringing this to our attention.
We acknowledge the presence of the bug, and our developers are attentively investigating this issue. Unfortunately, though, the bug is proving hard to reproduce.
We would appreciate gathering further details for the reproducibility of the problem and possibly feedback from more users.
Here are examples of possibly useful pieces of information that we may want to collect.
- Exact steps and patterns followed by the users before the occurrence of the issue.
- Computer hardware and operating system information.
- Network and connectivity information like router model, distance of the printer from the router, internet speed tests made with other devices in the printer's position...
- Programs and operations running on the computer that may take up substantial computer and internet resources.
- Devices connected to the computer or to the same network that may take up substantial computer and internet resources.
- Number of devices connected to the same network.
- Status of the printer.
- Last operations performed on the printer.
- Third-party devices connected to the printer.
- Reproducibility of the problem via Ethernet or via a separate Wi-Fi network (e.g. mobile hotspot).
- Examples of corrupted files (before and after the data transfer).
Michele Moramarco Prusa Research
Hi Michele,
- Exact steps and patterns followed by the users before the occurrence of the issue.
This has occurred both after a completed print when the printer had been sitting idle for a while, and right after a restart. I'm unsure whether board temperature has anything to do with it; typically the Buddy board is at 55-60 °C when printing.
- Computer hardware and operating system information.
Primary computer: Arch Linux, kernel 6.4.7, Intel i7-6700HQ, PrusaSlicer from https://archlinux.org/packages/extra/x86_64/prusa-slicer/
Windows computer: Windows 10 with an AMD 5800X and an Nvidia 3080, running PrusaSlicer release 6.0.
- Network and connectivity information like router model, distance of the printer from the router, internet speed tests made with other devices in the printer's position...
Modem: Spectrum EN2251. Router: Spectrum SAX1V1R. Distance: about 15 feet. Speed from the router is 450 Mbps download and 20 Mbps upload from the same position.
- Programs and operations running on the computer that may take up substantial computer and internet resources.
The only programs running would be Firefox and PrusaSlicer, as the system was tested essentially at idle.
- Devices connected to the computer or to the same network that may take up substantial computer and internet resources.
No Devices are connected to the laptop. The desktop has 2 monitors, a mouse, and a keyboard.
- Number of devices connected to the same network.
Typically 6-10 Devices are connected.
- Status of the printer.
Unsure on this one; I think it is typically idle or ready to print?
- Last operations performed on the printer.
This has occurred both when transferring after a completed print, and after a restart.
- Third-party devices connected to the printer.
Nothing is connected to the printer.
- Reproducibility of the problem via Ethernet or via a separate Wi-Fi network (e.g. mobile hotspot).
I have not tried via Ethernet; this has only been tested via Wi-Fi.
- Examples of corrupted files (before and after the data transfer). gcode.2.zip Working.and.Problomatic.Gcode.zip. Note: the files in the folder are labeled to show which one was transferred via PrusaSlicer OTA (over the air) to the printer.
The Gcode 2 folder was when printing this stl: https://www.printables.com/model/506041-prusa-mk4-filament-guide-for-ptfe-fitting
When comparing the gcode 2 file, it appears a filament swap was randomly inserted that just moves the print head away and then back without any user interaction (note: this was not intentional and was not present in the gcode uploaded directly to USB from the computer's USB port).
I am happy to help test in any other way or provide any other information needed. @Prusa-Support
It seems to have done it again uploading from PrusaSlicer OTA; note that when I re-uploaded it, there were no issues with the second one. Attached are screenshots of the differences in the gcode.
If the OP hadn't said they tried two USB sticks, this would have all the hallmarks of a corrupted USB stick?!
I took a closer look at gcode 2.zip and Gcode Example 3.zip, and what I found about the corruption is pretty interesting. Below, I'll refer to the "good" gcode file as v1, and the corrupted file as v2.
The corruption occurs in exactly 32-byte chunks. To analyze the corruption, we can first split each file into a bunch of 32-byte files with a command like split -b 32 -a 4 v1.gcode, keeping the pieces for each file in their own directory. Fingerprints for each chunk can then be created with find v1 -type f -exec md5 {} + > v1-md5s. We are then left with two files - v1-md5s and v2-md5s, which fully describe the original and corrupted gcode files in 32-byte chunks.
Analysis can be done with commands like diff -u v1-md5s v2-md5s, yielding:
--- v1-md5s 2023-07-29 12:09:55.952211454 -0400
+++ v2-md5s 2023-07-29 12:10:55.186357521 -0400
@@ -469,7 +469,7 @@
MD5 (./xckhs) = 84e3ffd2863ba304f27d146a2dc59996
MD5 (./xavya) = f3f08a555cef524001fab082ee6a6a50
MD5 (./xakwl) = 285d1a7aa761de0db83219e19e28e388
-MD5 (./xakqu) = bed64c8ba35a13cbdbffcde3c9b39e27
+MD5 (./xakqu) = 862a0484fb481016590124d798f229f3
MD5 (./xbwme) = 6bea35919a15b764ff5ed25351f4e07d
MD5 (./xcand) = 75d0ae1e30c6b006b7ed840374c12b97
MD5 (./xbhfs) = 48460aac9f132df8f4565c027a2cfad2
@@ -4472,7 +4472,7 @@
MD5 (./xcikp) = 93bb8e33b593b70d49ab8b9c53c265c2
MD5 (./xatzb) = 79eeb96b402d344ac274854716a61614
MD5 (./xakwm) = dfd8a12e34b0080d2993659e5bf3e560
-MD5 (./xakqt) = 18c413f2d7af5294875aa3807c5e9cb7
+MD5 (./xakqt) = 3ef43351da57b392f9359baf6cc88d0e
MD5 (./xcane) = 401827f26696585ff8b4e7199af159dd
MD5 (./xbwmd) = 612723674e9ef948cb19986835f97226
MD5 (./xbzqk) = 6804cafcfeab5fed4b980235d729a610
@@ -4887,7 +4887,7 @@
MD5 (./xatze) = 715cfef9350c0c72eac8ffeb9e893f74
MD5 (./xcikw) = 251d94b211b9fc8348990da3825b19f6
MD5 (./xcimn) = 44c8ec41cdcf40294eda5bd97df537b7
-MD5 (./xakqs) = 081676dadbd1c9f40529aad093c99ae4
+MD5 (./xakqs) = d9bb7e5b7dc91f8b5c21112c0d128a41
MD5 (./xakwj) = 5e9d00603a07adeb49d337beea986c44
MD5 (./xcvfx) = ce05871851ab48e69b54fa18aaa6bdbb
MD5 (./xbwkz) = 902a400864f977a95ee63a49475ba269
...
In red (the - lines), we can see the hash each chunk should have; in green (the + lines), we see the hash we actually have in the corrupted file. By looking at this diff, the following observations can be made:
- All corruption is indeed in 32-byte chunks.
- The vast majority of "bad" chunks in the corrupted files exactly match content in the original file; it's just that the 32 bytes are repeated from some other location in the source file.
- The relatively few "bad" chunks in the corrupted file that cannot be found in the original file look like garbage, or possibly FAT32 filesystem info.
- Interestingly, "bad" chunks in the corrupted file often contain data "from the future", that is, data that only occurs later in the original file. For example, in the first diff above, 862a0484fb481016590124d798f229f3 occurs in the corrupted file as the 472nd chunk. It also occurs in both the original and corrupted files as the 20553rd chunk. This indicates that the corruption occurs sometime after the bytes have been sent over the wire, or that new bytes are overwriting previously written bytes.
For Gcode Example 3.zip, there are 78 corrupted chunks. For all but 12 of those chunks, the data also occurs in the original file. The content of the other 12 chunks is:
% for i in $(cat chunks-not-in-v1); do echo "\n$i" ; cat $i ; done
v2/xbuzu
@!A!B!C!D!E!F!G!
v2/xbvac
?!?!?!?!?!?!?!?!
v2/xbvab
x!y!z!{!|!}!~!!
v2/xbvaa
p!q!r!s!t!u!v!w!
v2/xbuzw
P!Q!R!S!T!U!V!W!
v2/xbuzx
X!Y!Z![!\!]!^!_!
v2/xbuzy
`!a!b!c!d!e!f!g!
v2/xbuzt
8!9!:!;!<!=!>!?!
v2/xbuzz
h!i!j!k!l!m!n!o!
v2/xbuzs
0!1!2!3!4!5!6!7!
v2/xbvad
?!?!?!?!?!?!?!?!
v2/xbuzv
H!I!J!K!L!M!N!O!%
I have no idea what's going on here. But perhaps this will help someone with more knowledge of the PrusaLink firmware code or the FAT32 filesystem code. Regarding @mix579's comment, it does seem like a problem with the filesystem, but it's weird that the corruption is only happening when the files are being written by the MK4. Maybe using the USB stick on the Windows or Linux machine created some metadata that the MK4 can't handle?
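One speculative observation about those 12 chunks (my own reading, and it may mean nothing): the printable bytes alternate between an incrementing value and '!' (0x21), i.e. consecutive little-endian 16-bit words 0x2130, 0x2131, ... counting upward. That looks like a counter or test fill pattern rather than gcode or FAT32 data. A one-liner reproducing the readable part of the pattern, assuming that interpretation:

```python
# Generates "0!1!2!3!4!5!6!7!8!9!:!;!<!=!>!?!" (the content of chunks xbuzs
# and xbuzt above) as an incrementing byte interleaved with 0x21 ('!').
print(bytes(b for v in range(0x30, 0x40) for b in (v, 0x21)).decode('ascii'))
```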
I am experiencing the same issue copying over wired LAN. But I found that, with the example gcode attached (made from the example STL), I can trigger the bug by changing the filament profile to 3D fuel Pro PLA.
example_gcode.zip
Original_breaks_over_LAN.gcode is the file written locally to disk that will break when exported over LAN.
Broken_breaks_over_LAN.gcode is the file that was exported over LAN and that is different from the original that was exported to disk.
Original_does_NOT_break_over_LAN.gcode is a file that is identical whether written to disk or exported over LAN, i.e., exporting it over LAN does not result in a changed file.
example_STL.zip
I tested it back and forth twice each, with the same positive/null result every time. I do not know if it is triggered by switching filaments in general, and I have had other prints with the 3D Fuel Pro PLA filament setting transfer correctly, as well as Prusament prints that broke. I only know that the steps outlined below were reproducible at least twice.
OS/Software: Ubuntu 23.04, kernel 6.2.0-26-generic, PrusaSlicer-2.6.0+snap4
Printer: MK4 kit with 5.0.0-alpha4 firmware. No third-party anything plugged in.
Steps to reproduce:
1. Open PrusaSlicer with default settings and load the STL file.
2. Change the filament profile to 3D Fuel Pro PLA.
3. Export gcode to disk from PrusaSlicer.
4. Export gcode to the MK4 over LAN from PrusaSlicer.
5. Unplug the USB stick from the MK4 and plug it into the computer.
6. Compare the md5 sums of the two files; they will be different.
Steps that do NOT trigger bug:
1. With PrusaSlicer still open to the same file, change the filament profile to Prusament PLA.
2. Export gcode to disk from PrusaSlicer.
3. Export gcode to the MK4 over LAN from PrusaSlicer.
4. Unplug the USB stick from the MK4 and plug it into the computer.
5. Compare the md5 sums of the two files; they will be identical.
Hi Michele, speaking as a programmer, even though I have not had time to look through the source code, the issue looks related to a "rolling buffer", especially given the garbage from the future. Look for this pattern:
- a pointer to a memory buffer variable (or to an array)
- an index pointer based on an offset into the buffer
- when bytes are received, they are stored at the index pointer's offset into the buffer
- unfortunately (check why), when the last chunk of bytes is received, it is written to the top of the buffer. This can be due to a memory segment overlap, or to incrementing and resetting the index pointer BEFORE writing the bytes into the buffer. That would explain the "bytes from the future": they are the final part of the chunks, and they get written at the top of the buffer, overwriting the correct data, which is then written to the file. (A sketch of this failure mode follows below.)
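To make the suspected failure mode concrete, here is a minimal illustrative sketch (my own, not the actual firmware code) of a ring buffer whose write index is wrapped before the incoming bytes are stored, so the final chunk of a burst lands at the start of the buffer and clobbers data that has not been flushed yet:

```python
BUF_SIZE = 16

class BuggyRing:
    def __init__(self):
        self.buf = bytearray(BUF_SIZE)
        self.w = 0  # write index

    def write(self, chunk):
        # BUG: the wrap decision is made on the projected end position,
        # before writing, so a chunk that would straddle the end of the
        # buffer is relocated to offset 0 instead of waiting for the
        # reader to drain the buffer, overwriting unread data.
        if self.w + len(chunk) > BUF_SIZE:
            self.w = 0
        self.buf[self.w:self.w + len(chunk)] = chunk
        self.w += len(chunk)

ring = BuggyRing()
ring.write(b"AAAAAAAA")  # occupies bytes 0-7
ring.write(b"BBBBBBBB")  # occupies bytes 8-15
ring.write(b"CCCCCCCC")  # wraps early and overwrites the unread A's
print(ring.buf)          # bytearray(b'CCCCCCCCBBBBBBBB')
```

If the file writer drains this buffer afterwards, the A's are gone and the C's appear in their place, which matches the overwritten-chunk symptom described above.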
I've also encountered the same bug on my own MK4 running (non-alpha) firmware 4.7.1.
I have noticed (and others have commented) how slow gcode transfers via PrusaLink seem to be, so there's definitely a bottleneck somewhere. If a circular buffer is in use, a bug like this would be possible in two places: when the buffer fills, or when the buffer empties. In other words, writing too much data or reading too far ahead could both be a problem. And since we have seen cases where the out-of-place block appears to come from the future, I would suspect that the USB write is the bottleneck and data from the network overruns the current index.
I'd be on the lookout for:
- An off-by-1 indexing error which allows the start/end of the buffer to overlap (and if this happens multiple times in a row, you'd see the "multiples of 32" issue)
- A race condition between the network writing to the buffer and the USB code reading from the buffer (in this case, multiples could be caused by timing)
- Possibly both?
I also think there needs to be some validation of the file after writing. A checksum would be the most logical approach. That clearly doesn't solve the problem, but it would make sure it was caught, and it would provide absolute certainty that a file was transferred correctly no matter what bugs may or may not exist.
In the meantime, I have just started to use my new MK4 with 4.7.2, and to check whether this issue appears I am using a FlashAir SD card instead of the USB dongle (that is, a USB card reader with a FlashAir SD inserted ;) ). Since it is mapped as drive K: on my PC, I can immediately check with a compare app whether the transferred file on K: is the same as the one on my C: drive. So far on 4.7.2 it has never failed.
Folks, apologies, there is an error in my prior analysis. I did not sort the output when generating the md5s. To keep the md5s for the chunks in file order, the command:
find v1 -type f -exec md5 {} + > v1-md5s
should be
find v1 -type f -exec md5 {} + | sort > v1-md5s
For ease of analysis, split can also be adjusted to use numerical filenames:
split -d -a 10 -b 32 ../v1.gcode
The good news is that with this, the corruption is much more straightforward. For the errant chunks that also occur in the source file, the errant file's chunk occurs in the source file exactly 512 or 384 chunks earlier, for all the bad chunks I analyzed. So there is no "time travel". Instead, stale data is being written. It's as if there is a race condition or invalid state on a 16 KB (512 × 32 byte) buffer, where the file writer re-reads slightly older data.
There are also a limited number of invalid chunks that don't occur in the source document that may or may not indicate a second bug.
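To make this corrected analysis repeatable, here is a small sketch (again my own tooling, not from the firmware) that splits both files into 32-byte chunks and, for each corrupted chunk, reports where that exact content occurs in the original, expressed as a delta in chunks. With the files above, the deltas should come out at 512 or 384:

```python
#!/usr/bin/env python3
# Usage: python3 staledelta.py v1.gcode v2.gcode
import sys
from collections import defaultdict

CHUNK = 32

def chunks(path):
    data = open(path, 'rb').read()
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

v1, v2 = chunks(sys.argv[1]), chunks(sys.argv[2])

# Index every v1 chunk by content for O(1) lookups.
where = defaultdict(list)
for i, c in enumerate(v1):
    where[c].append(i)

for i, (c1, c2) in enumerate(zip(v1, v2)):
    if c1 == c2:
        continue
    hits = where.get(c2)
    if hits:
        print(f"chunk {i}: stale copy of v1 chunk(s) {hits}, "
              f"delta(s) {[i - j for j in hits]}")
    else:
        print(f"chunk {i}: content not found anywhere in v1")
```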
Here's a portion of a diff for this new, corrected analysis for the gcode 2.zip files:
--- v1-md5s 2023-08-15 11:54:53.998600412 -0400
+++ v2-md5s 2023-08-15 11:55:06.809257874 -0400
@@ -1166,22 +1166,22 @@
MD5 (./x0000001165) = 7dcca393b45f7a4e507f11908c3ef554
MD5 (./x0000001166) = 507bc37bbdd7edd77e9e7ad18a6ea1e6
MD5 (./x0000001167) = 147d0690887a8876fc4d3fa95ea95eb1
-MD5 (./x0000001168) = be44c6fd054a4cc001faf5b6e8119131
-MD5 (./x0000001169) = df10e4c0f8b6e5bdf82229de819d110a
-MD5 (./x0000001170) = 5962dc85f9690e8d7ff0643c39fc02a7
-MD5 (./x0000001171) = 692d3593cc7a80020d3d6819f64778e2
-MD5 (./x0000001172) = 6697ef83d06af695819c2560e82f3a15
-MD5 (./x0000001173) = b056fb99bb4c9f8124d4931c1d26e1e4
-MD5 (./x0000001174) = 18d57ee990d53df959908d49f3255e16
-MD5 (./x0000001175) = 62e96e338b7b97f731556d1893ef98c0
-MD5 (./x0000001176) = b513792a428a2d5cfa4d718072bc5e71
-MD5 (./x0000001177) = 5fd9758e54f57a84b0a77b5bb4b1b025
-MD5 (./x0000001178) = 7544e188e2bc29c448ee3baa7d1f6ef7
-MD5 (./x0000001179) = 7f8245d257a21709e2b910fa924aa714
-MD5 (./x0000001180) = 9c5188cbd1be570412971a915af272a9
-MD5 (./x0000001181) = b0b3fbd6127fe50302266d68c29ae55f
-MD5 (./x0000001182) = 8b153042a67d9089acc13457761fd9b7
-MD5 (./x0000001183) = 1138824b3d3805fcace55be1de124f5e
+MD5 (./x0000001168) = 0b95690866a25a8827cab31847983b42 Same as chunk 656
+MD5 (./x0000001169) = 5f54e0382b210e480506a2c9880cc5f9 Same as chunk 657
+MD5 (./x0000001170) = 4dea27d6cfcabcf243430d6587ea7c8c ...
+MD5 (./x0000001171) = 6b9a7da3102d09754ec743863ce60892 ...
+MD5 (./x0000001172) = 9876231de592c95acd0e4fc5dbc57224 ...
+MD5 (./x0000001173) = 2130c1f4077e1c93aa359220549ca678 ...
+MD5 (./x0000001174) = 2e1a377cd6eff7b007bbb86c8e1771a0 ...
+MD5 (./x0000001175) = 91c2480d8cc85e67d7ad55d653180253 ...
+MD5 (./x0000001176) = d303c53950cec87da6e41556d4c131b6 ...
+MD5 (./x0000001177) = 4a52e560ad6ae6d23c85de9f38102e25 ...
+MD5 (./x0000001178) = dd2efab2edb51b85c32122e6060f9901 ...
+MD5 (./x0000001179) = b95b7a9ca5055ed1d47ea6b5735efedf ...
+MD5 (./x0000001180) = 2a25b4d3362a7ff1a95b77d60e7a76a0 ...
+MD5 (./x0000001181) = e1c5eb01c7ebec400c15b1703a38fc3c ...
+MD5 (./x0000001182) = 89ddfbbf72afa0df1332b50aa3d7345c ...
+MD5 (./x0000001183) = 709cfbbaa189e4f8af3665ff016d8145 Same as chunk 671
MD5 (./x0000001184) = 695578208be168de26935e05618c2d4d
MD5 (./x0000001185) = d749863b6081e46fa43ede15f4ba9a2b
MD5 (./x0000001186) = 9f9ed60468961dba9b55c6c55b0c6257
Just as a suggestion to others, here's what I'm going to do while this is being worked out:
- Export the gcode to a local file
- Upload via the PrusaLink web interface
- Download a copy from the web interface
- Hash both files and confirm a match
Of course, the download code may also have a similar flaw, so it will be interesting to see if this gives any false positives. But with this method, we can at least guarantee that there's no risk of damaging the printer, and we can still have the convenience of not having to shuttle USB drives back and forth (and put stress on the very tight USB port the MK4 has).
I would also like to see PrusaSlicer itself perform this check (i.e. download the file and compare hashes before starting the print). It could even be applied to OctoPrint connections as an extra safety check. Not that I've ever seen corruption with OctoPrint, but it never hurts to have extra verification...
I haven't done the work to assess if I have experienced this issue myself, but from reading about it, @Prusa-Support if you're unsure where to start looking for problems, I'm gonna go ahead and guess this has to do with the frankly terrible network to USB file transfer code. Get a networking engineer to rewrite that for you and I bet the issue will disappear in the process.
Reading through the comments in here again, I'll re-check one thing.
@jrgiacone Are you 100% sure you've managed to get a corrupt transfer from Connect too?
My first thoughts at the time were pointing to certain newly introduced code (which, I'll admit, I'm not really proud of; it was written a bit in desperation). The further comments here would actually strengthen that feeling. Except that code is not used at all in the Connect transfer, only in Link, so I've ignored the feeling.
@vorner I believe I have; however, I've only tried Connect a few times, so I could have misremembered. The majority of the issues have come from using PrusaSlicer to upload directly to PrusaLink through the physical printer.
With regards to Connect, I could be remembering wrong; I stopped using it because it would take way too long to upload any files. So I would say I am not 100% sure.
I wrote a quick (i.e. "bare minimum") Python script which uploads the file, then downloads it back, hashes the download, and compares it to the original. The download and compare happen in memory, so there's no need to clutter up your downloads directory with copies. If you wish, you can automate it to run whenever a file is dropped in a specific directory.
I wanted to make it delete the uploaded file if it's found to be corrupt, but even when it's not set to print on upload, the printer loads the latest uploaded file, and that causes an error when attempting to delete it, since the printer considers the file to be in use. So no matter what, you need to walk over to the printer and cancel it. The script also lacks proper robust error handling, but it's "good enough". Feel free to adapt it to your needs.
#!/usr/bin/env python3
import os
import sys
import json
import hashlib
import requests
import urllib.parse

# Fill in your printer's address and PrusaLink API key.
PRINTER_URL = "http://..."
API_KEY = "..."

# Read the local gcode file and hash it.
src_path = os.path.expanduser(sys.argv[1])
src_filename = os.path.basename(src_path)
with open(src_path, 'rb') as f:
    src_rawdata = f.read()
src_hash = hashlib.sha256()
src_hash.update(src_rawdata)

# Upload the file to the USB drive through the PrusaLink API.
try:
    put_response = requests.put(
        url="{}/api/v1/files/usb/{}".format(PRINTER_URL, urllib.parse.quote(src_filename)),
        headers={"X-Api-Key": API_KEY, "Content-Type": "text/x.gcode", "Overwrite": "0"},
        data=src_rawdata)
except requests.exceptions.RequestException:
    print("HTTP Request failed")
    sys.exit(1)
if put_response.status_code != 201:
    print("Upload error")
    sys.exit(1)
print("Successful Upload!")

# Download the file straight back (kept in memory) and hash the copy.
put_rdict = json.loads(put_response.text)
dl_path = put_rdict.get('refs').get('download')
try:
    dl_response = requests.get(
        url="{}{}".format(PRINTER_URL, dl_path),
        headers={"X-Api-Key": API_KEY})
except requests.exceptions.RequestException:
    print("HTTP Request failed")
    print("File not validated")
    sys.exit(1)
if dl_response.status_code != 200:
    print("Could not validate file")
    sys.exit(1)
dl_hash = hashlib.sha256()
dl_hash.update(dl_response.content)

# Compare the hashes; a mismatch means the transfer corrupted the file.
print("SRC: {}".format(src_hash.hexdigest()))
print("DST: {}".format(dl_hash.hexdigest()))
print("")
if src_hash.digest() == dl_hash.digest():
    print("Files validated!")
else:
    print("UPLOADED FILE CORRUPTED")
@jstm88 Have you had any failures? It seems that for those affected, the failures happen fairly often, but most people have no failures at all. It makes me wonder if there are different buddy board versions that run different code paths.
If corruption is reproduced with your script, it should be useful for troubleshooting as it could be tried in different environments.
@jvasileff I haven't noticed any more failures.
I actually ran several dozen prints on the MK4 after I built it without seeing any obvious problems. Those gcode files were all uploaded directly from PrusaSlicer, so the file on the printer would have been the only copy with nothing to validate against. It's possible there were corruptions but they weren't significant enough to show up in the prints.
I first noticed the problem on a relatively large one-off print and I saw a few of the external zits just near the top. I don't have the file any more but inspecting the print it definitely looks like they all happened towards the end of the file. The very next print I uploaded after that one had corruption throughout the entire file.
For what it's worth, I re-uploaded that specific file several times using my script and there was no corruption.
This actually has me wondering if the likelihood of corruption increases as the storage fills up. I'm using a 64GB Samsung drive so it would take a long time to fill it up, but I wonder if there might be some problem with higher block offsets, or potentially issues that happen as the allocation table grows. We'd need a lot more data to determine if that has anything to do with it. In other words, has this ever happened on an empty drive, and does it become more likely if there is more data on the drive? I can't say at this point. If we can reproduce it on an empty drive we could disprove the theory, at least.
FWIW, all of the corruptions that I encountered have been on a 256 GB drive that is nowhere near full. I also had fewer problems with smaller (in file size / print time) prints, where the corruption is only noticeable as small flaws in a couple of layers.
This actually has me wondering if the likelihood of corruption increases as the storage fills up. I'm using a 64GB Samsung drive so it would take a long time to fill it up, but I wonder if there might be some problem with higher block offsets, or potentially issues that happen as the allocation table grows.
Yeah, or if write sizes vary with fragmentation.
There seems to be a lot of speculation in here… so, let me share something from what we have.
There are not different code paths depending on board version regarding the networking code.
We have a working theory (very much unconfirmed yet). If it is correct, the bug is very timing dependent. This involves both the speed of USB (which does change with fragmentation, yes) and the „effective“ wifi speed. This seems to be the reason why some people can reproduce it often while others never see a corrupt file (we still haven't been able to reproduce despite bombarding printers with hundreds of transfers).
An interesting experiment could be to move the printer further away from the router, or put some interference between them, simply to see if the change of timing leads to the bug disappearing.
An interesting experiment could be to move the printer further away from the router, or put some interference between them, simply to see if the change of timing leads to the bug disappearing.
I got many corrupted files transferring over a 1Gb wired Ethernet switch. It is a 24 port Ubiquiti switch (not a cheap no-name) and the printer and computer are each connected by less than two meters of cat-6 cable. No WiFi involved.
Are you saying that if the network connection is too fast it might trigger the bug? If so, I can give it a go on the Wi-Fi and connect the printer to an AP on the other side of the house.
I can confirm my corrupted files also occurred with the printer connected via a 1Gb wired connection.
The upload speeds are extremely slow. The printer is the bottleneck, although it's unclear if it's the networking code, USB code, or the USB hardware itself. Local network transfers with other file servers are as fast as expected (near gigabit) and this also applied to OctoPrint when I was using it.
The upload speeds are extremely slow. The printer is the bottleneck, although it's unclear if it's the networking code, USB code, or the USB hardware itself. Local network transfers with other file servers are as fast as expected (near gigabit) and this also applied to OctoPrint when I was using it.
Same for me: transfer speeds to the printer are far slower than they should be even for the 100Mb Ethernet interface on the printer. But I haven't tried OctoPrint, just PrusaSlicer and via Python requests.