Fix for invalid elf file when converting to bin
On Windows 10, xtensa-lx106-elf-objdump sporadically returns no output, which results in build errors. This adds logic to retry reading the segment from the ELF.
Ref: #7253
Would it be possible to print the exception details so this weird behavior is understood ?
Would it be possible to print the exception details so this weird behavior is understood ?
I'm not sure that would be helpful. The exception is now raised manually when the output is empty.
Python's subprocess.Popen() will throw an OSError exception (which is not caught and will dump to the console/IDE) if the app can't start for some reason. But that's not what is happening in the case they're reporting, because they're only showing the manually created "no start found" exception we raise.
So we really don't have anything to go on here. I would maybe add a note to the top saying # This is a hack to fix intermittent Windows silent failures, see #7253 or something because it's the only spot in a whole bunch of calls we do this.
I'd also like to get a report or two that this fixes things from someone who has been having failures before merging.
My gut says there's something in these users' Windows installs that's blocking the app from running (malware scanner? antivirus? race condition w/OneDrive online? no idea) every now and then. I, too, have never had it fail in my (admittedly limited) Windows testing, nor has it crapped out ever in Windows CI.
I was indeed thinking of a race condition without further clue but
malware scanner? antivirus? race condition w/OneDrive online?
Considering that we are dealing with executable files, I think you put your finger on it @earlephilhower .
Would it be possible to print the exception details so this weird behavior is understood ?
Python's subprocess.Popen() will throw an OSError exception
The process doesn't print any output to STDOUT. While debugging the issue, I dumped each line inside the for-loop and noticed that when it fails there is no output. Perhaps the process writes to STDERR but returns with exit code 0, because there was no OSError exception on the console. The process sporadically and silently goes away without printing a single line. I will revert my local script change and post a screenshot later when I get a chance. The error was pretty consistent to reproduce, but it failed randomly for one of the segments of the ELF.
My gut says there's something in these users' Windows installs that's blocking the app from running (malware scanner? antivirus? race condition w/OneDrive online? no idea)
I was running it in a local, isolated development environment which is excluded from OneDrive sync, so I don't think it's related to OneDrive/antivirus/malware.
Look at the attached log lines from the build output. I generated this log file locally after adding a print in elf2bin.py. Notice how there were 2 attempts to load the .text segment, at line#6 and line#7.
with subprocess.Popen([path + '/xtensa-lx106-elf-objdump', '-h', '-j', segment, elf], stdout=subprocess.PIPE, universal_newlines=True) as p:
    lines = p.stdout.readlines()
    print("Segment: {}. Attempt: {}/{} STDOUT Lines: {}".format(segment, attempts, maxAttempts, lines))
    if (len(lines) > 0):
        for line in lines:
The process silently goes away sporadically without printing a single line.
Have you tried wait() / communicate() after Popen() call, and then read the stdout? What is the stderr=subprocess.PIPE result though, is there anything there? (unlikely, but nonetheless... and I assume arduino ide would've printed it as we don't capture it?)
Have you tried wait() / communicate() after Popen() call, and then read the stdout?
The readlines() call should be blocking until the process exits, as I understand the Python libs. So, it's already effectively doing this.
Seeing if STDERR has anything would be useful, though, like you suggested!
The logs seem to show it failing after succeeding several times, so the race condition thing seems harder to believe now (unless AV/etc. is non-deterministic and only checks things after a few 10s of ms). And, still, no Windows CI failures (running on a Win2K21 Server VM I believe).
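For reference, the STDERR check discussed above could look roughly like the sketch below. It is only an illustration: run_and_capture is a hypothetical helper name, not something in elf2bin.py, and the demo runs the Python interpreter itself instead of objdump so that it works on any machine.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd, wait for it to exit, and return (returncode, stdout, stderr).

    Hypothetical helper for illustration; not part of elf2bin.py.
    """
    with subprocess.Popen(cmd,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True) as p:
        # communicate() reads both pipes to EOF and waits for the process,
        # so neither pipe can fill up and block the child.
        out, err = p.communicate()
    return p.returncode, out, err


if __name__ == "__main__":
    # Stand-in child process: writes to both streams and exits non-zero.
    child = "import sys; print('hello'); sys.stderr.write('oops'); sys.exit(3)"
    rc, out, err = run_and_capture([sys.executable, "-c", child])
    print("exit code:", rc, "stdout:", repr(out), "stderr:", repr(err))
```

In the script, the command list would be the same one already passed to Popen (path + '/xtensa-lx106-elf-objdump', '-h', '-j', segment, elf). An empty STDOUT together with a non-zero exit code or text on STDERR would at least give us something to go on.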
I cannot reproduce it anymore. I have modified the retry logic and added a print statement to dump STDOUT and STDERR to the console; maybe it will help in the future.
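A minimal sketch of what such a retry with STDOUT/STDERR dumping might look like is below. It is not the code in this PR: read_lines_with_retry, max_attempts and delay are hypothetical names and values chosen for illustration, and the short sleep between attempts is an assumption rather than something the thread established as necessary.

```python
import subprocess
import sys
import time


def read_lines_with_retry(cmd, max_attempts=3, delay=0.1):
    """Run cmd until it produces STDOUT output and return the lines.

    Hypothetical helper for illustration. Raises RuntimeError if every
    attempt comes back empty.
    """
    for attempt in range(1, max_attempts + 1):
        with subprocess.Popen(cmd,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True) as p:
            out, err = p.communicate()
        lines = out.splitlines()
        if lines:
            return lines
        # Dump whatever we have so a silent failure leaves a trace in the log.
        print("Attempt {}/{}: no output, exit code {}, STDERR: {!r}".format(
            attempt, max_attempts, p.returncode, err), file=sys.stderr)
        time.sleep(delay)
    raise RuntimeError("no output from {!r} after {} attempts".format(
        cmd[0], max_attempts))
```

An OSError from Popen() (the app failing to start at all) is deliberately not caught here, so that case still surfaces on the console as before.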