Fix for invalid elf file when converting to bin

Open anilsoni85 opened this issue 4 years ago • 8 comments

On Windows 10, xtensa-lx106-elf-objdump sporadically returns no output, which results in build errors. This adds logic to retry reading the segment from the ELF file.

Ref: #7253

anilsoni85 avatar Oct 31 '21 17:10 anilsoni85

Would it be possible to print the exception details so this weird behavior is understood ?

d-a-v avatar Oct 31 '21 22:10 d-a-v

Would it be possible to print the exception details so this weird behavior is understood ?

I'm not sure that would be helpful. The exception is now raised manually when the output is empty.

Python's subprocess.Popen() will throw an OSError exception (which is not caught and will dump to the console/IDE) if the app can't start for some reason. But that's not the case they're reporting, because they're just showing the manually created "no start found" exception we make.
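To illustrate the distinction above, here is a minimal sketch of what happens when the executable cannot be started at all. The binary name is a hypothetical placeholder chosen so that it does not exist, and `try_start` is a helper invented for this example, not part of elf2bin.py.

```python
import subprocess

def try_start(cmd):
    """Return None if the process could be started, else the OSError from Popen."""
    try:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
            p.communicate()  # wait for the process and drain both pipes
        return None
    except OSError as exc:
        # Raised when the OS cannot launch the program (missing file, no permission, ...)
        return exc

# Hypothetical name that should not exist on any system
err = try_start(["hypothetical-missing-objdump-binary", "-h"])
print(type(err).__name__, err)
```

A launch failure of this kind is loud. The reported symptom is different: the process starts, exits, and simply produces no lines on STDOUT.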

So we really don't have anything to go on here. I would maybe add a note to the top saying # This is a hack to fix intermittent Windows silent failures, see #7253 or something because it's the only spot in a whole bunch of calls we do this.

I'd also like to get a report or two that this fixes things from someone who has been having failures before merging.

My gut says there's something in these users' Windows installs that's blocking the app from running (malware scanner? antivirus? race condition w/OneDrive online? no idea) every now and then. I, too, have never had it fail in my (admittedly limited) Windows testing, nor has it crapped out ever in Windows CI.

earlephilhower avatar Nov 01 '21 22:11 earlephilhower

I was indeed thinking of a race condition without further clue but

malware scanner? antivirus? race condition w/OneDrive online?

Considering that we are dealing with executable files, I think you put your finger on it @earlephilhower .

d-a-v avatar Nov 01 '21 23:11 d-a-v

Would it be possible to print the exception details so this weird behavior is understood? Python's subprocess.Popen() will throw an OSError exception

The process doesn't print any output to STDOUT. While debugging the issue, I dumped each line inside the for-loop and noticed that when it fails there is no output. Perhaps the process writes to STDERR but returns exit code 0, since no OSError exception appeared on the console. The process sporadically goes away silently without printing a single line. I will revert my local script change and post a screenshot later when I get a chance. The error was easy to reproduce consistently, but it failed randomly on one of the ELF segments.

My gut says there's something in these users' Windows installs that's blocking the app from running (malware scanner? antivirus? race condition w/OneDrive online? no idea)

I was running it in a local, isolated development environment which is excluded from OneDrive sync, so I don't think it's related to OneDrive, antivirus, or malware scanning.

anilsoni85 avatar Nov 01 '21 23:11 anilsoni85

Look at the attached log lines from the build output. I generated this log file locally after adding a print statement to elf2bin.py. Notice how there were 2 attempts to load the .text segment, at line #6 and line #7.

        with subprocess.Popen([path + '/xtensa-lx106-elf-objdump', '-h', '-j', segment,  elf], stdout=subprocess.PIPE, universal_newlines=True ) as p:
            lines = p.stdout.readlines()
            print("Segment: {}. Attempt: {}/{} STDOUT Lines: {}".format(segment, attempts, maxAttempts, lines))
            if (len(lines) > 0): 
                for line in lines:

elf2bin.txt

anilsoni85 avatar Nov 02 '21 02:11 anilsoni85

The process silently goes away sporadically without printing a single line.

Have you tried wait() / communicate() after the Popen() call, and then reading the stdout? What is the stderr=subprocess.PIPE result though, is there anything there? (Unlikely, but nonetheless... and I assume the Arduino IDE would've printed it, as we don't capture it?)
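The suggestion above could be sketched roughly as follows: call communicate() so both pipes are drained, then inspect STDOUT, STDERR, and the exit code together. `run_and_capture` is a hypothetical helper, and the demo command uses the Python interpreter as a stand-in for xtensa-lx106-elf-objdump so the snippet runs anywhere.

```python
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd and return (exit code, stdout lines, stderr lines)."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True) as p:
        out, err = p.communicate()  # blocks until the process exits
    return p.returncode, out.splitlines(), err.splitlines()

# Stand-in command (assumption): writes one line to each stream
demo_cmd = [sys.executable, "-c",
            "import sys; print('out'); print('err', file=sys.stderr)"]
code, out_lines, err_lines = run_and_capture(demo_cmd)
print(code, out_lines, err_lines)
```

In the real script the command would be the objdump invocation shown earlier in the thread. An empty `out_lines` together with a non-empty `err_lines` or a non-zero `code` would give more to go on than the current exception does.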

mcspr avatar Nov 03 '21 02:11 mcspr

Have you tried wait() / communicate() after Popen() call, and then read the stdout?

The readlines() call should be blocking until the process exits, as I understand the Python libs. So, it's already effectively doing this.

Seeing if STDERR has anything would be useful, though, like you suggested!

The logs seem to show it failing after succeeding several times, so the race condition thing seems harder to believe now (unless AV/etc. is non-deterministic and only checks things after a few 10s of ms). And, still, no Windows CI failures (running on a Win2K21 Server VM I believe).

earlephilhower avatar Nov 03 '21 15:11 earlephilhower

I cannot reproduce it anymore. I have modified the retry logic and added a print statement that dumps STDOUT and STDERR to the console; maybe it will help in the future.
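For readers following along, a retry of this kind might look roughly like the sketch below. It is not the code from the PR: `read_with_retry` and `max_attempts` are names made up for illustration, and the demo command is a stand-in for the objdump call.

```python
import subprocess
import sys

# This is a hack for intermittent silent failures on Windows, see #7253
def read_with_retry(cmd, max_attempts=3):
    """Run cmd until it prints something on STDOUT, logging each empty attempt."""
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True)
        lines = proc.stdout.splitlines()
        if lines:
            return lines
        # Dump everything we know about the failed attempt for later diagnosis
        print("Attempt {}/{}: exit code {}, STDOUT {!r}, STDERR {!r}".format(
            attempt, max_attempts, proc.returncode, proc.stdout, proc.stderr),
            file=sys.stderr)
    raise RuntimeError("no output from {!r} after {} attempts".format(cmd, max_attempts))

# Stand-in command (assumption): prints a single line, like a one-line objdump result
lines = read_with_retry([sys.executable, "-c", "print('.text')"])
print(lines)
```

Logging the exit code and STDERR on every empty attempt means that, if the failure ever returns, the build log will show whether the tool exited cleanly or reported an error.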

anilsoni85 avatar Nov 09 '21 00:11 anilsoni85