openFPGALoader icon indicating copy to clipboard operation
openFPGALoader copied to clipboard

openFPGALoader sometimes crashes at the end of programming a ULX3S 85F

Open sjolsen opened this issue 5 years ago • 8 comments

I'm trying to program the example design for the ULX3S (ECP5) from the nmigen-boards repository and openFPGALoader is crashing inconsistently. Sometimes it crashes; sometimes it doesn't. Sometimes the crashes are correlated with the design not running; sometimes the design runs just fine. Here's an example of what I see:

(.env) [vorash-debian:~/repro] python -m nmigen_boards.ulx3s 85F
ret 0
Open file /tmp/nmigen_e4eqy0e0_top.bit DONE
Parse file DONE
Enable configuration: DONE
SRAM erase: DONE
Loading: [===================                               ] 37.994545%problem -1 written
Loading: [=========================                         ] 48.850128%problem -1 written
Loading: [==================================================] 100.000000%
Done
Disable configuration: DONE
double free or corruption (out)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sjo/repro/.env/lib/python3.8/site-packages/nmigen_boards/ulx3s.py", line 197, in <module>
    platform().build(Blinky(), do_program=True)
  File "/home/sjo/repro/.env/lib/python3.8/site-packages/nmigen/build/plat.py", line 103, in build
    self.toolchain_program(products, name, **(program_opts or {}))
  File "/home/sjo/repro/.env/lib/python3.8/site-packages/nmigen_boards/ulx3s.py", line 161, in toolchain_program
    subprocess.check_call([tool, "-b", "ulx3s", '-m', bitstream_filename])
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['openFPGALoader', '-b', 'ulx3s', '-m', '/tmp/nmigen_e4eqy0e0_top.bit']' died with <Signals.SIGABRT: 6>.

If I recompile openFPGALoader with -fsanitize-address, I consistently get output like the following:

(.env) [vorash-debian:~/repro] ~/fpga/tools/openFPGALoader-build/openFPGALoader -b ulx3s -m build/top.bit
ret 0
Open file build/top.bit DONE
Parse file DONE
Enable configuration: DONE
SRAM erase: DONE
Loading: [                                                  ] 0.000000%=================================================================
==25838==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100003d100 at pc 0x55910e641e01 bp 0x7fff3068a330 sp 0x7fff3068a328
WRITE of size 1 at 0x62100003d100 thread T0
    #0 0x55910e641e00 in FtdiJtagBitBang::writeTDI(unsigned char*, unsigned char*, unsigned int, bool) /home/sjo/fpga/tools/openFPGALoader/src/ftdiJtagBitbang.cpp:182
    #1 0x55910e637378 in Jtag::read_write(unsigned char*, unsigned char*, int, char) /home/sjo/fpga/tools/openFPGALoader/src/jtag.cpp:186
    #2 0x55910e6375c5 in Jtag::shiftDR(unsigned char*, unsigned char*, int, int) /home/sjo/fpga/tools/openFPGALoader/src/jtag.cpp:208
    #3 0x55910e747da1 in Lattice::program_mem() /home/sjo/fpga/tools/openFPGALoader/src/lattice.cpp:238
    #4 0x55910e74d30f in Lattice::program(unsigned int) /home/sjo/fpga/tools/openFPGALoader/src/lattice.cpp:526
    #5 0x55910e66b910 in main /home/sjo/fpga/tools/openFPGALoader/src/main.cpp:285
    #6 0x7f97697dacc9 in __libc_start_main ../csu/libc-start.c:308
    #7 0x55910e5d5029 in _start (/home/sjo/fpga/tools/openFPGALoader-build/openFPGALoader+0x113029)

0x62100003d100 is located 0 bytes to the right of 4096-byte region [0x62100003c100,0x62100003d100)
allocated by thread T0 here:
    #0 0x7f9769c601f8 in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.6+0xaa1f8)
    #1 0x55910e641034 in FtdiJtagBitBang::FtdiJtagBitBang(FTDIpp_MPSSE::mpsse_bit_config const&, jtag_pins_conf_t const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, bool) /home/sjo/fpga/tools/openFPGALoader/src/ftdiJtagBitbang.cpp:73
    #2 0x55910e6363e6 in Jtag::init_internal(cable_t&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, jtag_pins_conf_t const*, unsigned int) /home/sjo/fpga/tools/openFPGALoader/src/jtag.cpp:95
    #3 0x55910e635ffb in Jtag::Jtag(cable_t&, jtag_pins_conf_t const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, bool) /home/sjo/fpga/tools/openFPGALoader/src/jtag.cpp:76
    #4 0x55910e66a8b3 in main /home/sjo/fpga/tools/openFPGALoader/src/main.cpp:219
    #5 0x7f97697dacc9 in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/sjo/fpga/tools/openFPGALoader/src/ftdiJtagBitbang.cpp:182 in FtdiJtagBitBang::writeTDI(unsigned char*, unsigned char*, unsigned int, bool)
Shadow bytes around the buggy address:
  0x0c427ffff9d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c427ffff9e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c427ffff9f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c427ffffa00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c427ffffa10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c427ffffa20:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427ffffa30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427ffffa40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427ffffa50: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427ffffa60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c427ffffa70: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==25838==ABORTING

For reference, here are the software versions I used to produce the above outputs:

  • nmigen-boards: 0.1.dev168+gb90a89d
  • nmigen: 0.3.dev228+g59ef6e6
  • yosys: 0.9+3696 (git sha1 45725d3b, clang 9.0.1-15+b1 -fPIC -Os)
  • ecppack (prjtrellis): 1.0-476-ga441cd9
  • nextpnr-ecp5: 89552307
  • openFPGALoader: 5b19775

And here's what I used to build openFPGALoader itself (all from Debian):

  • gcc: Debian 10.2.0-16
  • libusb-1.0: 2:1.0.23-2
  • libftdi1-2: 1.5-5
  • libudev1: 246.6-4

I also tried backing up to openFPGALoader v0.1, which fails in a similar way but with a different error message:

(.env) [vorash-debian:~/repro] ~/fpga/tools/openFPGALoader-build/openFPGALoader -b ulx3s -m build/top.bit
ret 0
Open file build/top.bit DONE
Parse file DONE
Enable configuration: DONE
SRAM erase: DONE
Loading: [=======================                           ] 44.146042%problem -1 written
Loading: [======================================            ] 75.989090%problem -1 written
malloc(): invalid size (unsorted)
zsh: abort      ~/fpga/tools/openFPGALoader-build/openFPGALoader -b ulx3s -m build/top.bit

sjolsen avatar Nov 26 '20 05:11 sjolsen

Thanks for this issue. I'm not able to reproduce your issue but thanks to logs the cause is easy to find. I'll fix this ASAP.

trabucayre avatar Nov 26 '20 08:11 trabucayre

I've just pushed a quick fix to avoid buffer overflow. A better way must be find to avoid increasing USB transfer but in a first time this seems to work. Could you confirm ? Thanks

trabucayre avatar Nov 26 '20 08:11 trabucayre

Thanks! I'm afraid I still get the error:

[vorash-debian:~/repro] ~/fpga/tools/openFPGALoader-build/openFPGALoader -b ulx3s -m build/top.bit
ret 0
Open file build/top.bit DONE
Parse file DONE
Enable configuration: DONE
SRAM erase: DONE
Loading: [                                                  ] 0.000000%problem -1 written
Loading: [===================                               ] 36.185280%problem -1 written
Loading: [=============================                     ] 57.534595%problem -1 written
Loading: [===============================================   ] 93.358025%problem -1 written
Loading: [==================================================] 100.000000%
Done
Disable configuration: DONE
double free or corruption (out)
zsh: abort      ~/fpga/tools/openFPGALoader-build/openFPGALoader -b ulx3s -m build/top.bit

I found that flushing the buffer inside the for loop immediately before the buffer access avoided the error (flashed 20 times with no crashes), but I don't know if that is a correct change. I've attached it as a patch.

flush-patch.txt

sjolsen avatar Nov 26 '20 09:11 sjolsen

Weird! Your solution has a drawback when you do a read with a full buffer size, but the idea is here. Could you share your bitstream? the problem -1 written mean an issue with something in libusb and/or kernel, but it's difficult to said the reason

trabucayre avatar Nov 26 '20 10:11 trabucayre

Here are the build artifacts, including the bitstream: build.tar.gz

I should note a couple more things: now that I'm able to run the loader more reliably, it sometimes fails with this error:

(.env) [vorash-debian:~/repro] python -m nmigen_boards.ulx3s 85F
ret 0
Open file /tmp/nmigen_t84z9m7e_top.bit DONE
Parse file DONE
Enable configuration: DONE
SRAM erase: DONE
Loading: [==========                                        ] 19.901903%problem -1 written
Loading: [=================                                 ] 32.566753%problem -1 written
Loading: [==================================================] 100.000000%
Fail
displayReadReg
	Config Target Selection : 0
	JTAG Active
	ISC Enable
	Write Enable
	Read Enable
	Std PreAmble
	Abort ERR
	EXEC Error

I don't know if this is relevant.

I also found that after suspending my desktop overnight, the transfer became much slower and never failed or printed problem -1 written. This might have something to do with the fact that I'm passing the USB device from Windows into a VirtualBox machine running Debian -- who knows.

sjolsen avatar Nov 26 '20 23:11 sjolsen

I've tried your .bit -> work perfectly. I've just pushed a small fix to avoid valgrind from complaining about array not initialized (this array is not really used so it's not a real problem). Could you retry after a git pull (and without your fix) Yes, failure you mention is related to problem -1 written : libusb or low level is unable to send the packet so bitstream in RAM is corrupted and, consequently, ECP5 is not happy. This week-end I will tried with a computer running the same debian, but I haven't windows to test with virtualbox... But yes your issue may be related to this abstraction layer.

trabucayre avatar Nov 27 '20 07:11 trabucayre

I've tried with a computer running debian successfully. So I think a problem with virtualbox or a conflict between VM and windows.

@umarcor has added msys2 support, could you try this way?

Thank

trabucayre avatar Dec 09 '20 07:12 trabucayre

openFPGALoader was upstreamed but it's not in the repos yet: https://packages.msys2.org/new. In a few days some maintainer will move it and then it will be available through pacman -Ss openFPGALoader. Meanwhile, you can try the following solutions:

  • Download the PKGBUILD recipe that @trabucayre referenced and run MINGW_INSTALLS=mingw64 makepkg-mingw. Then, install it with pacman -U *.zst.
  • Checkout #65 and execute MINGW_INSTALLS=mingw64 makepkg-mingw in subdir msys2. Then, install it with pacman -U *.zst.
  • Download the artifact from https://github.com/msys2/MINGW-packages/actions/runs/408971765, unzip it and install it with pacman -U *.zst.

umarcor avatar Dec 09 '20 07:12 umarcor