QNICE-FPGA icon indicating copy to clipboard operation
QNICE-FPGA copied to clipboard

Nasty edge case with timer and memory load via UART

Open sy2002 opened this issue 3 years ago • 9 comments

This is a nasty edge case that is not "urgent" in the sense that it is not happening too often in real life: I can only reproduce it from time to time when I really stress out the system 😈

So I tagged it with V1.7. And I assigned it (for now) to all of us, because I will happily accept any debugging help :-) And because this is really an interessting case just from the perspective of looking at nasty edge cases. So here we go:

PROBLEM:

When uploading a large program via STDIN, i.e. by using any serial program via UART and using Monitor's "Memory/Load" and copy/paste of the .out file while test_programs/timer_test.asm is doing its 2-way multitasking magic (output text to STDOUT and let a ball bounce on VGA) THEN the loading process of the large program will be SOMETIMES corrupted in the sense that some (not all) words are corrupted and therefore the program will not run. For example QNICE (which is "green") will load and show some red colored garbage on the screen. But sometimes, everything works just fine.

REPRODUCTION:

  1. Put the newest /qbin folder on your SD card by pulling from the develop branch; insert the SD card into the hardware.
  2. Start the hardware with STDIN=UART and STDOUT=UART, but have your VGA monitor connected
  3. Connect via a terminal program
  4. Run the timer test on the hardware by entering F R qbin/timer_test.out
  5. Your terminal window will start to be filled with Timer 0 has issued interrupt request #0001 and so on and the ball will bounce on VGA. => SO FAR, SO GOOD!
  6. Enter M and L in the terminal window to bring the monitor into the load-mode
  7. Paste some large program (such as Q-TRIS) into your terminal window (for example by entering cd demos;../assembler/asm q-tris.asm while being in the Q-TRIS root folder, then you have it in the clipboard, then paste into the terminal)
  8. Press CTRL+E in the terminal after the paste/loading is done.
  9. Enter C R 8000 in the terminal
  10. Loading was corrupted. Q-TRIS will not start

You might need to repeat this process a few times, because it does not always happen.

What is also strange, that I could not reproduce it using qtransfer but only using a terminal program and doing copy/paste.

HYPOTHESES:

  • Might be related to issue #45 "Timer Interrupt Hardware: Check Edge Cases and Stability"
  • Might be related to issue #42 "Simulation: Timer Interrupt Hardware: Test Daisy Chaining Edge Case"
  • Might be as simple as an overflownig UART FIFO because our two ISR take sometimes to long to execute (simple test would be: make UART larger and look if it solves this one)
  • Might be that the UART's stopping of the CPU in the wait-for-data context has side effects
  • Might be a lot of other things, too

P.S. It might be helpful when debugging this, to enhance the monitor, so that it can compare a memory range, e.g. 0x8000 to 0x9193 with a file such as qbin/q-tris.out. When then might see patterns and or "how corrupt" everything is.

sy2002 avatar Aug 15 '20 15:08 sy2002

Just a guess: Could it be the UART FIFO that overflows ?

MJoergen avatar Aug 15 '20 16:08 MJoergen

@MJoergen Yes, might be. I wrote in my hypotheses section:

Might be as simple as an overflownig UART FIFO because our two ISR take sometimes to long to execute (simple test would be: make UART larger and look if it solves this one)

😃

sy2002 avatar Aug 15 '20 16:08 sy2002

A similar issue is seen on the latest develop branch when using the emulator.

Steps to reproduce:

  • Compile a large C program (e.g. make use of printf() with %ld formatting).
  • Start the emulator in VGA mode (i.e. ./qnice-vga ../monitor/monitor.out).
  • Load the compiled C program using the Monitors M-L command sequence following by pasting the .out file.
  • Start the program using the Monitors C-R 8000 command sequence.

The last step fails in different ways each time.

MJoergen avatar Aug 18 '20 20:08 MJoergen

@MJoergen Regarding your comment about the emulator: I guess this has nothing todo with the timer interrupt: Can you please make a new issue, for now I would say tag it with V1.6, provide the .out file that fails as a ZIP (you can drag'n'drop files into the GitHub comment boxes). Assign the issue to me and you, because Bernd has no knowledge about the VGA emulator. And: Could you please play around with const unsigned int uart_fifo_size = 2*32*1024; in emulator/uart.c line 32 ? I once wrote a comment there that reads:

#ifdef USE_VGA
# include <poll.h>
# include "fifo.h"
fifo_t*             uart_fifo; 
bool                uart_getchar_thread_running;  //flag to safely free the FIFO's memory
extern bool         gbl$cpu_running;              //the getchar thread stops when the CPU stops

/* Needs to be as large as the maximum amount of words that can be pasted while doing
   copy/paste in the M/L mode. Reason: The uart thread might pick up the data slower,
   than the operating systemm is pasting the data into the window. For being on the
   safe side, we chose double the size of the current size of 32k words */
const unsigned int  uart_fifo_size = 2*32*1024;
#endif

sy2002 avatar Aug 18 '20 21:08 sy2002

P.S. Here is a workaround for you, so that you can still load large programs in the emulator: After having started the VGA emulator with ./run-vga go to the terminal window and press CTRL+E. Now, when you see the Q> prompt, use the load command to load one or more .out files. Then enter run 0 to start at the monitor and then you can start with C R 0x8000.

sy2002 avatar Aug 18 '20 21:08 sy2002

Could you please play around with const unsigned int uart_fifo_size = 2*32*1024; in emulator/uart.c line 32 ?

Indeed, this works. I guess the calculation is wrong, since each word requires transferring approximately 16 characters: "0x8000 0x1234" including CR/LF. So the FIFO size needs to be multiplied by 16.

I tried that and it solves my problem!

Is this how we should fix it? I mean, is there any downside (I'm thinking about Emscripten / WebAssembly, which I have no knowledge of) to allocating a 1 MB buffer for the UART FIFO ?

MJoergen avatar Aug 19 '20 03:08 MJoergen

Hi Michael, thank you for your feedback. I fixed it and checked in the fix in develop: Please re-test on your side. On mine I tested using "test_programs/adventure.c", the .out file is 163K in size, so rather large: Worked. But my Mac is really fast, so it also worked in past, therefore we need your retest. Here is a screenshot to show that it works - but also to show you a small trick which I do not know if you already knew it:

grafik

Your point with Emscripten/WASM is very valid. Furtheron, we do not do copy/paste in WASM. We use FAT32 disk images and - yet to be developed - other means of transferring data from the PC to the emulator. Therefore, we must not "spam" WASM with a too big FIFO and therefore I decided to go the #ifdef route as you can see here:

#ifdef USE_VGA
# include <poll.h>
# include "fifo.h"
fifo_t*             uart_fifo; 
bool                uart_getchar_thread_running;  //flag to safely free the FIFO's memory
extern bool         gbl$cpu_running;              //the getchar thread stops when the CPU stops

    #ifndef __EMSCRIPTEN__
    /* In VGA but non WASM mode:

       Needs to be as large as the maximum amount of words that can be pasted while doing
       copy/paste in the M/L mode. Reason: The uart thread might pick up the data slower,
       than the operating systemm is pasting the data into the window. For being on the
       safe side, we chose double the size of the current size of 32k words and multiply this
       by 16 because of the way our .out file format is structured */
    const unsigned int  uart_fifo_size = 16*2*32*1024;

    #else
    /* In VGA WASM mode:

       Data transfer is not happening via copy/paste but via the file system (disk mount)
       and other - still to be developed - mechanisms. Therefore we by far do not need
       such a big FIFO at Emscripten. */
    const unsigned int  uart_fifo_size = 2*32*1024;
    #endif
#endif

sy2002 avatar Aug 19 '20 10:08 sy2002

The INCRB/DECRB of the timer ISRs might collide so this might be related to #135 and should be retested then

sy2002 avatar Sep 15 '20 22:09 sy2002

We should change the timer_test to adhere to the ISR best practice (MOVE 0xF000, SP and remove the INCRB/DECRB braket) and wait until issue #138 is done, because timer_test is modifying the registers of devices using the currently (as of the time of writing) still unsafe SYSCALL functions to print strings.

EDIT: SYSCALL(puts, 1) might be safe to use - to be checked - otherwise the timer_test ISR needs to save/restore the device's state to make it safe.

sy2002 avatar Sep 17 '20 13:09 sy2002