QNICE-FPGA
QNICE-FPGA copied to clipboard
Nasty edge case with timer and memory load via UART
This is a nasty edge case that is not "urgent" in the sense that it is not happening too often in real life: I can only reproduce it from time to time when I really stress out the system 😈
So I tagged it with V1.7. And I assigned it (for now) to all of us, because I will happily accept any debugging help :-) And because this is really an interessting case just from the perspective of looking at nasty edge cases. So here we go:
PROBLEM:
When uploading a large program via STDIN, i.e. by using any serial program via UART and using Monitor's "Memory/Load" and copy/paste of the .out
file while test_programs/timer_test.asm
is doing its 2-way multitasking magic (output text to STDOUT and let a ball bounce on VGA) THEN the loading process of the large program will be SOMETIMES corrupted in the sense that some (not all) words are corrupted and therefore the program will not run. For example QNICE (which is "green") will load and show some red colored garbage on the screen. But sometimes, everything works just fine.
REPRODUCTION:
- Put the newest
/qbin
folder on your SD card by pulling from thedevelop
branch; insert the SD card into the hardware. - Start the hardware with STDIN=UART and STDOUT=UART, but have your VGA monitor connected
- Connect via a terminal program
- Run the timer test on the hardware by entering
F
R
qbin/timer_test.out
- Your terminal window will start to be filled with
Timer 0 has issued interrupt request #0001
and so on and the ball will bounce on VGA. => SO FAR, SO GOOD! - Enter
M
andL
in the terminal window to bring the monitor into the load-mode - Paste some large program (such as Q-TRIS) into your terminal window (for example by entering
cd demos;../assembler/asm q-tris.asm
while being in the Q-TRIS root folder, then you have it in the clipboard, then paste into the terminal) - Press CTRL+E in the terminal after the paste/loading is done.
- Enter
C
R
8000
in the terminal - Loading was corrupted. Q-TRIS will not start
You might need to repeat this process a few times, because it does not always happen.
What is also strange, that I could not reproduce it using qtransfer
but only using a terminal program and doing copy/paste.
HYPOTHESES:
- Might be related to issue #45 "Timer Interrupt Hardware: Check Edge Cases and Stability"
- Might be related to issue #42 "Simulation: Timer Interrupt Hardware: Test Daisy Chaining Edge Case"
- Might be as simple as an overflownig UART FIFO because our two ISR take sometimes to long to execute (simple test would be: make UART larger and look if it solves this one)
- Might be that the UART's stopping of the CPU in the wait-for-data context has side effects
- Might be a lot of other things, too
P.S. It might be helpful when debugging this, to enhance the monitor, so that it can compare a memory range, e.g. 0x8000 to 0x9193 with a file such as qbin/q-tris.out. When then might see patterns and or "how corrupt" everything is.
Just a guess: Could it be the UART FIFO that overflows ?
@MJoergen Yes, might be. I wrote in my hypotheses section:
Might be as simple as an overflownig UART FIFO because our two ISR take sometimes to long to execute (simple test would be: make UART larger and look if it solves this one)
😃
A similar issue is seen on the latest develop
branch when using the emulator.
Steps to reproduce:
- Compile a large C program (e.g. make use of
printf()
with%ld
formatting). - Start the emulator in VGA mode (i.e.
./qnice-vga ../monitor/monitor.out
). - Load the compiled C program using the Monitors
M-L
command sequence following by pasting the.out
file. - Start the program using the Monitors
C-R 8000
command sequence.
The last step fails in different ways each time.
@MJoergen Regarding your comment about the emulator: I guess this has nothing todo with the timer interrupt: Can you please make a new issue, for now I would say tag it with V1.6, provide the .out
file that fails as a ZIP (you can drag'n'drop files into the GitHub comment boxes). Assign the issue to me and you, because Bernd has no knowledge about the VGA emulator. And: Could you please play around with
const unsigned int uart_fifo_size = 2*32*1024;
in emulator/uart.c
line 32 ? I once wrote a comment there that reads:
#ifdef USE_VGA
# include <poll.h>
# include "fifo.h"
fifo_t* uart_fifo;
bool uart_getchar_thread_running; //flag to safely free the FIFO's memory
extern bool gbl$cpu_running; //the getchar thread stops when the CPU stops
/* Needs to be as large as the maximum amount of words that can be pasted while doing
copy/paste in the M/L mode. Reason: The uart thread might pick up the data slower,
than the operating systemm is pasting the data into the window. For being on the
safe side, we chose double the size of the current size of 32k words */
const unsigned int uart_fifo_size = 2*32*1024;
#endif
P.S. Here is a workaround for you, so that you can still load large programs in the emulator: After having started the VGA emulator with ./run-vga
go to the terminal window and press CTRL+E
. Now, when you see the Q>
prompt, use the load
command to load one or more .out
files. Then enter run 0
to start at the monitor and then you can start with C
R
0x8000
.
Could you please play around with
const unsigned int uart_fifo_size = 2*32*1024;
inemulator/uart.c
line 32 ?
Indeed, this works. I guess the calculation is wrong, since each word requires transferring approximately 16 characters: "0x8000 0x1234
" including CR/LF. So the FIFO size needs to be multiplied by 16.
I tried that and it solves my problem!
Is this how we should fix it? I mean, is there any downside (I'm thinking about Emscripten / WebAssembly, which I have no knowledge of) to allocating a 1 MB buffer for the UART FIFO ?
Hi Michael, thank you for your feedback. I fixed it and checked in the fix in develop
: Please re-test on your side. On mine I tested using "test_programs/adventure.c", the .out
file is 163K in size, so rather large: Worked. But my Mac is really fast, so it also worked in past, therefore we need your retest. Here is a screenshot to show that it works - but also to show you a small trick which I do not know if you already knew it:
![grafik](https://user-images.githubusercontent.com/6825267/90626388-30dcea00-e21b-11ea-9e26-ba53a49daf8f.png)
Your point with Emscripten/WASM is very valid. Furtheron, we do not do copy/paste in WASM. We use FAT32 disk images and - yet to be developed - other means of transferring data from the PC to the emulator. Therefore, we must not "spam" WASM with a too big FIFO and therefore I decided to go the #ifdef route as you can see here:
#ifdef USE_VGA
# include <poll.h>
# include "fifo.h"
fifo_t* uart_fifo;
bool uart_getchar_thread_running; //flag to safely free the FIFO's memory
extern bool gbl$cpu_running; //the getchar thread stops when the CPU stops
#ifndef __EMSCRIPTEN__
/* In VGA but non WASM mode:
Needs to be as large as the maximum amount of words that can be pasted while doing
copy/paste in the M/L mode. Reason: The uart thread might pick up the data slower,
than the operating systemm is pasting the data into the window. For being on the
safe side, we chose double the size of the current size of 32k words and multiply this
by 16 because of the way our .out file format is structured */
const unsigned int uart_fifo_size = 16*2*32*1024;
#else
/* In VGA WASM mode:
Data transfer is not happening via copy/paste but via the file system (disk mount)
and other - still to be developed - mechanisms. Therefore we by far do not need
such a big FIFO at Emscripten. */
const unsigned int uart_fifo_size = 2*32*1024;
#endif
#endif
The INCRB/DECRB of the timer ISRs might collide so this might be related to #135 and should be retested then
We should change the timer_test to adhere to the ISR best practice (MOVE 0xF000, SP
and remove the INCRB/DECRB
braket) and wait until issue #138 is done, because timer_test is modifying the registers of devices using the currently (as of the time of writing) still unsafe SYSCALL
functions to print strings.
EDIT: SYSCALL(puts, 1) might be safe to use - to be checked - otherwise the timer_test ISR needs to save/restore the device's state to make it safe.