Status of FEATURE_FASTCONTEXTS in cappuccino
The cappuccino CPU has a parameter FEATURE_FASTCONTEXTS which is said to "Enable fast context switching of register sets". Setting this to ENABLED makes SR[CE] writable; however, SR[CE] does not seem to actually do anything. SR[CID] is not writable, and the top bits of wb_rfd_adr_expand, rfa_rdad, and rfb_rdad are all hard-coded to 0.
Is FEATURE_FASTCONTEXTS expected to be useful for anything now? Is there work planned in this area? Would you accept a PR which made SR[CID] writable, made SR[CE] cause it to increment on exception, and used its bits to select the current GPR set?
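To make the proposal concrete, here is a minimal C model of a banked register file in which the context ID supplies the upper address bits and the register number the lower five. It is only a sketch of the idea, not the mor1kx RTL: the names `fcs_model`, `fcs_rf_index`, `fcs_read`, and `fcs_write` are hypothetical, and the model assumes all 16 contexts that a 4-bit SR[CID] can address are implemented.

```c
#include <assert.h>
#include <stdint.h>

#define GPRS_PER_CONTEXT 32u              /* r0..r31 */
#define CID_BITS         4u               /* SR[CID] is a 4-bit field */
#define NUM_CONTEXTS     (1u << CID_BITS) /* assumption: all 16 sets exist */

/* Hypothetical model: one flat register file shared by all contexts. */
typedef struct {
    uint32_t gpr[NUM_CONTEXTS * GPRS_PER_CONTEXT];
    unsigned cid; /* models SR[CID] */
} fcs_model;

/* The context ID forms the top address bits, the register number the
 * low five bits. With CID fixed at 0 this degenerates to a single set. */
static unsigned fcs_rf_index(unsigned cid, unsigned reg)
{
    assert(cid < NUM_CONTEXTS);
    assert(reg < GPRS_PER_CONTEXT);
    return (cid << 5) | reg;
}

/* Read a GPR in the currently selected context. */
static uint32_t fcs_read(const fcs_model *m, unsigned reg)
{
    return m->gpr[fcs_rf_index(m->cid, reg)];
}

/* Write a GPR in the currently selected context. */
static void fcs_write(fcs_model *m, unsigned reg, uint32_t value)
{
    m->gpr[fcs_rf_index(m->cid, reg)] = value;
}
```

In this model, writing r3 while `cid` is 1 leaves the value that context 0 holds in r3 untouched, which is the property an exception handler would rely on to avoid saving registers to the stack.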
Hi @zeldin, sorry I missed your message. I really struggle to get GitHub notifications delivered to me. I will try to figure this out.
We don't use fast context switches in any OSes right now, and traditionally context is saved to the stack to allow for a more general context switching mechanism. But I do think fast context switching would be an interesting experiment and implementation would be appreciated. I have not even looked into it as per what I stated before. The issue with fast context switching is we will always have an upper bound on how many concurrent contexts we can support. This may be good for a small RTOS but might be hard to implement with a general OS like Linux.
Would you be interested in writing some test software too to help show the benefits vs storing context to the stack? What are you thinking as a use case?
My use case is not an OS, not even an RTOS, but bare metal embedded. I don't want multiple processes or anything, just efficient (both in space, as I want to be able to run from BRAM only, and in execution time) processing of interrupts and exceptions. Currently I use crt0.o from newlib as vector entrypoint, but it would be nice to be able to switch to something leaner.
Having less dead space between the entrypoints would also be nice, but alas that is defined by the architecture. :disappointed:
What environmental requirements would you have on such test software? Is something which runs on a Nexys A7 board and outputs something on the debug UART ok? Or runnable in some Icarus setup?
If the change is in the mor1kx core, any platform you use would be fine as long as the software can be built with newlib. I use either LiteX or mor1kx-generic to run software, either via simulations or on FPGA.
Well, it would have to be built with -nostartfiles since replacing newlib's crt0.o with something more efficient is key here. :smile:
I'll see if I can whip something up, but probably not this week.
That should be fine. The one thing that is needed from newlib is the set of board libraries that handle reading/writing the UART.
< shorne@antec ~/work/openrisc/embench-tester > ls ~/local/or1k-elf/or1k-elf/lib/ -l | grep or1k
-rw-r--r--. 1 shorne shorne 1258 Mar 21 2022 libboard-or1ksim.a
-rw-r--r--. 1 shorne shorne 1264 Mar 21 2022 libboard-or1ksim-uart.a
-rw-r--r--. 1 shorne shorne 84070 Mar 21 2022 libor1k.a
Though, this does remind me that we have some re-entrant code somewhere in newlib that uses shadow registers to temporarily store context. I could not find it at first glance. This might crop up later.
Good luck.
@stffrdhrn So, I wanted to try using your mor1kx-generic, but it failed already on the first instruction of your test asm program, without me making any local changes at all:
vvp -n -M. -l icarus.log -melf_loader_vpi -mjtag_vpi mor1kx-generic_1.1 -fst +elf_load=/tmp/openrisc/src/openrisc-asm +trace_enable=1 +trace_to_screen=1 +vcd=1
FST info: dumpfile testlog.vcd opened for output.
Program header 0: addr 0x00000000, size 0x000001A0
elf-loader: /tmp/openrisc/src/openrisc-asm was loaded
Loading 104 words
0 : Illegal Wishbone B3 cycle type (xxx)
S 00000100: 18800000 l.movhi r4,0x0000 r4 = 00000000 flag: 0
S 00000104: a8840110 l.ori r4,r4,0x0110 r4 = 00000110 flag: 0
S 00000108: 44002000 l.jr r4 flag: 0
S 0000010c: 15000000 l.nop 0x0000 flag: 0
S 00000110: 18000000 l.movhi r0,0x0000 r0 = 00000000 flag: 0
S 00000114: 9c200001 l.addi r1,r0,0x0001 r1 = 00000001 flag: 0
S 00000118: 9c410002 l.addi r2,r1,0x0002 r2 = 00000003 flag: 0
S 0000011c: 9c620004 l.addi r3,r2,0x0004 r3 = 00000007 flag: 0
S 00000120: 9c830008 l.addi r4,r3,0x0008 r4 = 0000000f flag: 0
S 00000124: 9ca40010 l.addi r5,r4,0x0010 r5 = 0000001f flag: 0
S 00000128: 9cc50020 l.addi r6,r5,0x0020 r6 = 0000003f flag: 0
S 0000012c: 9ce60040 l.addi r7,r6,0x0040 r7 = 0000007f flag: 0
S 00000130: 9d070080 l.addi r8,r7,0x0080 r8 = 000000ff flag: 0
S 00000134: 9d280100 l.addi r9,r8,0x0100 r9 = 000001ff flag: 0
S 00000138: 9d490200 l.addi r10,r9,0x0200 r10 = 000003ff flag: 0
S 0000013c: 9d6a0400 l.addi r11,r10,0x0400 r11 = 000007ff flag: 0
S 00000140: 9d8b0800 l.addi r12,r11,0x0800 r12 = 00000fff flag: 0
S 00000144: 9dac1000 l.addi r13,r12,0x1000 r13 = 00001fff flag: 0
S 00000148: 9dcd2000 l.addi r14,r13,0x2000 r14 = 00003fff flag: 0
S 0000014c: 9dee4000 l.addi r15,r14,0x4000 r15 = 00007fff flag: 0
S 00000150: 9e0f8000 l.addi r16,r15,0x8000 r16 = ffffffff flag: 0
S 00000154: e3e00802 l.sub r31,r0,r1 r31 = ffffffff flag: 0
S 00000158: e3df1002 l.sub r30,r31,r2 r30 = fffffffc flag: 0
S 0000015c: e3be1802 l.sub r29,r30,r3 r29 = fffffff5 flag: 0
S 00000160: e39d2002 l.sub r28,r29,r4 r28 = ffffffe6 flag: 0
S 00000164: e37c2802 l.sub r27,r28,r5 r27 = ffffffc7 flag: 0
S 00000168: e35b3002 l.sub r26,r27,r6 r26 = ffffff88 flag: 0
S 0000016c: e33a3802 l.sub r25,r26,r7 r25 = ffffff09 flag: 0
S 00000170: e3194002 l.sub r24,r25,r8 r24 = fffffe0a flag: 0
S 00000174: e2f84802 l.sub r23,r24,r9 r23 = fffffc0b flag: 0
S 00000178: e2d75002 l.sub r22,r23,r10 r22 = fffff80c flag: 0
S 0000017c: e2b65802 l.sub r21,r22,r11 r21 = fffff00d flag: 0
S 00000180: e2956002 l.sub r20,r21,r12 r20 = ffffe00e flag: 0
S 00000184: e2746802 l.sub r19,r20,r13 r19 = ffffc00f flag: 0
S 00000188: e2537002 l.sub r18,r19,r14 r18 = ffff8010 flag: 0
S 0000018c: e2327802 l.sub r17,r18,r15 r17 = ffff0011 flag: 0
S 00000190: e2118002 l.sub r16,r17,r16 r16 = ffff0012 flag: 0
S 00000194: 18600000 l.movhi r3,0x0000 r3 = 00000000 flag: 0
S 00000198: 15000001 l.nop 0x0001 flag: 0
exit(0x00000000);
Is this also a work in progress? (BTW, I opened an issue on that repo with some typos I found in the readme. I'm mentioning it here since you said you were having issues with GH notifications.)
@stffrdhrn Hm, actually, never mind, it looks like I misinterpreted the "Illegal Wishbone B3 cycle type" as a fatal error. It does run correctly. Do consider fixing the typos in your README when you have the time though. :smile:
Thanks, yes, that illegal Wishbone cycle message is a bit misleading. Everything does appear to work correctly according to the log.
I'll look at the README typos.
@stffrdhrn Working on the FASTCONTEXTS implementation, I found an inconsistency in the OpenRISC 1000 Architecture Manual:
Section 6.3 says that CID is incremented on exception. But section 6.4.2 says that exceptions switch to the main context (CID=0).
Either way works for me, but do you happen to know which way it's supposed to be?
This is interesting. I looked into it, and there are two different things here: the 4-bit SR[CID] field, which refers to the current context, and the 32-bit CXR register, which also holds details of the context.
The thing is the CXR register doesn't seem to be an spr and I don't know if our assembler even supports this name. But also this CXR is supposed to be used for manual context switching. It's not used when automatic fast switching is enabled.
> But also this CXR is supposed to be used for manual context switching. It's not used when automatic fast switching is enabled.
Yes, however automatic fast switching is supposed to write the previous value of CXR[CCID] (which, AFAICT, is identical to SR[CID]) into CXR[CCRS], so that you can find the previous context in case you need to look at its registers (in the case of a syscall, for example). This is clearly only needed if CID is set to 0 instead of incrementing, otherwise you can just decrement it to find the old value. But even then, can't you just use ESR[CID] to find out? It seems to me that CXR is completely redundant, which might be why it seems nobody has remembered to assign it an spr number...
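The two readings of the manual, and the point about ESR[CID], can be compared with a small C model. This is a sketch under stated assumptions rather than a description of any existing implementation: the type and function names are hypothetical, a single ESR is modelled, and the only architectural behaviour relied on is that an exception copies SR into ESR and that l.rfe restores SR from ESR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two candidate behaviours when an exception is taken with SR[CE] set. */
typedef enum {
    CID_INCREMENT,    /* section 6.3 reading: CID is incremented */
    CID_RESET_TO_MAIN /* section 6.4.2 reading: switch to context 0 */
} cid_policy;

/* Hypothetical model of the relevant state bits. */
typedef struct {
    bool    ce;      /* SR[CE] */
    uint8_t cid;     /* SR[CID], 4 bits */
    uint8_t esr_cid; /* CID field of the SR copy saved in ESR */
} ctx_state;

/* Model of exception entry. */
static void take_exception(ctx_state *s, cid_policy policy)
{
    assert(s->cid < 16);
    s->esr_cid = s->cid; /* ESR receives the pre-exception SR, CID included */
    if (!s->ce)
        return;          /* fast context switching disabled: CID unchanged */
    if (policy == CID_INCREMENT)
        s->cid = (uint8_t)((s->cid + 1) & 0xF);
    else
        s->cid = 0;
}

/* Model of l.rfe: SR, and with it CID, is restored from ESR. */
static void return_from_exception(ctx_state *s)
{
    s->cid = s->esr_cid;
}

/* Under either policy the interrupted context is recoverable from ESR. */
static uint8_t previous_context(const ctx_state *s)
{
    return s->esr_cid;
}
```

In this model `previous_context()` returns the interrupted context under both policies, while decrementing the current CID only recovers it under `CID_INCREMENT`, which matches the argument that a separate CXR[CCRS] field adds no information.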
Just wanted to add that I did write some test software. All of it, including main and exception stack, fits in 2 Kbyte of BRAM (the vector table at address 0x0000-0x1fff is pure combinatorial and not backed by BRAM or registers). But now I need to actually implement FCS for it to work properly (if you run it now there is some register contention between the tick timer handler and the main loop, giving incorrect output), and I can't do that without some clarity on which way CID is supposed to move on an exception when SR[CE] is set...