Forth850
Forth850 copied to clipboard
A fast Forth Standard system written in Z80 assembly for SHARP PC-G850 pocket computers
A modern Forth 2012 standard compliant system for the vintage SHARP PC-G850(V)(S) pocket computer or any Z80 system (with a few tweaks to port).
Forth850 is under 8K and has 295 words.
A more complete 11K version forth850-full is also included with:
-
58 additional words
-
single precision floating point math words implemented with a new and efficient Z80 IEEE 754 floating point math library I wrote for Forth850. More floating point math functions that use this library are defined in Forth in examples/MATH.FTH
-
a more capable Forth line editor with replay back feature (cursor keys)
Forth850 includes stack under/overflow checks, dictionary overflow checks and can be interrupted by pressing BREAK.
You can write Forth source code in the PC-G850(V)(S) built-in TEXT editor and compile it into Forth850 with the TEXT word included in the full version.
You can extend Forth850 as you wish, including assembly code written on the
machine itself in the PC-G850(V)(S) TEXT editor and assembled with its Z80
Assembler. See ASMDEMO1.FTH for an example with an
explanation. You can also use the Monitor to set breakpoints and run Forth850
from the Monitor with G100
to trigger them.
If you want to rebuild Forth850 from source code, you will need to install the asz80 assembler part of the ASxxxx Cross Assemblers.
If you plan to use parts of Forth850 and/or the optimized Z80 code in a project that you plan to share or redistribute, then please give me credit for my work as per BCD-3 license.
Performance
I've implemented Forth850 as efficiently as possible in direct threaded code (DTC) with new Z80 code written from scratch, including faster Z80 integer and float math routines compared to other Z80 Forth implementations. See the technical implementation sections why Forth850 is fast for a DTC implementation. The Forth850 source code is included and extensively documented.
The n-queens benchmark is solved in 0.865 seconds, the fastest Forth implementation of the benchmarks. Forth850 n-queens runs 5 times faster than the C n-queens benchmark on the Sharp PC-G850VS.
Installation
In RUN MODE enter MON
to enter the Monitor, then enter USER3FFF
to reserve
16K memory space:
*USER3FFF
FREE:0100-3FFF
Loading via SIO (serial) requires a serial adapter. See my post on the
HP Forum
how to construct one as a DIY project. After reserving memory in the Monitor
as described above, use the R
command to read the forth850.ihx file or the
forth850-full.ihx full version sent from your PC to your PC-G850(V)(S):
*R100
The R
command is used to transmit/receive data in Intel hex format over SIO.
This command is for receiving machine code from a personal computer or other
device. See the Sharp PC-G850(V)(S) manual.
To load via the cassette interface, press BASIC
to return to RUN MODE. Load
forth850.wav using a cassette interface CE-126P or a CE-124:
BLOADM
Load the forth850-full.wav "full version" to include many additional words and floating point words. The full version will continue to evolve with new features.
How to switch between Forth and BASIC
To return to Forth, enter CALL256
in RUN MODE.
To return to BASIC from Forth, press the BASIC key. The TEXT key takes you to the TEXT editor.
To turn the machine off, press the OFF key. The machine will also power off automatically after about 10 minutes waiting for user input at the prompt.
How to increase or decrease memory allocation for Forth850
Memory allocation can be adjusted without affecting the Forth dictionary.
In RUN MODE enter MON
to enter the Monitor, then enter USERaddr
with an
upper address addr
larger than 23ff
(9K bytes.) If words are added to
Forth850, you must make sure that addr
is large enough, i.e. equal or larger
than the hex value displayed with:
HERE #708 + HEX . DECIMAL
23FF
In the Monitor specify USERaddr
with the address displayed. This leaves
about 200 bytes free dictionary space plus 40 bytes for the "hold area" to run
Forth850. The largest size is USER75FF
which gives about 21K free dictionary
space (but there won't be space left on the machine for files, BASIC or TEXT.)
Forth850 manual
Forth850 is 2012 standard compliant. For help, see the manual included with Forth for the Sharp PC-E500(S) and Forth 2012 Standard.
Forth850 implements a subset of the standard Forth words. A list of Forth850 words with an explanation is given below.
Forth850 words
List of Forth850 built-in words. Reference implementations in Forth are included when applicable, although many words are implemented in Z80 code for speed rather than in Forth.
(:)
-- ; R: -- ip call colon definition; runtime of the : compile-only word
(;)
-- ; R: ip -- return to caller from colon definition; runtime of the ; compile-only word
(EXIT)
-- ; R: ip -- return to caller from colon definition; runtime of the EXIT compile-only word
(;CODE)
-- ; R: ip -- set LASTXT cfa to ip and return from colon definition; a runtime word compiled by the DOES> compile-only word
(DOES)
addr -- addr ; R: -- ip calls the DOES> definition with pfa addr; a runtime word compiled by the DOES> compile-only word coded as call dodoes
(VAR)
-- addr leave parameter field address (pfa) of variable; runtime word of a VARIABLE coded as call dovar
(VAL)
-- x fetch value; runtime word of a VALUE coded as call doval
(2VAL)
-- dx fetch double value; runtime word of a 2VALUE coded as call dotwoval
(CON)
-- x fetch constant; runtime word of a CONSTANT coded as call docon
(2CON)
-- x fetch double constant; runtime word of a 2CONSTANT coded as call dotwocon
(DEF)
-- execute deferred word; runtime word of a DEFER coded as call dodef
(LIT)
-- x fetch literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER
(2LIT)
-- x1 x2 fetch double literal; runtime word compiled by EVALUATE, INTERPRET and NUMBER
(SLIT)
-- c-addr u fetch literal string; runtime word compiled by S" and ."
0
-- 0 leave constant 0
0 CONSTANT 0
1
-- 1 leave constant 1
1 CONSTANT 1
-1
-- -1 leave constant -1
-1 CONSTANT -1
BL
-- 32 leave constant 32 (space)
#32 CONSTANT BL
PAD
-- c-addr leave address of the PAD; the PAD is a free buffer space of 256 bytes not used by Forth850
TIB
-- c-addr leave address of TIB; the terminal input buffer used by Forth850
TMP
-- c-addr leave address of the next temp string buffer; switches between two string buffers of 256 free bytes each; used by S" to store a string when interpreting
DROP
x -- drop TOS
DUP
x -- x x duplicate TOS
?DUP
x -- x x or 0 -- 0 duplicate TOS if nonzero
SWAP
x1 x2 -- x2 x1 swap TOS with 2OS
OVER
x1 x2 -- x1 x2 x1 copy 2OS over TOS
ROT
x1 x2 x3 -- x2 x3 x1 rotate cells
: ROT >R SWAP R> SWAP ;
-ROT
x1 x2 x3 -- x3 x1 x2 undo (or back, or left) rotate cells
: -ROT ROT ROT ;
NIP
x1 x2 -- x2 nip 2OS
: NIP SWAP DROP ;
TUCK
x1 x2 -- x2 x1 x2 tuck TOS under 2OS
: TUCK SWAP OVER ;
2DROP
xd1 xd2 -- xd1 drop double TOS
: 2DROP DROP DROP ;
2DUP
xd -- xd xd duplicate double TOS
: 2DUP OVER OVER ;
2SWAP
xd1 xd2 -- xd2 xd1 swap double TOS with double 2OS
: 2SWAP ROT >R ROT R> ;
: 2SWAP 3 ROLL 3 ROLL ;
2OVER
xd1 xd2 -- xd1 xd2 xd1 copy double 2OS over double TOS
: 2OVER >R >R 2DUP R> R> 2SWAP ;
: 2OVER 3 PICK 3 PICK ;
DEPTH
-- u parameter stack depth
: DEPTH sp0 @ SP@ - 2- 2/ ;
CLEAR
... -- purge parameter stack
: CLEAR sp0 @ SP! ;
.S
-- display parameter stack
: .S DEPTH 0 ?DO sp0 @ I 2+ CELLS - ? LOOP ;
SP@
-- addr fetch stack pointer
SP!
addr -- store stack pointer
>R
x -- ; R: -- x move TOS to the return stack
DUP>R
x -- x ; R: -- x duplicate TOS to the return stack, a single word for DUP >R
R>
R: x -- ; -- x move cell from the return stack
RDROP
R: x -- ; -- drop cell from the return stack, a single word for R> DROP
R@
R: x -- x ; -- x fetch cell from the return stack
2>R
x1 x2 -- ; R: -- x1 x2 move double TOS to the return stack, a single word for SWAP >R >R
2R>
R: x1 x2 -- ; -- x1 x2 move double cell from the return stack, a single word for R> R> SWAP
2R@
R: x1 x2 -- x1 x2 ; -- x1 x2 fetch double cell from the return stack
RP@
-- addr fetch return stack pointer
RP!
addr -- store return stack pointer
PICK
xu ... x0 u -- xu ... x0 xu pick u'th cell from the parameter stack; 0 PICK is the same as DUP; 1 PICK is the same as OVER
: PICK 1+ CELLS SP@ + @ ;
@
addr -- x fetch from cell
C@
c-addr -- char fetch char
2@
addr -- x1 x2 fetch from double cell
: 2@ DUP CELL+ @ SWAP @ ;
!
x addr -- store in cell
(TO)
x -- store in value; runtime of the TO compile-only word
C!
char c-addr -- store char
2!
x1 x2 addr -- store in double cell
: 2! TUCK ! CELL+ ! ;
(2TO)
dx -- store in double value; runtime of the TO compile-only word
+!
n addr -- increment cell
(+TO)
n -- increment value; runtime of the +TO compile-only word
ON
addr -- store TRUE (-1) in cell
: ON -1 SWAP ! ;
OFF
addr -- store FALSE (0) in cell
: OFF 0 SWAP ! ;
+
n1 n2 -- n3 sum n1+n2
M+
d1 n -- d2 double sum d1+n
D+
d1 d2 -- d3 double sum d1+d2
: D+ >R M+ R> + ;
-
n1 n2 -- n3 difference n1-n2
D-
d1 d2 -- d3 double difference d1-d2
: D- DNEGATE D+ ;
UM*
u1 u2 -- ud unsigned double product u1*u2
M*
n1 n2 -- d signed double product n1*n2
: M*
2DUP XOR >R
ABS SWAP ABS UM*
R> 0< IF DNEGATE THEN ;
*
n1|u1 n2|u2 -- n3|u3 signed and unsigned product n1*n2
: * UM* DROP ;
UMD*
ud1 u -- ud2 unsigned double product ud1*u
: UMD* DUP>R UM* DROP SWAP R> UM* ROT + ;
MD*
d1 n -- d2 signed double product d1*n
: MD*
2DUP XOR >R
ABS -ROT DABS ROT
UMD*
R> 0< IF DNEGATE THEN ;
UM/MOD
ud u1 -- u2 u3 unsigned remainder and quotient ud/u1; the result is undefined when u1 = 0
SM/REM
d1 n1 -- n2 n3 symmetric remainder and quotient d1/n1 rounded towards zero; the result is undefined when n1 = 0
: SM/REM
2DUP XOR >R
OVER >R
ABS -ROT DABS ROT
UM/MOD
R> 0< IF SWAP NEGATE SWAP THEN
R> 0< IF NEGATE THEN ;
FM/MOD
d1 n1 -- n2 n3 floored signed modulus and quotient d1/n1 rounded towards negative (floored); the result is undefined when n1 = 0
: FM/MOD
DUP>R
SM/REM
DUP 0< IF
SWAP R> + SWAP 1-
ELSE
RDROP
THEN ;
/MOD
n1 n2 -- n3 n4 symmetric remainder and quotient n1/n2; the result is undefined when n2 = 0
: /MOD SWAP S>D ROT SM/REM ;
MOD
n1 n2 -- n3 symmetric remainder of n1/n2; the result is undefined when n2 = 0
: MOD /MOD DROP ;
/
n1 n2 -- n3 quotient n1/n2; the result is undefined when n2 = 0
: / /MOD NIP ;
*/MOD
n1 n2 n3 -- n4 n5 product with symmetric remainder and quotient n1*n2/n3; the result is undefined when n3 = 0
: */MOD -ROT M* ROT SM/REM ;
*/
n1 n2 n3 -- n4 product with quotient n1*n2/n3; the result is undefined when n3 = 0
: */ */MOD NIP ;
M*/
d1 n1 n2 -- d2 double product with quotient d1*n1/n2; the result is undefined when n2 = 0
: M*/ >R MD* R> SM/REM NIP ;
AND
x1 x2 -- x1&x2 bitwise and x1 with x2
OR
x1 x2 -- x1|x2 bitwise or x1 with x2
XOR
x1 x2 -- x1^x2 bitwise xor x1 with x2
=
x1 x2 -- flag true if x1 = x2
<>
x1 x2 -- flag true if x1 <> x2
<
n1 n2 -- flag true if n1 < n2 signed
: <
2DUP XOR 0< IF
DROP 0<
EXIT
THEN
- 0< ;
>
n1 n2 -- flag true if n1 > n2 signed
: > SWAP < ;
U<
u1 u2 -- flag true if u1 < u2 unsigned
: U<
2DUP XOR 0< IF
NIP 0<
EXIT
THEN
- 0< ;
U>
u1 u2 -- flag true if u1 > u2 unsigned
: U> SWAP U< ;
0=
x -- flag true if x = 0
0<
n -- flag true if n < 0
D0=
dx -- flag true if dx = 0
: D0= OR 0= ;
D0<
d -- flag true if d < 0
: D0< NIP 0< ;
S>D
n -- d widen single to double
D>S
d -- n narrow double to single; may throw -11 "result out of range" valid range is -32768 to 65535
MAX
n1 n2 -- n3 signed max of n1 and n2
: MAX
2DUP < IF SWAP THEN
DROP ;
MIN
n1 n2 -- n3 signed min of n1 and n2
: MIN
2DUP > IF SWAP THEN
DROP ;
UMAX
u1 u2 -- u3 unsigned max of u1 and u2
: UMAX
2DUP U< IF SWAP THEN
DROP ;
UMIN
u1 u2 -- u3 unsigned min of u1 and u2
: UMIN
2DUP U> IF SWAP THEN
DROP ;
WITHIN
x1 x2 x3 -- flag true if x1 is within x2 up to x3 exclusive
: WITHIN OVER - >R - R> U< ;
INVERT
x1 -- x2 one's complement ~x1
: INVERT 1+ NEGATE ;
: INVERT -1 XOR ;
NEGATE
n1 -- n2 two's complement -n1
: NEGATE 0 SWAP - ;
: NEGATE INVERT 1+ ;
ABS
n1 -- n2 absolute value |n1|
: ABS DUP 0< IF NEGATE THEN ;
DNEGATE
d1 -- d2 two's complement -d1
: DNEGATE SWAP INVERT SWAP INVERT 1 M+ ;
DABS
d1 -- d2 absolute value |d1|
: DABS DUP 0< IF DNEGATE THEN ;
LSHIFT
x1 u -- x2 logical shift left x1 << u
RSHIFT
x1 u -- x2 logical shift right x1 >> u
1+
n1 -- n2 increment n1+1
: 1+ 1 + ;
2+
n1 -- n2 increment n1+2
: 2+ 2 + ;
1-
n1 -- n2 decrement n1-1
: 1- 1 - ;
2-
n1 -- n2 decrement n1-2
: 2- 2 - ;
2*
n1 -- n2 arithmetic shift left n1 << 1
: 2* 2 * ;
2/
n1 -- n2 arithmetic shift right n1 >> 1
: 2/ 2 / ;
COUNT
c-addr1 -- c-addr2 u convert counted string to string
: COUNT DUP 1+ SWAP C@ ;
COMPARE
c-addr1 u1 c-addr2 u2 -- -1|0|1 compare strings, leaves -1 = less or 0 = equal or 1 = greater
S=
c-addr1 u1 c-addr2 u2 -- flag true if strings match
: S= COMPARE 0= ;
SEARCH
c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag true if the second string is in the first; leaves matching address, remaining length and true; or leaves the first string and false
CMOVE
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 (from begin)
: CMOVE
SWAP >R
BEGIN DUP WHILE
NEXT-CHAR R@ C!
R> 1+ >R
REPEAT
RDROP
2DROP ;
CMOVE>
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2 up (from end)
MOVE
c-addr1 c-addr2 u -- move u bytes from c-addr1 to c-addr2
: MOVE
-ROT
2DUP U< IF
ROT CMOVE>
ELSE
ROT CMOVE
THEN ;
FILL
c-addr u char -- fill memory with char
ERASE
c-addr u -- fill memory with zeros
: ERASE 0 FILL ;
BLANK
c-addr u -- fill memory with 0x20 (BL) chars
: ERASE BL FILL ;
CHOP
c-addr u1 char -- c-addr u2 truncate a string up to a matching char; leaves the string if char not found; char = 0x20 (BL) chops 0x00 to 0x20 (white space and control)
TRIM
c-addr1 u1 char -- c-addr2 u2 trim initial chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)
-TRIM
c-addr u1 char -- c-addr u2 trim trailing chars from a string; char = 0x20 (BL) trims 0x00 to 0x20 (white space and control)
-TRAILING
c-addr u1 -- c-addr u2 trim trailing white space and control characters from a string
: -TRAILING BL -TRIM ;
/STRING
c-addr1 u1 n -- c-addr2 u2 slice n characters off the start of a string
: /STRING ROT OVER + -ROT - ;
NEXT-CHAR
c-addr1 u1 -- c-addr2 u2 char get next char from a string; increments the string address and decrements its length by one
: NEXT-CHAR OVER C@ >R 1- SWAP 1+ SWAP R> ;
: NEXT-CHAR OVER C@ -ROT 1- SWAP 1+ SWAP ROT ;
X!
u -- set cursor column 0 to 23
Y!
u -- set cursor row 0 to 5
X@
-- u fetch cursor column 0 to 23, or 24 when beyond the right window edge
Y@
-- u fetch cursor row 0 to 5
AT-XY
u1 u2 -- set column x to u1 (0 to 23) and row y to u2 (0 to 5)
: AT-XY Y! X! ;
EMIT
char -- emit char to screen; supports the following control codes: 8 (BS backspace, cursor left), 9 (TAB), 10 (LF line feed), 11 (VT scroll), 12 (FF clear screen), 13 (CR carriage return), 28 (cursor right), 29 (cursor left), 30 (cursor up), 31 (cursor down)
TYPE
c-addr u -- type string to output; string may contain control codes, see EMIT
: TYPE
BEGIN DUP WHILE
NEXT-CHAR EMIT
REPEAT
2DROP ;
CR
-- carriage return and line feed
: CR $A EMIT ;
SPACE
-- emit a space (BL)
: SPACE BL EMIT ;
SPACES
n -- emit n spaces (zero or negative n does nothing)
: SPACES
DUP 0< IF
DROP
EXIT
THEN
0 ?DO SPACE LOOP ;
PAGE
-- clear screen
: PAGE $C EMIT ;
BASE
-- addr variable with numeric base for conversion
VARIABLE BASE
DECIMAL
-- set BASE to 10
: DECIMAL #10 BASE ! ;
HEX
-- set BASE to 16
: HEX #16 BASE ! ;
HP
-- addr hold pointer
0 VALUE HP
<#
-- begin pictured numeric output
: <# HERE h_size + TO HP ;
HOLD
char -- hold char for pictured numeric output
: HOLD HP 1- DUP TO HP C! ;
ud1 -- ud2 hold digit
: #
0 BASE @ UM/MOD >R
BASE @ UM/MOD
SWAP DUP #9 > IF
#7 +
THEN
'0 + HOLD
R> ;
#S
ud -- 0 0 hold all remaining digits
: #S BEGIN # 2DUP D0= UNTIL ;
SIGN
n -- hold minus sign if n < 0
: SIGN 0< IF '- HOLD THEN ;
#>
ud -- c-addr u end pictured numeric output, leave string
: #> 2DROP HP HERE h_size + OVER - ;
D.R
d +n -- output signed double d right aligned in field of +n chars wide
: D.R -ROT TUCK DABS <# #S ROT SIGN #> ROT OVER - SPACES TYPE ;
D.
d -- output signed double d with a trailing space
: D. 0 D.R SPACE ;
U.R
u +n -- output unsigned u right aligned in field of +n chars wide
: U.R 0 SWAP D.R ;
U.
u -- output unsigned u with a trailing space
: U. 0 D. ;
.R
n +n -- output signed n right aligned in field of +n chars wide
: .R SWAP S>D ROT D.R ;
.
n -- output signed n with a trailing space
: . S>D D. ;
?
addr -- output signed cell stored at addr
: ? @ . ;
OUT
u1 u2 -- output byte u1 to port u2
INP
u1 -- u2 input from port u1
DRAW
c-addr u -- draw pixel patterns on screen at xy; writes string c-addr u of pixel patterns at xy; specify xy with AT-XY, xy not changed after DRAW
VIEW
c-addr u -- view screen pixels at xy; read string of screen pixel patterns at xy into buffer c-addr u specify xy with AT-XY, xy not changed after VIEW
REVERSE
+n -- reverse video of the +n characters displayed at xy; specify xy with AT-XY
INKEY
-- x check for key press and read key code of a key is pressed; 0x00 = no key pressed and 0x52 = multiple keys pressed
GETKEY
-- char wait and read key; leaves ASCII char or special key code: ON =$05, BS =$08, DEL =$09, CA =$0b, CLS =$0c, ENTER =$0d, DIGIT =$0e, F-E =$0f, INS =$12, ANS =$15, CONST =$17, RCM =$19, M+ =$1a, M- =$1b, right =$1c, left =$1d, up =$1e, down =$1f; a space is produced for the TAB key by the GETCHR system call, calc keys and BASIC keys produce BASIC tokens as key code $fe: SIN =$fe register B = $95 BASIC token for SIN (ignored)
KEY
-- char display cursor and wait to read key; same as GETKEY leaves ASCII char or special key code
EDIT
c-addr +n1 n2 n3 n4 -- c-addr +n5 edit buffer c-addr; buffer size +n1; string in buffer has length n2; place cursor at n3; non-editable left margin n4; leaves c-addr and length +n5
ACCEPT
c-addr +n1 -- +n2 accept user input into buffer c-addr +n1; leaves length +n2
: ACCEPT 0 0 0 EDIT NIP ;
>IN
-- addr variable with offset into input buffer (TIB)
VARIABLE >IN
SOURCE-ID
-- 0|-1 value with 0 = source input or -1 = string input
0 VALUE SOURCE-ID
SOURCE
-- c-addr u double value with input source
TIB 0 2VALUE SOURCE
REFILL
-- flag attempt to refill the input buffer; leaves false when end of input
SKIPS
char "
: SKIPS SOURCE >IN @ /STRING ROT TRIM DROP SOURCE DROP - >IN ! ;
PARSE
char "ccc
: PARSE SOURCE >IN @ /STRING ROT CHOP DUP 1+ >IN @ + SOURCE NIP UMIN >IN ! ;
PARSE-WORD
char "
: PARSE-WORD
DUP SKIPS PARSE
DUP tmp_size-1 U> IF -18 THROW THEN ;
CHECK-NAME
c-addr u -- c-addr u check if name is valid; may throw -16 "attempt to use a zero-length string as a name"; may throw -19 "definition name too long"
: CHECK-NAME
DUP 0= IF -16 THROW THEN
DUP length_bits U> IF -19 THROW THEN ;
PARSE-NAME
"
: PARSE-NAME BL PARSE-WORD CHECK-NAME ;
(
"ccc
: (
') PARSE
BEGIN
+ DROP
SOURCE + = IF
DROP REFILL
ELSE
C@ ') <> IF
REFILL
ELSE
FALSE
THEN
THEN
0= UNTIL ; IMMEDIATE
\
"ccc
: \ $A PARSE 2SROP ;
.(
"ccc
: .( ') PARSE CR TYPE ; IMMEDIATE
>DIGIT
char -- n convert char digit to numeric digit when within BASE; leaves -1 if char is invalid
>NUMBER
ud1 c-addr1 u1 -- ud2 c-addr2 u2 convert string to number; updates accumulated double ud1 to ud2; leaves string with the remaining unconvertable chars or empty
: >NUMBER
BEGIN DUP WHILE
NEXT-CHAR >DIGIT
DUP 0< IF
DROP -1 /STRING
EXIT
THEN
>R
2SWAP
BASE @ UMD*
R> M+
2SWAP
REPEAT ;
DBL
-- flag true if >DOUBLE or NUMBER produced a double
0 VALUE DBL
>DOUBLE
c-addr u -- d true | false convert string to signed double; leaves the double and true if string is converted; leaves false if string is unconvertable
L>NAME
lfa -- nt convert link field address to name token (nfa)
NAME>STRING
nt -- c-addr u convert name token (nfa) to string
NAME>
nt -- xt convert name token (nfa) to execution token (cfa)
>NAME
xt -- nt convert execution token (cfa) to name token (lfa); may throw -24 "invalid numeric argument"
>BODY
xt -- pfa convert execution token to parameter field address
FIND-WORD
c-addr u -- c-addr 0 | xt 1 | xt -1 search dictionary for matching word; leaves execution token and 1 = immediate or -1 = not immediate; leaves c-addr and 0 when not found
'
"
: ' PARSE-NAME FIND-WORD 0= IF -13 THROW THEN ;
WORDS
-- display context vocabulary words
HERE
-- addr address of free memory after the dictionary; new definitions are added here; note that numeric output words use HERE for conversion
LASTXT
-- xt leaves the last execution token defined
0 VALUE LASTXT
STATE
-- addr compilation state; STATE @ leaves TRUE when compiling; STATE @ leaves FALSE when interpreting
VARIABLE STATE
[
-- switch state to interpreting
: [ STATE OFF ;
]
-- switch state to compiling
: ] STATE ON ;
HIDE
-- hide the last definition
: HIDE CURRENT @ L>NAME DUP C@ smudge_bits OR SWAP C! ;
REVEAL
-- reveal the last definition
: REVEAL CURRENT @ L>NAME DUP C@ ~smudge_bits AND SWAP C! ;
IMMEDIATE
-- make the last definition immediate
: IMMEDIATE CURRENT @ L>NAME DUP C@ immediate_bits OR SWAP C! ;
?COMP
-- check if compiling; may throw -14 "interpreting a compile-only word"
?SYS
-- ; C: x -- check if compiled control structure matches x; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
UNUSED
-- u unused dictionary space
: UNUSED top @ HERE - ;
ALLOT
n -- allocate n bytes starting from HERE in the dictionary; undo the last ALLOT with negative n to reclaim memory (only do this when no new words are defined); may throw -8 "dictionary overflow"
COMPILE,
xt -- append execution token to dictionary; may throw -8 "dictionary overflow"
: COMPILE, , ;
,
x -- append cell to dictionary; may throw -8 "dictionary overflow"
C,
char -- append char to dictionary; may throw -8 "dictionary overflow"
2,
x1 x2 -- append double cell to dictionary; may throw -8 "dictionary overflow"
: 2, , , ;
NFA,
"
: NFA, PARSE-NAME HERE CURRENT @ , CURRENT ! DUP C, HERE SWAP DUP ALLOT CMOVE HERE TO LASTXT ;
CFA,
addr -- append cfa call addr to dictionary; may throw -8 "dictionary overflow"
CFA:,
-- addr colon_sys append cfa colon definition to dictionary; make CONTEXT the CURRENT vocabulary; start compiling; may throw -8 "dictionary overflow"
: CFA:, ] HERE colon_sys ['] (:) CFA, CURRENT TO CONTEXT ;
POSTPONE
"
:
-- ; C: "
: : NFA, HIDE CFA:, ;
;
-- ; C: addr colon_sys -- end colon definition and stop compiling; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
: ; ?COMP colon_sys <> IF -22 THROW THEN DROP POSTPONE (;) REVEAL [ ; IMMEDIATE
EXIT
-- exit colon definition
: EXIT ?COMP POSTPONE (EXIT) ; IMMEDIATE
CREATE
"
: NFA, ['] (VAR) CFA, ;
DOES>
-- ; ... -- ... change CREATE name behavior to execute code after DOES>
: DOES> ?COMP POSTPONE (;CODE) ['] (DOES) CFA, ; IMMEDIATE
VARIABLE
"
: VARIABLE CREATE 0 , ;
2VARIABLE
"
: 2VARIABLE CREATE 0 0 2, ;
CONSTANT
x "
: CONSTANT NFA, ['] (CON) CFA, , ;
: CONSTANT CREATE , DOES> @ ;
2CONSTANT
x1 x2 "
: 2CONSTANT NFA, ['] (2CON) CFA, 2, ;
: 2CONSTANT CREATE 2, DOES> 2@ ;
VALUE
x "
: VALUE NFA, ['] (VAL) CFA, , ;
2VALUE
dx "
: 2VALUE NFA, ['] (2VAL) CFA, 2, ;
TO
"
: TO
'
DUP VALUE? IF
>BODY
STATE @ IF
POSTPONE (TO)
,
EXIT
THEN
!
EXIT
THEN
DUP 2VALUE? IF
>BODY
STATE @ IF
POSTPONE (2TO)
,
EXIT
THEN
2!
EXIT
THEN
#-32 THROW ; IMMEDIATE
+TO
"
: +TO
'
DUP VALUE? IF
>BODY
STATE @ IF
POSTPONE (+TO)
,
EXIT
THEN
+!
EXIT
THEN
#-32 THROW ; IMMEDIATE
DEFER
"
: DEFER NFA, ['] (DEF) CFA, ['] UNDEF , ;
UNDEF
-- throw -256 "execution of an uninitialized deferred word"
: UNDEF -256 THROW ;
DEFER!
xt1 xt2 -- store xt1 in deferred xt2
: DEFER! >BODY ! ;
DEFER@
xt1 -- xt2 fetch execution token from deferred xt1
: DEFER@ >BODY @ ;
IS
xt "
: IS
'
DUP DEFER? IF
STATE @ IF
LITERAL
POSTPONE DEFER!
EXIT
THEN
DEFER!
EXIT
THEN
#-32 THROW ; IMMEDIATE
ACTION-OF
"
: ACTION-OF
'
DUP DEFER? IF
STATE @ IF
LITERAL
POSTPONE DEFER@
EXIT
THEN
DEFER@
EXIT
THEN
#-32 THROW ; IMMEDIATE
LITERAL
x -- ; -- x compile a literal
: LITERAL ?COMP POSTPONE (LIT) , ; IMMEDIATE
2LITERAL
x1 x2 -- ; -- x1 x2 compile a double literal
: 2LITERAL ?COMP POSTPONE (2LIT) 2, ; IMMEDIATE
SLITERAL
c-addr u -- ; -- c-addr u compile a string literal; max literal string length is 255
: SLITERAL
?COMP
DUP 255 U> IF -18 THROW THEN
POSTPONE (SLIT)
DUP C,
HERE OVER ALLOT SWAP CMOVE ; IMMEDIATE
."
"ccc" -- ; --
type "ccc" (compiled)
: ." '" PARSE SLITERAL POSTPONE TYPE ; IMMEDIATE
S"
"ccc" -- ; -- c-addr u
leave string "ccc" (compiled and interpreted)
: S"
'" PARSE
STATE @ IF
SLITERAL
EXIT
THEN
TMP SWAP
2DUP 2>R
CMOVE
2R> ; IMMEDIATE
VALUE?
xt -- flag true if xt is a VALUE
: VALUE? DUP C@ $CD = SWAP 1+ @ ['] (VAL) = AND ;
2VALUE?
xt -- flag true if xt is a 2VALUE
: 2VALUE? DUP C@ $CD = SWAP 1+ @ ['] (2VAL) = AND ;
DEFER?
xt -- flag true if xt is a DEFER word
: DEFER? DUP C@ $CD = SWAP 1+ @ ['] (DEF) = AND ;
FENCE
-- addr only permit FORGET past the dictionary FENCE address
0 VALUE FENCE
FORGET
"
[']
"
: ['] ?COMP ' LITERAL ; IMMEDIATE
RECURSE
... -- ... recursively call the currently defined word; may throw -14 "interpreting a compile-only word"
: RECURSE ?COMP LASTXT COMPILE, ; IMMEDIATE
?STACK
-- check parameter stack bounds; may throw -3 "stack overflow"; may throw -4 "stack underflow"
(UNTIL)
x -- branch if x = 0; runtime of the UNTIL compile-only word
(IF)
x -- branch if x = 0; runtime of the IF and WHILE compile-only words
(AGAIN)
-- branch; runtime of the AGAIN and REPEAT compile-only words
(AHEAD)
-- branch; runtime of the AHEAD, ELSE and ENDOF compile-only words
(OF)
x1 x2 -- x1 or x1 x2 -- branch if x1 <> x2; runtime of the OF compile-only word
(LOOP)
-- repeat loop unless loop counter crosses the limit; runtime of the LOOP compile-only word
(+LOOP)
-- increment counter and repeat loop unless counter crosses the limit; runtime of the +LOOP compile-only word
(?DO)
n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; skip loop when zero trip loop; runtime of the ?DO compile-only word
(DO)
n1|u1 n2|u2 -- begin loop with limit n1|u1 and initial value n2|u2; loop at least once; runtime of the DO compile-only word
(UNLOOP)
R: ... -- remove loop parameters; runtime of the UNLOOP compile-only word
(LEAVE)
-- discard the loop parameters and exit the innermost do-loop; runtime of the LEAVE compile-only word
AHEAD
-- ; C: -- addr orig branch ahead to THEN; may throw -14 "interpreting a compile-only word"
BEGIN
-- ; C: -- addr dest begin WHILE REPEAT; may throw -14 "interpreting a compile-only word"
AGAIN
-- ; C: addr dest -- branch back to BEGIN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
UNTIL
x -- ; C: addr dest -- branch back to BEGIN if x = 0; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
IF
x -- ; C: -- addr orig branch to closest ELSE or THEN if x = 0; may throw -14 "interpreting a compile-only word"
THEN
-- ; C: addr orig -- close AHEAD, IF, ELSE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
ELSE
-- ; C: addr orig -- addr orig close IF and branch to THEN; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
WHILE
x -- ; C: addr sys -- addr orig addr sys branch to exit REPEAT if x = 0; may throw -14 "interpreting a compile-only word"
REPEAT
-- ; C: addr orig addr dest -- branch back to BEGIN after WHILE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
DO
n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; loop at least once; may throw -14 "interpreting a compile-only word"
?DO
n1|u1 n2|u2 -- ; C: -- addr do_sys begin loop from initial value n2|u2 to the limit n1|u1; skip loop when zero trip loop; may throw -14 "interpreting a compile-only word"
LOOP
-- ; C: addr do_sys -- repeat loop unless loop counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
+LOOP
n|u -- ; C: addr do_sys -- increment counter and repeat loop unless counter crosses the limit; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
UNLOOP
-- remove loop parameters; may throw -14 "interpreting a compile-only word"
LEAVE
-- exit the innermost do-loop; may throw -14 "interpreting a compile-only word"
I
-- n counter of innermost do loop
J
-- n counter of outer (second) do loop
CASE
x -- ; C: -- 0 begin CASE ENDCASE switch; may throw -14 "interpreting a compile-only word"
OF
x1 x2 -- x1 or x1 x2 -- ; C: n1 -- orig n2 take CASE arm if x1 = x2; otherwise branch to next OF; may throw -14 "interpreting a compile-only word"
ENDOF
-- ; C: n -- orig n branch to ENDCASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
ENDCASE
x -- ; C: n*orig n -- close CASE; may throw -14 "interpreting a compile-only word"; may throw -22 "control structure mismatch"
HANDLER
-- addr variable with saved return stack pointer
VARIABLE HANDLER
EXECUTE
... xt -- ... execute execution token xt
CATCH
... xt -- ... 0 or xt -- n execute xt leaving nonzero exception code n or 0 when no exception occurred; when an exception was caught, the parameter and return stacks are restored to their state before execution of xt
: CATCH
SP@ >R
HANDLER @ >R
RP@ HANDLER !
EXECUTE
R> HANDLER !
RDROP
0 ;
THROW
0 -- or ... n -- ... n throw exception n if nonzero
: THROW
?DUP IF
HANDLER @ ?DUP IF
RP!
R> HANDLER !
R> SWAP >R
SP!
DROP
R>
EXIT
THEN
>R CLEAR R>
ERROR
REPL
THEN ;
QUIT
... -- ; R: ... -- throw -56 "QUIT"; no exception error is displayed; unlike ABORT, the parameter stack is not cleared
: QUIT -56 THROW ;
(ABORT")
... flag c-addr u -- ; R: ... -- if flag then abort with string message unless an active catch is present; runtime of the ABORT" compile-only word; throw -2 "ABORT""
: (ABORT")
ROT IF
HANDLER @ IF
2DROP
ELSE
TYPE
THEN
-2 THROW
THEN
2DROP ;
ABORT"
... flag -- ; C: "ccc" -- ; R: ... --
if flag then abort with string message unless an active catch is present;
throw -2 "ABORT"";
clears the parameter stack unless caught with CATCH;
may throw -14 "interpreting a compile-only word"
: ABORT" ?COMP POSTPONE S" POSTPONE (ABORT") ; IMMEDIATE
ABORT
... -- ; R: ... -- throw -1 "ABORT"; clears the parameter stack unless caught with CATCH
: ABORT -1 THROW ;
ERROR
n -- display exception n at the offending location in the input; n = -1 ABORT and n = -2 ABORT" clear the stack; n = -56 QUIT stays silent; List of Forth850 errors:
code | error |
---|---|
-1 | ABORT |
-2 | ABORT" |
-3 | stack overflow |
-4 | stack underflow |
-8 | dictionary overflow |
-10 | division by zero |
-11 | result out of range |
-13 | undefined word |
-14 | interpreting a compile-only word |
-15 | invalid FORGET |
-16 | attempt to use zero-length string as a name |
-18 | parsed string overflow |
-19 | definition name too long |
-22 | control structure mismatch |
-24 | invalid numeric argument |
-28 | user interrupt (BREAK was pressed) |
-32 | invalid name argument (invalid TO name) |
-42 | floating-point divide by zero |
-43 | floating-point result out of range |
-46 | floating-point invalid argument |
-56 | QUIT |
-256 | execution of an uninitialized deferred word |
NUMBER
c-addr u -- n|u|d|ud convert string to number; value DBL is set to -1 when the number is a double; may throw -13 "undefined word" when string is not numeric
INTERPRET
-- interpret input while input is available
EVALUATE
... c-addr u -- ... evaluate string
REPL
-- read-evaluate-print loop
: REPL
rp0 @ RP!
HANDLER OFF
0 TO SOURCE-ID
CR
[
BEGIN
BEGIN ['] REFILL CATCH ?DUP WHILE
ERROR CR
REPEAT
WHILE
SPACE
['] INTERPRET CATCH ?DUP IF
ERROR
REPL
THEN
STATE @ INVERT IF
." OK["
DEPTH 0 U.R
'] EMIT
THEN
CR
REPEAT
BYE ;
BYE
-- return to BASIC
CONTEXT
-- addr leaves address of link of the last vocabulary context definition
' FORTH VALUE CONTEXT
CURRENT
-- addr leaves address of link of the last current vocabulary definition
' FORTH VALUE CURRENT
DEFINITIONS
-- make CURRENT the CONTEXT vocabulary
: DEFINITIONS CONTEXT TO CURRENT ;
VOCABULARY
"
: VOCABULARY CREATE , fig_kludge , DOES> TO CONTEXT ;
FORTH
-- make FORTH the CONTEXT vocabulary
VOCABULARY FORTH
Additional words included with the full version
FALSE
-- 0 leave 0
0 CONSTANT FALSE
TRUE
-- -1 leave -1
-1 CONSTANT TRUE
2ROT
xd1 xd2 xd3 -- xd2 xd3 xd1 rotate double cells
: 2ROT 5 ROLL 5 ROLL ;
ROLL
xu x(u+1) ... x1 x0 u -- x(u+1) ... x1 x0 xu roll u cells on the parameter stack; 0 ROLL does nothing; 1 ROLL is the same as SWAP; 2 ROLL is the same as ROT
D*
d1|ud1 d2|ud2 -- d3|ud3 signed and unsigned double product d1*d2
: D* >R ROT DUP>R -ROT MD* 2R> * 0 SWAP D+ ;
UD/MOD
ud1 ud2 -- ud3 ud4 unsigned double remainder and quotient ud1/ud2; the result is undefined when ud2 = 0
D/MOD
d1 d2 -- d3 d4 double symmetric remainder and quotient d1/d2; the result is undefined when d2 = 0
: D/MOD
DUP 3 PICK DUP>R XOR >R
DABS 2SWAP DABS 2SWAP
UD/MOD
R> 0< IF DNEGATE THEN
R> 0< IF 2SWAP DNEGATE 2SWAP THEN ;
DMOD
d1 d2 -- d3 double symmetric remainder of d1/d2; the result is undefined when d2 = 0
: DMOD D/MOD 2DROP ;
D/
d1 d2 -- d3 double quotient d1/d2; the result is undefined when d2 = 0
: D/ D/MOD 2SWAP 2DROP ;
D=
d1 d2 -- flag true if d1 = d2
: D= D- D0= ;
D<
d1 d2 -- flag true if d1 < d2
: D<
DUP 3 PICK XOR 0< IF
2DROP D0<
EXIT
THEN
D- D0< ;
DU<
du1 du2 -- flag true if ud1 < ud2
: DU<
DUP 3 PICK XOR 0< IF
2SWAP 2DROP D0<
EXIT
THEN
D- D0< ;
DMAX
d1 d2 -- d3 signed double max of d1 and d2
: DMAX
2OVER 2OVER D< IF 2SWAP THEN
2DROP ;
DMIN
d1 d2 -- d3 signed double min of d1 and d2
: DMIN
2OVER 2OVER D< INVERT IF 2SWAP THEN
2DROP ;
CELL+
addr -- addr increment to next cell
: CELL+ 2+ ;
CELLS
n1 -- n2 convert to cell unit
: CELLS 2* ;
CHAR+
n1 -- n1 increment to next char
: CHAR+ 1+ ;
CHARS
n1 -- n2 convert to char unit
: CHARS ;
DUMP
c-addr u -- dump memory in hex
: DUMP
BASE @ >R
HEX
BEGIN DUP WHILE
NEXT-CHAR .
REPEAT
2DROP
R> BASE ! ;
HOLDS
c-addr u -- hold string for pictured numeric output
: HOLDS
BEGIN DUP WHILE
1- 2DUP + C@ HOLD
REPEAT
2DROP ;
BEEP
-- sound the speaker for a short ~2KHz beep
KEY-CLEAR
-- wait until no keys are pressed
: KEY-CLEAR BEGIN INKEY 0= UNTIL ;
KEY?
-- flag true if a key is pressed
: KEY? INKEY 0= 0= ;
WORD
char "
: WORD TMP DUP ROT PARSE-WORD ROT 2DUP C! 1+ SWAP CMOVE ;
CHAR
"
: CHAR PARSE-NAME DROP C@ ;
FIND
c-addr -- c-addr 0 | xt 1 | xt -1 search dictionary for counted string; see WORD, COUNT and FIND-WORD
BUFFER:
n "
: BUFFER: CREATE ALLOT ;
:NONAME
-- xt colon definition without name; leaves execution token of definition to be used or saved
C"
"ccc" -- ; -- c-addr
leave counted string "ccc" (compiled);
may throw -18 "parsed string overflow"
: C" ?COMP POSTPONE S" POSTPONE DROP POSTPONE 1- ;
MARKER?
xt -- flag true if xt is a MARKER word
MARKER
"
: MARKER
CURRENT
DUP @
HERE
CREATE
, 2,
DOES>
DUP CELL+ 2@
SWAP TO CONTEXT
DUP CONTEXT !
DEFINITIONS
L>NAME NAME> TO LASTXT
@ HERE - ALLOT ;
ANEW
"
: ANEW
>IN @ >R
PARSE-NAME FIND-WORD
OVER MARKER?
AND IF
EXECUTE
ELSE
DROP
R> >IN !
MARKER ;
[CHAR]
"
: [CHAR] ?COMP CHAR LITERAL ; IMMEDIATE
[COMPILE]
"
: [COMPILE] ?COMP ' COMPILE, ; IMMEDIATE
K
-- n counter of outer (third) do loop
TEXT
-- read and evaluate TEXT editor area with Forth source code; caveat: .( and ( in TEXT cannot span more than one line, they end at EOL
: TEXT
$7973 @ 1+ >R
BEGIN
R> \ -- addr
DUP C@ $FF <> WHILE
2+ DUP C@ SWAP 1+ \ -- len addr
2DUP + >R
SWAP 1- EVALUATE
REPEAT
DROP ;
Floating point math words included with the full version
Floating point values are doubles on the stack. Double words, such as 2DUP, can be used to manipulate floats. Floats can be stored in 2CONSTANT, 2VARIABLE and 2VALUE assigned with TO (but not with +TO.)
Beware that HEX prevents inputting floats and garbles the output of floats.
F+
r1 r2 -- r3 sum r1+r2; may throw -43 "floating-point result out of range"
F-
r1 r2 -- r3 difference r1-r2; may throw -43 "floating-point result out of range"
F*
r1 r2 -- r3 product r1*r2; may throw -43 "floating-point result out of range"
F/
r1 r2 -- r3 quotient r1/r2 may throw -42 "floating-point divide by zero"; may throw -43 "floating-point result out of range"
FTRUNC
r1 -- r2 truncate float towards zero
FLOOR
r1 -- r2 floor float towards negative infinity may throw -43 "floating-point result out of range"
FROUND
r1 -- r2 round float to nearest; may throw -43 "floating-point result out of range"
FNEGATE
r1 -- r2 negate float
FABS
r1 -- r2 absolute value |r1|
: FABS 2DUP F0< IF FNEGATE THEN ;
F=
r1 r2 -- flag true if r1 = r2
: F= D= ; ( works for IEEE 754 floating point without negative zero and inf/nan )
F<
r1 r2 -- flag true if r1 < r2
: F<
DUP 3 PICK AND 0< IF
2SWAP
D< ; ( works for IEEE 754 floating point without negative zero and inf/nan )
F0=
r -- flag true if r = 0.0e0
: F0= D0= ; ( works for IEEE 754 floating point without negative zero and inf/nan )
F0<
r -- flag true if r < 0.0e0
: F0< D0< ; ( works for IEEE 754 floating point without negative zero and inf/nan )
FMAX
r1 r2 -- r3
max of r1 and r2
: FMAX
2OVER 2OVER F< IF 2SWAP THEN
2DROP ;
FMIN
r1 r2 -- r3
min of r1 and r2
: FMIN
2OVER 2OVER F< INVERT IF 2SWAP THEN
2DROP ;
D>F
d -- r widen signed double to float
S>F
n -- r widen signed single to float
F>D
r -- d narrow float to a signed double; may throw -11 "result out of range"
F>S
r -- n narrow float to a signed single; may throw -11 "result out of range"
>FLOAT
c-addr u -- r true | false convert string to float; leaves the float and true if string is converted; leaves false if string is unconvertable
REPRESENT
r c-addr u -- n flag true convert float to string; store decimal digits of the float in the non-empty buffer c-addr u; leaves decimal exponent n+1 and flag = true if negative
PRECISION
-- +n floating point output precision, the number of decimal digits displayed is 7 by default
7 VALUE PRECISION
F.
r -- output float with a trailing space; output fixed notation when 1e-1 <= |r| < 1e+7, otherwise output scientific notation
: F.
HERE PRECISION REPRESENT DROP IF
'- EMIT
THEN
DUP 0 PRECISION 1+ WITHIN IF
HERE OVER TYPE
'. EMIT
HERE OVER +
PRECISION ROT - '0 -TRIM TYPE SPACE
EXIT
THEN
HERE C@ EMIT
'. HERE C!
HERE PRECISION '0 -TRIM TYPE
'E EMIT
1- . ;
Dictionary structure
The Forth850 dictionary is organized as follows:
low address
_________
+--->| $0000 | last link is zero (2 bytes)
^ |---------|
| | 3 | length of "(:)" (1 byte)
| |---------|
| | (:) | "(:)" word characters (3 bytes)
| |---------|
| | code | machine code
| |=========|
+<==>+ link | link to previous entry (2 bytes)
^ |---------|
: : :
: : :
: : :
| |=========|
+<==>| link | link to previous entry (2 bytes)
^ |---------|
| | $80+5 | length of "aword" (1 byte) with IMMEDIATE bit set
| |---------|
| | aword | "aword" word characters (5 bytes)
| |---------|
| | code | Forth code and/or data
| |=========|
+<---| link |<--- last link to previous entry (2 bytes)
|---------|
| 7 | length of "my-word" (1 byte)
|---------|
| my-word | "my-word" word characters (7 bytes)
|---------|
| code |<--- LASTXT points to code (last xt)
|=========|<--- HERE pointer
| hold | hold area for numerical output (40 bytes)
|---------|
| |
| free | unused dictionary space
| space |
| |
|=========|<--- dictionary limit
| |
| data | stack of 256 bytes (128 cells)
| stack | grows toward lower addresses
| |<--- SP stack pointer
|=========|
| |
| return | return stack of 256 bytes (128 cells/calls)
| stack | grows toward lower addresses
| |<--- RP return stack pointer
|_________|<--- USER address
<--- USER+1 address
high address set with USER in Monitor MON
A link field points to the previous link field. The last link field at the lowest address of the dictionary is zero.
LASTXT
returns the execution token of the last definition, which is the
location where the machine code of the last word starts.
Forth850 is a Direct Threaded Code Forth implementation. Code is either
machine code or starts with a jump or call machine code instruction of 3 bytes,
followed by Forth code (a sequence of execution tokens in a colon definition)
or data (constants, variables, values and other words created with CREATE
.)
Immediate words are marked with the length byte high bit 7 set ($80). Hidden
words have the "smudge" bit 6 ($40) set. A word is hidden until successfully
compiled. HIDE
hides the last defined word by setting the smudge bit.
REVEAL
reveals it. Incomplete colon definitions with compilation errors
should never be revealed.
Implementation
The following sections explain parts of the technical implementation of Forth850. I will explain the new Forth system routines, the new Z80 math routines and the string routines.
Forth850 is built with the asz80 assembler and aslink linker.
Z80 Forth system routines
Forth850 uses direct threaded code (DTC). Faster would be to use subroutine threaded code (STC), but this would significantly increase the overall code size and Forth compilation complexity, which are less desirable for a small Z80-based system.
The following Z80 Forth routines are inspired by the article Moving Forth. However, I've decided to use a different Z80 register mapping that is more efficient:
- BC: instruction pointer (IP)
- DE: top of stack (TOS)
- IY: address of the "next routine", for
jp (iy)
By contrast to the article, having the TOS in DE makes it quicker to perform
address arithmetic with the TOS, because we can exchange DE with HL with ex de,hl
in just 4 CPU cycles. Moving BC to HL takes 8 CPU cycles.
I've placed the return stack pointer (RP) in RAM. There is no advantage to use
register IX for RP as the article suggests. In fact, the colon call and return
have the same cycle counts, but almost all of the return stack operations, such
as >R
, require more cycles with the RP in IX compared to the RP in RAM.
A jump to the "next routine" is with jp (iy)
takes 8 CPU cycles, compared to
a jp next
that takes 10 cycles. Inlining the "next routine" eliminates
this overhead, but increases the code size. Inlining should only be applied to
performance-critical words that are frequently used. See macros NEXT
and
JP_NEXT
defined in the section below.
Next fetch and execute
Fetching an execution token (xt) from the instruction pointer (IP) address, incrementing IP and executing the token takes 38 CPU cycles in the "next routine":
.macro NEXT
ld a,(bc) ; 7 ;
ld l,a ; 4 ;
inc bc ; 6 ;
ld a,(bc) ; 7 ;
ld h,a ; 4 ;
inc bc ; 6 ; [ip++] -> hl with xt
jp (hl) ; 4(38); jump to hl
.endif
The "next routine" cycles contribute to the overhead of DTC, which cannot be
removed to speed up DTC execution. To improve speed by 10% on average, the
fetch and execute routine is inlined with the NEXT
macro for
performance-critical words. When performance is not critical, a JP_NEXT
macro is used, which simply expands into jp (iy)
with the IY register
pointing to the "next routine":
.macro JP_NEXT
jp (iy) ; 8(46); jump to next routine
.endm
Colon call and return
Each colon definition in memory starts with a call docol
. The docol
routine associated with the (:)
word saves the instruction pointer in BC on
the return stack and pops the new instruction pointer from the parameter stack
(since call docol
leaves the address after the call on the stack.) The
routine checks for ON/BREAK key and begins executing the colon definition with
the "next routine":
docol: ld hl,(rp) ; 16 ; [rp] -> hl
dec hl ; 6 ;
ld (hl),b ; 7 ;
dec hl ; 6 ;
ld (hl),c ; 7 ; save bc -> [--rp] with caller ip on the return stack
ld (rp),hl ; 16 ; ip - 2 -> rp
pop bc ; 10(68); pop ip saved by call docol
; continue with ON/BREAK key check
cont: in a,(0x1f) ; 11 ; port 0x1f bit 7 is set if ON/BREAK is depressed
add a ; 4 ; test ON/BREAK key
jr c,break ; 7(22); if ON/BREAK pressed then break
; next
next: ld a,(bc) ; 7 ;
ld l,a ; 4 ;
inc bc ; 6 ;
ld a,(bc) ; 7 ;
ld h,a ; 4 ;
inc bc ; 6 ; [bc++] -> hl with xt
jp (hl) ; 4(38); jump to hl
A return from a colon definition with the (;)
word pops the return
instruction pointer off the return stack to continue executing the caller's
next instruction.
doret: ld hl,(rp) ; 16 ; [rp] -> hl
ld c,(hl) ; 7 ;
inc hl ; 6 ;
ld b,(hl) ; 7 ;
inc hl ; 6 ;
ld (rp),hl ; 16(58); restore [rp++] -> bc with ip of the caller
NEXT ; 38 ; continue
A colon call takes 145 cycles (17 + 68 + 22 + 38 cycles) and a colon return takes 96 cycles (58 + 38 cycles.) This includes the 38 cycle overhead of the "next routine" to fetch and execute the next token.
Variables
A Forth variable leaves its address on the parameter stack. A call dovar
is
used to push the address on the stack, which is then retrieved to set the new
TOS:
dovar: pop hl ; 10 ; pop hl with pfa addr saved by call dovar
push de ; 11 ; save TOS
ex de,hl ; 4(25); set new TOS to hl with pfa addr
NEXT ; 38 ; continue
Executing a word defined as a variable takes 80 cycles (17 + 25 + 38 cycles), which includes the "next routine" overhead.
Constants and values
A Forth constant or value leaves its value on the parameter stack. A call doval
is used to push the address of the constant/value on the stack. The
constant/value is then fetched:
doval: pop hl ; 10 ; pop hl with pfa addr saved by call doval
push de ; 11 ; save TOS
ld e,(hl) ; 7 ;
inc hl ; 6 ;
ld d,(hl) ; 7(41); set [hl] -> de as new TOS
NEXT ; 38 ; continue
Executing a word defined as a constant or value takes 96 cycles (17 + 41 + 38 cycles), which includes the "next routine" overhead.
Fetch and store
The @
fetch and !
store words make good use of ex de,hl
:
fetch: ex de,hl ; 4 ; addr -> hl
ld e,(hl) ; 7 ;
inc hl ; 6 ;
ld d,(hl) ; 7(24); set [hl] -> de as new TOS
NEXT ; 38 ; continue
store: pop hl ; 10 ; pop addr -> hl
ex de,hl ; 4 ; x -> de, addr -> hl
ld (hl),e ; 7 ;
inc hl ; 6 ;
ld (hl),d ; 7 ; de -> [hl] with x
pop de ; 10(44); pop new TOS
NEXT ; 38 ; continue
CREATE with DOES>
A Forth definer word that uses CREATE with DOES> to define new words is
compiled to execute the (;CODE)
token with label doscode
, followed by a
call dodoes
to start interpreting the DOES> code:
doscode: ld hl,(lastxt+3) ; LASTXT -> hl with last defined word xt
inc hl ;
ld (hl),c ;
inc hl ;
ld (hl),b ; ip -> [LASTXT+1] overwrite call address
jr doret ; (;) return to caller
dodoes: ld hl,(rp) ; 16 ; [rp] -> hl
dec hl ; 6 ;
ld (hl),b ; 7 ;
dec hl ; 6 ;
ld (hl),c ; 7 ;
ld (rp),hl ; 16 ; save bc -> [--rp] with old ip on the return stack
pop bc ; 10 ; pop bc with new ip of the DOES> routine saved by call dodoes
pop hl ; 10 ; pop pfa addr
push de ; 11 ; save TOS
ex de,hl ; 4(93); set new TOS to hl with pfa addr
NEXT ; 38 ; continue
A word defined by a CREATE/DOES> definer makes a call
to the call dodoes
routine. For example, suppose we define CONSTANT
as follows:
: CONSTANT CREATE , DOES> @ ;
123 CONSTANT X
then CONSTANT
and X
are compiled as:
CONSTANT: call docol
.dw create
.dw comma
.dw doscode
CONSTANT_does: call dodoes
.dw fetch
.dw doret
X: call CONSTANT_does
.dw 123
Executing X
takes 192 cycles (17 + 17 + 24 + 38 + 96 cycles.) When
more optimally defined as a CONSTANT
in code, this takes 96 cycles.
Parsing
Forth words are parsed with my new CHOP
and TRIM
words that efficiently
parse and extract white-space-delimited words from the input.
Entry:
- DE with TOS: a char to truncate the string with
- 2OS: string length u1
- 3OS: string address c-addr
Exit:
- DE with TOS: truncated string length u2
- 2OS: string address c-addr
Performance: 21 cycles per character for non-BL char to chop, 40 cycles per character for BL to chop white space
chop: ld a,e ; char -> a
exx ; save bc with ip
ex af,af' ; save a with char
pop bc ; pop u1 -> bc
ld e,c ;
ld d,b ; u1 -> de
ld a,c ;
or b ; test bc = 0, 0 -> cf
jr z,2$ ; if bc = 0 then not found
pop hl ;
push hl ; c-addr -> hl
ex af,af' ; restore a with char
cp 0x20 ;
jr z,3$ ; if a = 0x20 then find white space
or a ; 0 -> cf not found
; find char in string
cpir ; 21/16 ; repeat until a = [hl++] or --bc = 0
jr nz,2$ ; if match then
1$: ccf ; complement to correct cpi bc--
2$: ex de,hl ; u1 -> hl
sbc hl,bc ; u1 - bc - cf -> hl
push hl ; save hl as TOS
exx ; restore bc with ip
pop de ; pop new TOS
JP_NEXT ; continue
; find white space in string
3$: cp (hl) ; 7 ; loop to compare a to [hl]
cpi ; 16 ; hl++, bc--
jr nc,1$ ; 7 ; if [hl]<a then found
jp pe,3$ ; 10 ; until bc = 0
jr 1$ ; not found
Entry:
- DE with TOS: char to trim the string by removing them from its beginning
- 2OS: string length u1
- 3OS: string address c-addr1
Exit:
- DE with TOS: updated string length u2
- 2OS: updated string address c-addr2
Performance: 33 cycles to trim non-BL char, 106 cycles to trim white space with BL char
trim: ld a,e ; char -> a
exx ; save bc with ip
pop bc ; u1 -> bc
pop hl ; c-addr1 -> hl
1$: ex af,af' ; 4 ; save a
ld a,c ; 4 ;
or b ; 4 ;
jr z,3$ ; 7 ; if bc <> 0 then
ex af,af' ; 4 ; restore a
2$: cpi ; 16 ; loop
jr nz,4$ ; 7/12 ; while a = [hl++], --bc
jp pe,2$ ; 10 ; until b = 0
3$: push hl ; save hl as 2OS
push bc ; save bc as TOS
exx ; restore bc with ip
pop de ; pop new TOS
JP_NEXT ; continue
4$: cp 0x20 ; 7 ;
jr nz,5$ ; 7 ; if char = 0x20 then
dec hl ; 6 ;
cp (hl) ; 7 ;
inc hl ; 6 ;
jr nc,1$ ; 12 ; if [hl-1] <= 0x20 then keep trimming
5$: inc bc ; correct bc++ for cpi match
dec hl ; correct hl-- for cpi match
jr 3$ ; finalize trimming
To parse a white-space-delimited word is efficiently performed with BL PARSE
where the PARSE
word is defined as:
: PARSE ( char "ccc<char>" -- c-addr u )
SOURCE
>IN @ /STRING
ROT CHOP
DUP 1+ >IN @ +
SOURCE NIP UMIN >IN ! ;
To skip input until the next non-white-space character is efficiently performed
with BL SKIPS
, where SKIPS
is defined as:
: SKIPS ( char "<chars>" -- )
SOURCE >IN @ /STRING
ROT TRIM
DROP
SOURCE DROP - >IN ! ;
Dictionary search with case insensitive string matching
The FIND-WORD
word searches the dictionary starting with CONTEXT
for a
matching word. The search is case insensitive. Smudged words are skipped.
Entry:
- DE with TOS: size of the string to search u
- 2OS: address of the string to search c-addr
Exit:
- DE with TOS: 0 = not found, 1 = found immediate, -1 = found (not immediate)
- 2OS: string address if not found or execution token when found
Performance: 95 cycles per dictionary entry, 51 or 102 cycles per character comparison when characters match
findword: ld a,d ;
or a ; test d = 0 high order byte of u
jp nz,zero_next ; if u is too large then set new TOS to 0
sla e ; shift u to compare w/o immediate bit
jp c,zero_next ; if u is too large then set new TOS to 0
jp z,zero_next ; if u = 0 then set new TOS to 0
push de ; save de with 2*u
exx ; save bc with ip
pop bc ; pop 2 * u -> bc
pop de ; pop c-addr -> de
ld hl,(context+3) ; CONTEXT -> hl
jr 3$ ; start searching
; loop over dictionary
1$: pop de ; restore de with c-addr
2$: pop hl ; 10 ; loop, restore hl with lfa
3$: ld a,(hl) ; 7 ;
inc hl ; 6 ;
ld h,(hl) ; 7 ;
ld l,a ; 4 ; [hl] -> hl follow link at hl = lfa
or h ; 4 ;
jr z,6$ ; 7 ; if hl = 0 then not found
push hl ; 11 ; save hl with lfa
inc hl ; 6 ;
inc hl ; 6 ; hl + 2 -> hl with nt (nfa)
ld a,(hl) ; 7 ; word length
add a ; 4 ; shift away immediate bit
cp c ; 4 ; test a = c word length match (both shifted)
jr nz,2$ ; 12(95); if lengths differ then continue searching
; compare string to word
push de ; save de with c-addr
inc hl ; hl++ point to nfa chars
ld b,c ; 2 * u -> b
srl b ; u -> b word length (nonzero)
; loop over word chars
4$: ld a,(de) ; 7 ; loop
cp (hl) ; 7 ; compare [de] = [hl]
jr z,5$ ; 12/7 ; if mismatch then
and 0xdf ; 7 ; make upper case
cp 'A ; 7 ;
jr c,1$ ; 7 ; if a<'A' then continue search
cp 'Z+1 ; 7 ;
jr nc,1$ ; 7 ; if a>'Z' then continue search
xor (hl) ; 7 ;
and 0xdf ; 7 ; case insensitive compare [de] = [hl]
jr nz,1$ ; 7 ; if mismatch then continue search
5$: inc de ; 6 ; de++ point to next char of c-addr
inc hl ; 6 ; hl++ point to next char of word
djnz 4$ ; 13(51/102);until --b = 0
; found a matching word
pop de ; discard saved c-addr
ex (sp),hl ; save hl with xt as 2OS, restore hl with lfa
inc hl ;
inc hl ; hl + 2 -> hl with nt (nfa)
bit immediate_bit,(hl) ; test immediate bit of [hl] word length
exx ; restore bc with ip
jp nz,one_next ; set new TOS to 1 if word is immediate
jp mone_next ; set new TOS to -1
; not found
6$: push de ; save de with c-addr as 2OS
exx ; restore bc with ip
jp zero_next ; set new TOS to 0
JP_NEXT ; continue
Z80 integer math routines
I've written the following Z80 math routines. My objective was to make them as efficient as possible. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags.
Fast signed/unsigned 16x16->16 bit multiplication
Entry:
- BC: signed or unsigned multiplier n1
- DE: signed or unsigned multiplicand n2
Exit:
- HL: signed product or unsigned product n3
Perfomance: max 51 cycles x 16 iterations = 816 cycles or max 51 cycles x 8 iterations + 45 x 8 = 768 cycles, excluding entry/exit overhead
mult1616: ld hl,0 ; 0 -> hl
ld a,c ; c -> a low order byte of n1
ld c,b ; b -> c save high order byte of n1
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
2$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 1$ ; 13(51); until --b = 0
ld a,c ; c -> a high order byte of n1
ld b,8 ; 8 -> b loop counter
3$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,4$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
4$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 3$ ; 13(51); until --b = 0
ret ; done
We can make an additional speed improvement, which only costs us one more instruction byte. To calculate the high order byte we do not need to iterate over all 8 bits of the high order multiplier stored in register c, but only over the nonzero bits. We also can ignore the lower order result stored in register e. This reduces the max loop iteration cycle time to 32 and 33 per bit. Furthermore, the second loop only runs until the last bit of register c is shifted out. If register c is zero, the second loop does not execute thereby saving hundreds of cycles. We also use jp instead of jr to improve and balance the cycle time per bit:
mult1616: ld hl,0 ; 0 -> hl
ld a,c ; c -> a low order byte of n1
ld c,b ; b -> c save high order byte of n1
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl + de -> hl
2$: sla e ; 8 ;
rl d ; 8 ; de << 1 -> de
djnz 1$ ; 13(51); until --b = 0
ld a,h ; h -> a do high order, low order is done
jr 5$ ; jump to shift c and loop
3$: add d ; 4 ; loop, a + d -> d
4$: sla d ; 8 ; d << 1 -> d
5$: srl c ; 8 ; c >> 1 -> c set cf and z if no bits left
jr c,3$ ; 12/7(32); until cf = 0 repeat with addition
jp nz,4$ ; 10(33); until c = 0 repeat without addition
ret ; done
Note: unrolling the loops would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Fast unsigned 16x16->32 bit multiplication
Entry:
- DE: unsigned multiplicand u1
- BC: unsigned multiplier u2
Exit:
- DE: low order unsigned product u3
- HL: high order unsigned product u3
Perfomance: max 64 cycles x 17 iterations = 1088 cycles, excluding entry/exit overhead
umult1632: xor a ; 0 -> cf
ld l,a ;
ld h,a ; 0 -> hl
ld a,17 ; 17 -> a loop counter
1$: rr h ; 8 ; loop
rr l ; 8 ;
rr d ; 8 ;
rr e ; 8 ; hlde + cf >> 1 -> hlde
jr nc,2$ ; 7 ; if cf = 1 then
add hl,bc ; 11 ; hl + bc -> hl
2$: dec a ; 4 ;
jp nz,1$ ; 10(64); until --a = 0
ret ; done
Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Fast signed/unsigned 32x32->32 bit multiplication
Entry:
- BC': low order signed or unsigned multiplicand d1
- DE: high order signed or unsigned multiplicand d1
- DE': low order signed or unsigned multiplier d2
- HL': high order signed or unsigned multiplier d2
Exit:
- HL': low order signed product or unsigned product d3
- HL: high order signed product or unsigned product d3
Perfomance: max 98 cycles x 32 iterations = 3136 cycles, excluding entry/exit overhead
mult3232: ld hl,0 ; 0 -> hl high order d3, de with d2 high order
exx ; save bc with ip
ld a,h ;
push af ; save d1 high order byte 3
ld a,l ;
push af ; save d1 high order byte 2
ld a,b ;
push af ; save d1 low order byte 1
ld a,c ;
push af ; save d1 low order byte 0
ld hl,0 ; 0 -> hl' low order d3
ld c,4 ; 4 -> c outer loop counter
1$: pop af ; loop, [sp++] -> a next d1 byte
ld b,8 ; 8 -> b inner loop counter
2$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,3$ ; 7 ; if cf = 1 then
add hl,de ; 11 ; hl' + de' -> hl add low order
exx ; 4 ;
adc hl,de ; 15 ; hl + de + cf -> hl add high order
exx ; 4 ;
3$: sla e ; 8 ;
rl d ; 8 ; de' << 1 -> de' shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de << 1 + cf -> de shift high order
exx ; 4 ;
djnz 2$ ; 13(98); until --b = 0
dec c ;
jr nz,1$ ; until --c = 0
ret ; done
The same tricks as the 16x16->16 multiplication method can be used to reduce the cycle time, but at a cost of increased code size. We also assign different registers to the low and high order parts:
Entry:
- BC': low order signed or unsigned multiplicand d1
- DE: high order signed or unsigned multiplicand d1
- HL': low order signed or unsigned multiplier d2
- DE': high order signed or unsigned multiplier d2
Exit:
- HL: low order signed product or unsigned product d3
- HL': high order signed product or unsigned product d3
Perfomance: max 8 x (98+87+59+33) = 2216 cycles, excluding entry/exit overhead
mult3232: ld hl,0 ; 0 -> hl low order d3, de with d2 low order
exx ; save bc with ip
ld a,h ;
push af ; save d1 high order byte 3
ld a,l ;
push af ; save d1 high order byte 2
ld a,b ;
push af ; save d1 low order byte 1
ld a,c ; d1 -> a low order byte 0
ld hl,0 ; 0 -> hl' high order d3
ld b,8 ; 8 -> b loop counter
1$: rra ; 4 ; loop, a >> 1 -> a set cf
jr nc,2$ ; 7 ; if cf = 1 then
exx ; 4 ;
add hl,de ; 11 ; hl + de -> hl add low order
exx ; 4 ;
adc hl,de ; 15 ; hl' + de' + cf -> hl add high order
2$: exx ; 4 ;
sla e ; 8 ;
rl d ; 8 ; de << 1 -> de shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 1$ ; 13(98); until --b = 0
pop af ;
ld c,a ; d1 -> c low order byte1
exx ;
ld a,h ; h -> a low order d3
exx ;
ld b,8 ; 8 -> b loop counter
3$: rr c ; 8 ; loop, c >> 1 -> c set cf
jr nc,4$ ; 7 ; if cf = 1 then
exx ; 4 ;
add d ; 4 ; a + d -> a add low order
exx ; 4 ;
adc hl,de ; 15 ; hl' + de' + cf -> hl add high order
4$: exx ; 4 ;
sla d ; 8 ; d << 1 -> d shift low order
exx ; 4 ;
rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 3$ ; 13(87); until --b = 0
exx ;
ld h,a ; a -> h low order d3
exx ;
pop af ; d1 -> a high order byte 2
ld b,8 ; 8 -> b loop counter
5$: rra ; 8 ; loop, c >> 1 -> c set cf
jr nc,6$ ; 7 ; if cf = 1 then
add hl,de ; 15 ; hl' + de' + cf -> hl add high order
6$: rl e ; 8 ;
rl d ; 8 ; de' << 1 + cf -> de' shift high order
djnz 5$ ; 13(59); until --b = 0
pop af ;
ld c,a ; d1 -> c high order byte 3
ld a,h ; h -> a high order
jr 9$ ; jump to shift c and loop
7$: add d ; 4 ; loop, a + d -> a
8$: sla d ; 8 ; d << 1 -> d
9$: srl c ; 8 ; c >> 1 -> c set cf and z if no bits left
jr c,7$ ; 12/7(32); until cf = 0 repeat with addition
jp nz,8$ ; 10(33); until c = 0 repeat without addition
ld h,a ; a -> h high order
exx ;
ret ; done
Note: unrolling the inner loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Fast unsigned 32/16->16 bit division and remainder
This implementation is used by all division and remainder (modulo) Forth words
by calling UM/MOD
. As such, it is an important and versatile algorithm.
Entry:
- HL: high order dividend ud
- BC: low order dividend ud
- DE: divisor u1
Exit:
- HL: remainder u2
- BC: quotient u3
Performance: max 85 cycles x 16 iterations = 1360 cycles, excluding entry/exit overhead
udiv3216: xor a ;
sub e ;
ld e,a ;
sbc a ;
sub d ;
ld d,a ; -de -> de with -u1
ld a,b ; b -> a low order dividend in ac
ld b,16 ; 16 -> b loop counter
sla c ;
rla ; ac << 1 -> ac
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
add hl,de ; 11 ; hl + -u1 -> hl
scf ; 4 ; 1 -> cf
jr 3$ ; 12 ; else
2$: add hl,de ; 11 ; hl + -u1 -> hl
jr c,3$ ; 12/ 7 ; if cf = 0 then
sbc hl,de ; 15 ; hl - -u1 -> hl to undo, no carry
3$: rl c ; 8 ;
rla ; 4 ; ac << 1 + cf -> ac
djnz 1$ ; 13(85); until --b = 0
ld b,a ; a -> b quotient bc, remainder in hl
ret ; done
The algorithm negates the divisor first to speed up subtraction by adding the negative of the divisor instead. Another benefit of using the negated divisor is that this produces the right carry value to shift into the quotient, otherwise the carry should be inverted or the resulting quotient must be inverted.
By moving the first conditional block out of the loop, we can save 5 CPU cycles on the critical path (the most expensive path through the loop) to reduce to max 80 cycles per iteration at the cost of making the code more cluttered.
Entry:
- HL: high order dividend ud
- BC: low order dividend ud
- DE: divisor u1
Exit:
- HL: remainder u2
- BC: quotient u3
Performance: max 80 cycles x 16 iterations = 1280 cycles, excluding entry/exit overhead
udiv3216: xor a ;
sub e ;
ld e,a ;
sbc a ;
sub d ;
ld d,a ; -de -> de with -u1
ld a,b ; b -> a low order dividend in ac
ld b,16 ; 16 -> b loop counter
sla c ;
rla ; ac << 1 -> ac
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr c,3$ ; 7/12 ; if cf = 1 then hl + -u1 -> hl, 1 -> cf else
add hl,de ; 11 ; hl + -u1 -> hl
jr c,2$ ; 12/ 7 ; if cf = 0 then
sbc hl,de ; 15 ; hl - -u1 -> hl to undo, no carry
2$: rl c ; 8 ;
rla ; 4 ; ac << 1 + cf -> ac
djnz 1$ ; 13(80); until --b = 0
ld b,a ; a -> b quotient bc, remainder in hl
ret ; done
3$: add hl,de ; 11 ; hl + -u1 -> hl
scf ; 4 ; 1 -> cf
jr 2$ ; 12 ;
By comparison, the CamelForth Z80 code is also fast, but slower than my implemenation with 90 cycles x 16 iterations = 1440 cycles, excluding entry/exit overhead:
udiv3216: ld a,16 ; 16 -> a loop counter
sla e ;
rl d ; de << 1 -> de
1$: adc hl,hl ; 15 ; loop, hl << 1 + cf -> hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
or a ; 4 ; 0 -> cf
sbc hl,bc ; 15 ; hl - u1 -> hl
or a ; 4 ; 0 -> cf
jp 3$ ; 10 ; else
2$: sbc hl,bc ; 15 ; hl - u1 -> hl
jr nc,3$ ; 12/ 7 ; if cf = 1 then
add hl,bc ; 11 ; hl + u1 -> hl to undo sbc, sets cf
3$: rl e ; 8 ;
rl d ; 8 ; de << 1 + cf -> de with inverse cf we'll need
dec a ; 4 ;
jp nz,1$ ; 10(90); until --a = 0
ld a,e ;
cpl ;
ld e,a ;
ld a,d ;
cpl ;
ld d,a ; complement de, faster than ccf in loop
ret ; done
Note: unrolling the loop would improve the speed at the cost of a significant code size increase, which is undesirable for small memory devices.
Fast unsigned 32/32->32 bit division and remainder
This algorithm uses the shadow registers BC', DE' and HL'. Because of this register pressure, there is little room for further optimization. Registers IX and IY cannot be used since they lack the necessary adc and sbc instructions.
Entry:
- BC: high order dividend ud1
- BC': low order dividend ud1
- DE': high order divisor ud2
- DE: low order divisor ud2
Exit:
- HL': high order remainder ud3
- HL: low order remainder ud3
- BC: high order quotient ud4
- BC': low order quotient ud4
Performance: max 162 cycles x 32 iterations = 5184 cycles, excluding entry/exit overhead
udiv3232: exx ;
xor a ;
ld h,a ;
ld l,a ; 0 -> hl'
rl c ;
rl b ;
exx ;
ld h,a ;
ld l,a ; 0 -> hl
ld a,b ; b -> a
rl c ;
rla ; ac << 1 -> ac
ld b,32 ; 32 -> b loop counter
1$: adc hl,hl ; 15 ;
exx ; 4 ;
adc hl,hl ; 15 ;
exx ; 4 ; hl'.hl << 1 + cf -> hl'.hl no carry
sbc hl,de ; 15 ;
exx ; 4 ;
sbc hl,de ; 15 ; hl'.hl - de'.de -> hl'.hl
jr nc,2$ ; 12/ 7 ; if cf = 1 then
exx ; 4 ;
add hl,de ; 11 ;
exx ; 4 ;
adc hl,de ; 15 ; hl'.hl + de'.de -> hl'.hl to undo, sets carry
2$: ccf ; 4 ; complement cf
rl c ; 8 ;
rl b ; 8 ;
exx ; 4 ;
rl c ; 8 ;
rla ; 4 ; ac.bc' << 1 + cf
djnz 1$ ; 13(162); until --b = 0
ld b,a ;
ld e,c ;
ret ;
Z80 floating point math routines
I've written a collection of Z80 IEEE 754 single precision floating point math routines:
- math.asm (960 bytes of code) a simple version with truncation
- mathr.asm (1085 bytes of code) includes three IEEE 754 rounding modes, where the default rounding mode is to round to nearest, ties to even;
- mathri.asm (1296 bytes of code) includes the three IEEE 754 rounding modes, and inf/nan and signed zero. This version is not intended for Forth850, because Forth850 raises floating point exceptions.
My objective was to make the floating point routines as efficient as possible, such as by using the shadow registers instead of memory. No memory is used, except at most one push-pop pair to move a value between the (shadow) registers. The second objective was to keep the code size small by using tricks with CPU arithmetic and flags. The floating point library is about 1KB.
Single precision floating point values are stored in registers BC (high order)
and DE (low order) to form a 32 bit float bcde
and shadow float bcde'
.
-
fadd
: floatbcde
+bcde'
->bcde
; cf set on overflow -
fsubx
: floatbcde
-bcde'
->bcde
; cf set on overflow -
fsuby
: floatbcde'
-bcde
->bcde
; cf set on overflow -
fneg
: float -bcde
->bcde
; no errors (cf reset) -
fabs
: float |bcde
| ->bcde
; no errors (cf reset) -
fmul
: floatbcde
*bcde'
->bcde
; cf set on overflow -
fdivx
: floatbcde
/bcde'
->bcde
; cf set on overflow or when dividing by zero -
fdivy
: floatbcde'
/bcde
->bcde
; cf set on overflow or when dividing by zero -
ftoi
: floatbcde
-> signed 32 bit integerbcde
truncated towards zero; cf set when out of range -
itof
: signed 32 bit integerbcde
-> floatbcde
; no errors (cf reset) -
ftrunc
: float trunc(bcde
) ->bcde
; no errors (cf reset) -
ffloor
: float floor(bcde
) ->bcde
; cf set on overflow -
fround
: float round(bcde
) ->bcde
; cf set on overflow -
fpow10
: 10^a
*bcde
->bcde
for -128 <=a
< 39; cf set on overflow -
atof
: string [hl
..hl
+a
-1] -> floatbcde
; cf set on parsing error andhl
points after the char -
ftoa
: floatbcde
-> [hl
...hl
+a
-1] string of digits, exponente
and signd
bit 7; no errors (flags undefined) -
fzero
: setbcde
to 0.0
mathri.asm includes inf/nan and signed zero. In this version the routines listed above may return signed inf or nan with cf set to indicate overflow and errors. In addition, this version includes the following routines:
-
ftype
: floatbcde
->bcde
unchanged; cf set ifbcde
is nan, cf reset and z set ifbcde
is +/-inf -
fnan
: setbcde
to nan; cf set -
finf
: setbcde
to inf with sign in register A bit 7 (negative when set); cf set
Z80 string routines
I've written the following Z80 string routines. My objective was to make them
as efficient as possible, such as by making the obvious choice to use the cpi
and cpir
Z80 instructions to minimize cycle count. The second objective was
to keep the code size small by using tricks with CPU arithmetic and flags.
Fast string comparison
Entry:
- IX: address of the first string c-addr1
- HL: size of the first string u1
- DE: address of the second string c-addr2
- BC: size of the second string u2
Exit:
- A: -1 (less), 0 (equal), 1 (greater)
- F: zero flag set when equal, sign flag set when less
Performance: 46 cycles per character comparison when characters match
compare: push ix ; save c-addr1
push hl ; save u1
xor a ; 0 -> a flags u1 = u2, 0 -> cf
sbc hl,bc ;
jr z,1$ ; if u1 <> u2 then
inc a ; 1 -> a flags u1 > u2
jr nc,1$ ; if u1 < u2 then
pop bc ; pop u1 -> bc
push bc ; rebalance stack
ld a,-1 ; -1 -> a flags u1 < u2
1$: pop hl ; pop to discard u1
pop hl ; pop c-addr1 -> hl
ex af,af' ; save a with -1|0|1 flag
ld a,c ;
or b ;
jr z,3$ ; if bc <> 0 then
; compare chars
2$: ld a,(de) ; 7 ; loop
cpi ; 16 ; compare [hl++] to [de], --bc
jr nz,4$ ; 7 ; while characters [hl] and [de] are equal
inc de ; 6 ; de++
jp pe,2$ ; 10(46); until bc = 0
; chars match, check lengths
3$: ex af,af' ; restore a with -1|0|1 flag
ret ;
; strings differ
4$: dec hl ; hl-- to correct cpi overshoot
cp (hl) ; test a<[hl]
ccf ; complement cf, cf = 1 if [hl]<a
sbc a ; a = -1 if cf = 1 else 0
add a ; a = -2 if cf = 1 else 0
inc a ; a = -1 if cf = 1 else 1
ret ; done
Fast string search
Naive string search, i.e. not Knuth-Morris-Pratt which is faster but would require a table and more code.
Entry:
- HL: address of the string searched c-addr1
- IX: size of the string searched u1
- DE: address of the string to search c-addr2
- BC: size of the string to search u2
Exit:
- F: carry set when no match found
- HL: address of the string position found c-addr3
- BC: size of the remaining characters after the match
Performance: 21 cycles per character to search the first or next character match and 46 cycles per character comparison when characters match
search: or a ; 0 -> cf
sbc ix,bc ; u1 - u2 -> ix
ret c ; if u2>u1 then impossible search, cf = 1
ld a,c ;
or b ;
ret z ; if u2 = 0 then done (found), cf = 0
push ix ;
push bc ;
pop ix ; u2 -> ix
pop bc ; u1 - u2 -> bc
inc bc ; u1 - u2 + 1 -> bc correct for cpir
push hl ; save c-addr1 on the stack
; find char match
1$: push de ; loop, save de with c-addr2
ld a,(de) ; [de] -> a
cpir ; 21/16 ; repeat until a = [hl++] or --bc = 0
jr nz,4$ ; if no match then not found
pop de ; restore de with c-addr2
push bc ;
push de ;
push hl ; save bc,de,hl
push ix ;
pop bc ; u2 -> bc
; compare substrings
dec bc ; u2 - 1 -> bc since u2 > 0
ld a,c ;
or b ;
jr z,3$ ; if bc<> 0 then
inc de ; de++ to start matching at c-addr2+1
2$: ld a,(de) ; 7 ; loop
cpi ; 16 ; compare [hl++] to [de], --bc
jr nz,3$ ; 7 ; while characters [hl] and [de] are equal
inc de ; 6 ; de++
jp pe,2$ ; 10(46); until bc = 0
3$: pop hl ;
pop de ;
pop bc ; restore bc,de,hl
jr nz,1$ ; repeat
; substrings match
dec hl ; hl-- to correct cpir overshoot
ret ; done, cf = 0
; not found
4$: scf ; 1 -> cf
ret ; done, cf = 1
Sharp PC-G850(V)(S)
- Manual: http://basic.hopto.org/basic/manual/Sharp%20PC-G850V.pdf
- HP Forum thread: https://www.hpmuseum.org/forum/thread-10520.html
- Sharp PC-G850(V)(S) software (Japanese site): http://ver0.sakura.ne.jp/g800/index.html
Forth resources
- Forth for the Sharp PC-E500(S): https://github.com/Robert-van-Engelen/Forth500
- Forth 2012 Standard: https://forth-standard.org/standard/intro
- Moving Forth: https://www.bradrodriguez.com/papers/moving1.htm
Forth850 benefits from the work done by many others to offer inspiration, but the system does not include licensed code of the following implementations or any other implementation not listed here. Some parts of Forth850 are derived from freely available Forth resources listed above and the Z80 resources listed further below:
- CamelForth for the Z80: http://www.camelforth.com/page.php?5
- eForth: https://github.com/lispnik/eforth/blob/master/z80efort/EFZ80.ASM
- Jupiter Ace ROM listing: http://www.jupiter-ace.co.uk/romlisting.html#L085F
Z80 resources
- asz80 assembler and linker: https://shop-pdp.net/ashtml/asz80.htm download https://shop-pdp.net/ashtml/asxget.php
- Zilog Z80 CPU User Manual: https://www.zilog.com/docs/z80/um0080.pdf
- Z80 Instruction Set: https://wikiti.brandonw.net/index.php?title=Z80_Instruction_Set
- Z80 Instruction Set: https://www.smspower.org/Development/InstructionSet
Z80 maths
- Z80 the 8-bit number cruncher: http://www.andreadrian.de/oldcpu/Z80_number_cruncher.html
- Z80 integer math routines: http://map.grauw.nl/sources/external/z80bits.html
- Z80 bits (integer math routines): https://wikiti.brandonw.net/index.php?title=Category:Z80_Routines:Math
- Z80 advanced math: http://z80-heaven.wikidot.com/advanced-math
- Z80 classic maths libraries: https://github.com/z88dk/z88dk/wiki/Classic--Maths-Libraries
- Z80 IEEE754 floating point library: https://github.com/Zeda/z80float