discussion Which processors do your applications target?

I'm curious: on which processors do people run their Forth applications?

Jun 27 '17 07:06 larsbrinkhoff

65c02 and 65816, on a home-made workbench computer I use for controlling experiments and processes on the workbench, taking data, trying new ICs we would put into products, and programming PIC microcontrollers for several products. Forth has served the purpose very nicely, in highly interactive development.

Jun 27 '17 07:06 GarthWilson

68HC11
68HC12 (S12, etc)
68HCS08
68000 (all variants)
69R000
6303
6809
8051 (all variants)
ARM
AVR
ColdFire
H8/H8H
MCore
MSP430
PSC1000
RTX2010
TMS320

Best regards,

Leon Wagner FORTH, Inc.

Jun 27 '17 13:06 leonwagner

x86

so exotic i know

Jun 27 '17 14:06 RogerLevy

A homebrew 16-bit stack machine made out of TTL chips, running at 0.1 MHz or thereabouts. Just fast enough to drive an I/O pin at 180 baud and blink an LED.

I cheated on the RAM, though. I didn't feel like wiring up hundreds of 74xxx chips so I used an off-the-shelf chip. :-)

I've been thinking about building (and writing a Forth for) a transport-triggered architecture, although I have no idea yet what it would look like. Anyone have experience with those?

Jun 27 '17 18:06 bnoordhuis

On Jun 27, 2017 2:58 PM, "Ben Noordhuis" [email protected] wrote:

A homebrew 16 bits stack machine made out of TTL chips, running at .1 MHz or thereabouts. Just fast enough to drive an I/O pin at 180 baud and blink a LED.

Mount a crank on the case. That'll make it feel faster.

I cheated on the RAM, though. I didn't feel like wiring up hundreds of 74xxx chips so I used an off-the-shelf chip. :-)

Wimp ;)

I've been thinking about building (and writing a Forth for) a transport-triggered architecture, although I have no idea yet what it would look like. Anyone have experience with those?

Duh? Lost me at "transport triggered".

brett

Jun 27 '17 20:06 beretta42

6809, mostly. I should be working on making my Forth cross compiler re-targetable... then I'd do up Z80, x86, ARM targets. I ported (aka did almost nothing) my cross compiler (written in plain C) to Fuzix. I would love to wire up my Forth in kernel space somehow, maybe inside a /dev/tty driver. Someday...

Jun 27 '17 20:06 beretta42

Lost me at "transport triggered".

From https://en.wikipedia.org/wiki/Transport_triggered_architecture:

[...] a kind of CPU design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data into a triggering port of a functional unit triggers the functional unit to start a computation. [...] Transport triggering exposes some microarchitectural details that are normally hidden from programmers.

Which seems very much in tune with Forth. I got the idea after seeing this (won't claim I'm original) and running into someone who did his undergrad work on the MOVE32, the original TTA chip.

(Apologies for hijacking the thread. Tell me if it's derailing and I'll shut up.)

Jun 27 '17 20:06 bnoordhuis

@bnoordhuis I have experience with TTA. Here is a paper I published long ago: http://www.complang.tuwien.ac.at/anton/euroforth/ef99/chapyzhenka99.pdf

Jun 29 '17 19:06 drom

I'm using a variation of James Bowman's excellent J1a processor on a Lattice iCE40 8k FPGA (jamesbowman/swapforth here on GitHub).

The j1a is a nice, simple 16-bit Forth machine with hardware stacks, which allow top and next access and replacement within a single clock cycle, at 48 MHz. (And I've had one overclocked to 69 MHz too.) James's brilliant ISA also means that there's RISC-like instruction field packing. Among other things, an ALU op can always optionally cause a return stack pop to the PC to implement a 'free' return instruction, and since a call also has space for the full address to jump to (courtesy of the tiny address space mostly, I suppose), a word need only contain about two instructions to be worth factoring out for space. Everything runs in one cycle, apart from RAM loads and IO reads, which take two. (And immediate loads where the msb is set, which require a second instruction to invert the loaded word, because the literal encoding accommodates only 15 bits.) The core is only about two pages of Verilog, mostly fairly clearly laid out in 'truth table' like case-statement format.
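The literal rule described above can be illustrated with a small Python sketch. It is only a model of the idea: the function names and the symbolic ("lit", n) and ("invert",) tuples are invented for this example and are not the j1a's real opcodes.

```python
def encode_literal(value):
    """Return a symbolic instruction sequence that pushes a 16-bit value.

    Assumption taken from the post: a literal instruction carries only
    15 bits, so a value with the msb set is loaded inverted and then
    flipped by a second instruction.
    """
    value &= 0xFFFF                      # keep to one 16-bit cell
    if value & 0x8000 == 0:
        return [("lit", value)]          # fits in 15 bits: one instruction
    inverted = ~value & 0xFFFF           # msb is now clear, so this fits
    return [("lit", inverted), ("invert",)]


def run(program):
    """Tiny interpreter for the two symbolic instructions above."""
    stack = []
    for op in program:
        if op[0] == "lit":
            stack.append(op[1])
        elif op[0] == "invert":
            stack.append(~stack.pop() & 0xFFFF)
    return stack
```

Calling `run(encode_literal(v))` for any 16-bit `v` should leave `v` on the model stack, at the cost of one extra instruction whenever the top bit is set.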

Its main limitations are that it has finite stack depths (although these are easily adjusted at FPGA compilation time, which btw only takes a couple of minutes for the whole FPGA image) and quite limited local RAM (4K x 16-bit words).

I extended this for the 8k series chip, and duplicated the stacks to make it a 'C-slowed' multicore processor, which I called the j4a. The 4 because C = 4 here. (I haven't retimed it yet, so there's no throughput increase over the j1a. Apparently there's a US patent in effect for that combination, which is a bit rude, since 'fine-threaded' multithreading in pipelined logic is a very old idea, used on some Crays, the Alpha processor, and I think the basis for AMD's GCN GPU microarchitecture too, reading between the lines on how it's able to achieve near 100% ALU utilisation, which a pipelined architecture with stalls can't do. But I digress.) I then added a 'hardware peripheral' which allows each core to be assigned an xt.

This results in a design one-quarter as fast, which is more or less 'software compatible' with James's j1a, at which point I load swapforth on it and run as per normal for ordinary development.

In practice this lets me write very simple code to talk to, for example, a pmod with an isolated 4..20mA output on it, and get the timing right by just using a wait loop, with the count in a variable.

This is then 'left running' on one of the three "background" cores, which, because of the architecture, all run exactly consistently - interleaving ram and peripheral io accesses, and all using the same ALU at 1/4 the speed of the j1a.
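As a rough illustration of why the cores' timing stays consistent, here is a toy Python model of several contexts sharing one ALU in strict rotation. It is a sketch under assumptions, not the j4a's Verilog: the fixed round-robin order and the names `interleave` and `core_programs` are made up for this example.

```python
def interleave(core_programs, cycles):
    """Simulate contexts sharing one ALU in a fixed rotation.

    core_programs: one list of instruction names per core.
    Returns a trace of (cycle, core, instruction) tuples.
    """
    n = len(core_programs)
    pcs = [0] * n                        # one program counter per core
    trace = []
    for cycle in range(cycles):
        core = cycle % n                 # slot owner is fixed by the cycle number
        program = core_programs[core]
        trace.append((cycle, core, program[pcs[core] % len(program)]))
        pcs[core] += 1                   # only the slot owner advances
    return trace


trace = interleave([["a0", "a1"], ["b0"], ["c0"], ["d0"]], 8)
for entry in trace:
    print(entry)
```

In this model a core always gets exactly every fourth cycle, whatever the other cores are doing, which is the property that makes a plain wait loop usable for timing.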

I then have a couple of other 'cores' free, say one for data output, and one for sequencing slower control logic, such that the j4a can be doing three jobs perfectly without interruption or timing variations, and without any software tasking mechanism to go wonky when something happens to take too long unexpectedly.

The 'zeroth' core is always free to use exactly as a j1a, so one can peek and poke at variables to see how things are going, or reset individual cores if they get stuck.

No loops are used: each of the other three cores normally hits a branch I put in swapforth that leads it to read and execute its assigned xt repeatedly, until it's cleared. (This branch does nothing on a j1a, which lacks the peripheral that responds to it.)

The only real gotcha is that swapforth's loop implementation uses a variable, which would get clobbered if more than one core tried to use it at the same time. For this reason I mostly just use begin ... until loops instead.

Also, for some reason I have yet to figure out, the number interpreter running on the zeroth core tends to mis-interpret numbers if the other three cores are busy. It never misinterprets words or other text though. I'm still trying to nut that one out, but it's not a showstopper, just something that needs to be worked around, usually by repeating something like drop 3000 dup . until 3000 ok comes back instead of garbage.

I'm thinking of extending the architecture somewhat to put the cores' stacks on an external fast SRAM chip, which would allow many more concurrent 'cores' without any inconsistent overheads. (But keeping the tiny but sufficient sram space, which all cores get reliable hardware-regulated access to. )

More cores would also allow more cycles of latency to allow for eg masking and intercepting (and duplicating privately, per core) certain memory addresses which could be picked out of the initial swapforth ram image at FPGA compile (or recompile) time to fix issues like the one for loop.

I'm also very tempted at the same time to use a similar 'invisible' RAM-shadowing mechanism so that certain defined variables can be used as process inputs/outputs, via the FPGA logic, to an entirely separate system over an SPI link or so. That separate system could then run the IoT GUI crap in such a way that its insecurity can't mess with the system's reliability. Maybe an esp32 module hosting some JavaScript for the web UI on a tablet or mobile phone. Cheap/easy way to add HMI for an operator -- assuming any of my gear eventually needs such. Done that way, any shenanigans at that end of things can't possibly jostle the shoulder of the system doing the actual work. And security attacks which can't reach the (physically unplugged when deployed) serial link to the j1a's zeroth core can do no harm either.

Oh, btw, the initial sram contents are helpfully loaded from the FPGA image by the same automatic mechanism which reads that image out of an eeprom chip, which takes a few ms after powerup. And the icestorm tools can possibly reload those contents without actually requiring a full fpga rebuild too, although swapforth isn't set up for that yet, so there's no actual need to bother with any boot code or even to program the system to access its own eeprom chip.

Although that hasn't stopped someone from doing so! Shout out to zuloloxi/mecrisp-ice -- which is a separate j1a fork which does do that, as well as extending the j1a to the full 16KB block ram available on the ice40hx8k chip.

Anyway, despite its flaws I've had good success using the j4a to control lab equipment in a very reliable (and extremely flexible!) way with very little actual program code.

I added a couple of SPI master 'peripherals' to the FPGA design in that case so I needn't 'bit-bang', one of which runs happily at 100 MHz.

Because it's on an FPGA, it's straightforward to add a little verilog to the design to do simple/fast things like that.

I have another variation with different peripherals which includes an encoder position tracker good for 32 bits, and which has a separate additional rs232 interface to talk to the servodrive which drives the axis to which the encoder is attached.

The combination of a minimal ANS Forth system in one corner of an FPGA which can be recompiled from scratch in a few minutes with completely open-source tools ( cliffordwolf/icestorm ) is a game-changer for lab use. One can even run the whole toolchain on a linux SBC, although then you're probably talking maybe 20-40 minutes for FPGA compilation. But it at least keeps all the 'batteries included', which helps a lot come repair-time in the field.

Because it's hooked up that way, I can secure-shell remotely into the SBC on the system, drop into the screen session with the terminal to the j4a and issue commands / reprogram the swapforth system even as it runs without the slightest jitter my oscilloscope can see.

My usage of forth is consequently very simple - lots of m*/, no does> . I keep my code in a git repo on the SBC, edit with vim, and just squirt part or all of the code onto a 'blank' swapforth system (which already has most of ANS Forth core, identical to what you get on a 'bootstrapped' j1a ).
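For readers who haven't met m*/, the following Python sketch models what it buys on a 16-bit system: the multiplication produces a wide intermediate before the division, so scaling doesn't overflow. The function names and the example numbers (a raw count of 50000 scaled to a 16000 uA span) are assumptions chosen for illustration, and the model only covers non-negative inputs, since rounding of negative results is not addressed here.

```python
def m_star_slash(d1, n1, n2):
    """Model of M*/ ( d1 n1 +n2 -- d2 ) for non-negative inputs.

    Python integers never overflow, so the product below stands in
    for the triple-cell intermediate result.
    """
    if n2 <= 0:
        raise ValueError("divisor must be positive")
    return (d1 * n1) // n2


def single_cell_scale(n, mul, div):
    """What a plain single-cell multiply then divide gives with 16-bit cells."""
    return ((n * mul) & 0xFFFF) // div   # product is truncated to one cell first


raw = 50000
print(m_star_slash(raw, 16000, 65535))
print(single_cell_scale(raw, 16000, 65535))
```

The second function shows the failure mode being avoided: the product is cut down to 16 bits before the division, so the scaled result is meaningless.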

I'd been following and reading about James's J1 design previously, but never quite figured out how to get it to work myself. When he released swapforth with the j1a compiling for an iCEstick, I was blown away by how trivial it was to get up and running. I've been able to do so on both Linux and Mac OS X computers; FPGA development, whilst possible on the former, was never so easy to set up for. (On the latter, it wasn't previously even possible.)

But yeah, my experience with Forth has been all in the last year or so, and I'm very happy with the sheer timeless simplicity of it, especially compared to the alternatives.

Jul 08 '17 11:07 RGD2

RGD,

I'm pleased to hear that you are using James Bowman's J1a and your J4a on ICE40 FPGA hardware.

We have produced a low cost ICE40 4K dev-board called "myStorm". I'm looking forward to using the J1 and swapforth on the new myStorm board.

A little known fact is that the 4K part is actually an 8K die, which is artificially hobbled by Lattice's programming application. Using Clifford Wolf's toolchain, it just appears like any 8K part.

The myStorm boards have some extra features not found on the other dev-boards:

256Kx16 10 ns SRAM - close coupled to FPGA
STM32L433 ARM Cortex M4 mcu - used for programming the FPGA and also offering ADC, DAC, SPI, UART and other peripherals.
60 FPGA signals brought out to PMOD connectors
Arduino style headers allowing compatibility with Arduino shields and accessories using "STM32Duino"
Mecrisp Forth ported to the STM32
$50 price tag for 1 off - with new stock expected in late July

This describes the version 1 board - which is very similar

https://www.fpgarelated.com/thread/799/mystorm-a-30-ice40-arm-m3-dev-board

More can be found here https://mystorm.uk/we-forecast-blackice-this-winter-2/ and on the myStorm Forum

Ken (London)

Jul 08 '17 15:07 monsonite

Ken Boak writes:

The myStorm boards have some extra features - not found on the other dev-boards

256Kx16 10nS SRAM - close coupled to FPGA [...]

Olimex also ships boards with 512kB fast SRAM suitable for forth CPU experiments. I have no affiliation with them, except that I did buy a couple of their HX1K and HX8K boards. The HX8K was quite delayed after the announcement, but it is finally shipping.

Details: https://olimex.wordpress.com/2017/06/22/ice40hx8k-evb-oshw-fpga-board-is-in-stock/

Only drawback is that they don't have a programmer onboard, so you have to improvise (E.g. https://github.com/anse1/olimex-ice40-notes )

Jul 08 '17 16:07 anse1

Ah ha!

I wasn't aware of either of those, thanks guys!

I know Dave of Xess.com has been working on a Pi hat form with an 8K and SDRAM he's calling the catboard, and I have an Icoboard floating around somewhere, but I've mainly been using the hx8k breakout boards as easily-available (and easily replaceable, although I haven't needed to yet) modules for use at work.

The main driver for me to push the j4a's stacks out of logic into an external SRAM was to try and squeeze it into a 1k chip, mostly because I've been eyeing off the olimex board, but an 8k version makes that easy... although now the mystorm is tempting too. The smaller sram isn't much of a limitation for this application, since one doesn't often really need a very deep stack, and the presence of a STM32 with mecrisp seems quite useful... I'll have to think about it, but for the moment I'm still leaning in favour of the olimex board with its nice IO add-on cards, particularly because of the 100 MHz ADC option. (I've found the slowest rate really usable to be about 1 MHz, with a Bessel 2-pole low pass in front of it set to roll off to about -60dB by 500kHz; it works well for pressure sensors, which often have quite 'blue' noise spectra. I can mod pmod AD1's to do this, although it's a bit of a pain. There is a specific application where a 10 MHz sample rate is the minimum that works, so I would need a faster ADC for that.)
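As a back-of-envelope check on the filter figures above, this Python sketch estimates where a 2-pole low pass has to sit to reach about -60 dB by 500 kHz. It uses only the high-frequency asymptote (40 dB per decade for two poles) and ignores the Bessel response shape near the corner, so the result is a rough natural frequency rather than a design value. The function name is made up for this example.

```python
def natural_frequency_for_stopband(f_stop_hz, attenuation_db, poles=2):
    """Natural frequency giving `attenuation_db` of loss at `f_stop_hz`.

    Assumes the asymptotic roll-off |H| ~ (f0 / f) ** poles, which is
    only accurate well above the corner frequency.
    """
    return f_stop_hz * 10 ** (-attenuation_db / (20.0 * poles))


f0 = natural_frequency_for_stopband(500e3, 60)
print(f"approximate natural frequency: {f0 / 1e3:.1f} kHz")
```

Under that approximation the filter sits somewhere in the mid tens of kHz, comfortably below the 500 kHz Nyquist limit of a 1 MHz sample rate.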

Jul 09 '17 00:07 RGD2

The only real gotcha is that swapforth's loop implementation uses a variable, which would get clobbered if more than one core tried to use it at the same time. For this reason I mostly just use |begin ... until| loops instead.

I changed the loop implementation in Mecrisp-Ice; there, both index and limit are on the return stack.
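The difference between the two approaches can be shown with a toy Python model in which several cores each run one counted loop, taking turns one iteration at a time. This is a sketch, not swapforth's or Mecrisp-Ice's actual code: the interleaving granularity and the name `run_loops` are assumptions for illustration.

```python
def run_loops(limits, shared):
    """Interleave one counted loop per core, one iteration per turn.

    shared=True  -> every core keeps its loop index in one common variable
    shared=False -> every core keeps its index on its own return stack
    Returns how many loop bodies each core executed.
    """
    n = len(limits)
    common = {"index": 0}                 # the single shared loop variable
    rstacks = [[0] for _ in range(n)]     # per-core return stack holding the index
    done = [False] * n
    bodies = [0] * n
    while not all(done):
        for core in range(n):
            if done[core]:
                continue
            index = common["index"] if shared else rstacks[core][-1]
            bodies[core] += 1             # loop body runs once
            index += 1
            if shared:
                common["index"] = index   # visible to, and clobbered by, other cores
            else:
                rstacks[core][-1] = index
            if index >= limits[core]:
                done[core] = True
    return bodies


print(run_loops([3, 5], shared=False))
print(run_loops([3, 5], shared=True))
```

With private return stacks each core runs exactly as many iterations as its limit asks for; with the shared variable the cores advance each other's index and finish early.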

I'm thinking of extending the architecture somewhat to put the cores' stacks on an external fast SRAM chip, which would allow many more concurrent 'cores' without any inconsistent overheads. (But keeping the tiny but sufficient sram space, which all cores get reliable hardware-regulated access to. )

I think it would be interesting to change the core so that it keeps TOR, TOS and NOS in registers and stores the stacks in the regular memory map. This would of course degrade performance a lot, but would allow for large stack depths even with only the HX1K block RAM in place.

At the moment, two stacks of depth 16 seem to be about the maximum possible in HX1K in terms of logic gates and routing available.

Although that hasn't stopped someone from doing so! Shout out to zuloloxi/mecrisp-ice -- which is a separate j1a fork which does do that, as well as extending the j1a to the full 16KB block ram available on the ice40hx8k chip.

:-) Thank you !

Mecrisp-Ice also features constant folding and automatic inlining.

I'd been following and reading about James's J1 design previously, but never quite figured out how to get it to work myself. When he released swapforth with the j1a compiling for an iCEstick, I was blown away by how trivial it was to get up and running. I've been able to do so on both Linux and Mac OS X computers; FPGA development, whilst possible on the former, was never so easy to set up for. (On the latter, it wasn't previously even possible.)

Oh yes, many thanks to Clifford Wolf and his team for Yosys, Arachne-PNR and Icestorm... Hats off!

And to James Bowman for his beautiful processor design.

Best wishes from Germany, Matthias

Jul 10 '17 13:07 Mecrisp

I used AVRs a long time, but was never too happy with their Harvard architecture.

Since Michael Kalus introduced me to the MSP430 I'm loving the chip. If the task is too much for it, I use ARMs (Tiva LaunchPads mostly). My primary Forth for these targets is Matthias' Mecrisp ;)

Jul 19 '17 11:07 GeraldWodni

Cortex-M3/M4 and M0. I use MPE forth.

Jul 22 '17 04:07 rbsexton

My Forth runs on an emulated virtual machine with a dual stack architecture and MISC instruction set.

Jul 22 '17 11:07 crcx

ARM, PPC/Power,x86, MOS-6502, m68k, and AMD64

Aug 25 '17 22:08 fatman2021

I'm curious, @fatman2021. Are those PPC, 6502, and 68000 devices used in current designs?

Aug 26 '17 04:08 larsbrinkhoff

My applications target the STM8 in general, and specifically low-cost STM8S003F3 uCs used in different cheap off-the-shelf boards.

The STM8 has IAP features similar to the MSP430, the 8bit performance is comparable to AVR, and there is limited support for 16 bit operations.

I use (and maintain) an improved and extended version of Dr. C.H. Ting's STM8 eForth.

Aug 26 '17 20:08 TG9541

My applications are all running on Cortex M0/M3/M4(F) with mixing C and Forth now.

Aug 28 '17 00:08 forthchina

Add a few NXP parts and I too target the same chips.

Aug 28 '17 01:08 cwpjr

@bnoordhuis @beretta42 @drom

MAXQ is a transport-triggered architecture which is a real product from Maxim Integrated:
https://github.com/larsbrinkhoff/awesome-cpus/tree/master/MAXQ

Sep 14 '17 06:09 larsbrinkhoff

Lars & All,

I found the data sheets and user guide a little confusing - then I found details including full breakdown of the instruction set here - beginning on page 184.

http://pdfserv.maximintegrated.com/en/an/AN5136.pdf

This MAXQ20 core has been used in a number of Dallas/Maxim devices

Sep 14 '17 08:09 monsonite

I'm using the STM32F746 Cortex-M7 with Mecrisp-Stellaris.

Sep 15 '17 08:09 jjonethal

TMS9900 cross-compiled with a DOS Forth. :-)

Purely for my own amusement. Used Camel Forth high-level code.

Sep 25 '17 02:09 bfox9900

Mine too https://www.mediafire.com/folder/6fqkfykcel80s/FISH_Forth

On Sun, Aug 27, 2017 at 7:24 PM, forthchina [email protected] wrote:

My applications are all running on Cortex M0/M3/M4(F) with mixing C and Forth now.

Oct 22 '17 18:10 cwpjr

I use GraForth for the Apple II, Jforth for the Amiga, and Gforth for everything else.

Nov 14 '17 22:11 fatman2021
