A peculiar and fun Forth-like compiler targeting bash functions, recently made available.
As I like Forth to be fun too, apart from its practical aspects, I've been enjoying Forths written in less likely languages, from scripting languages to editor macros to mobile device "automation" apps.
Here's my recent concoction, named yoda, which has now reached the point of becoming somewhat useful, with the gravest flaws and problems eliminated.
It's a Forth-like implementation that isn't based on a virtual machine, but compiles to bash functions. These compiled functions execute in the same context as the compiler that generated them. In other words, the compiler doesn't compile to a file which is then loaded and executed, but into the very environment it is running in itself, much as you're familiar with from just about any Forth interpreter.
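To illustrate what compiling into the running shell (rather than into a file) means, here is a minimal sketch, not yoda's actual code: a toy `compile_colon` turns a token list into bash source for a function and `eval`s it, so the new word can be called right away and shares the data stack with the compiler. All names (`compile_colon`, `push`, `pop`, the `w_` prefix) are made up for this example, and only unsigned integer literals, `+`, `.` and previously compiled words are handled.

```bash
# Toy sketch, not yoda's implementation: compile into the running shell.
declare -a ds=()   # data stack, shared by the compiler and the compiled words
declare -i sp=-1   # index of the top-of-stack element

push() { ds[++sp]=$1; }
pop()  { REPLY=${ds[sp--]}; }

# Two primitives.
w_plus() { pop; local -i b=$REPLY; pop; push $(( REPLY + b )); }
w_dot()  { pop; printf '%s ' "$REPLY"; }

# compile_colon NAME TOKEN...: translate the tokens into bash source for a
# function w_NAME and eval it, so the word exists in this very shell.
compile_colon() {
  local name=$1 body="" tok
  shift
  for tok in "$@"; do
    case $tok in
      +) body+="w_plus; " ;;
      .) body+="w_dot; " ;;
      ''|*[!0-9]*) body+="w_$tok; " ;;   # any other word: call its function
      *) body+="push $tok; " ;;          # unsigned integer literal
    esac
  done
  eval "w_$name() { ${body}:; }"
}

compile_colon seven 3 4 +
compile_colon show seven .
w_show   # prints "7 "
```

Because `eval` runs in the current shell process, there is no intermediate file and no separate loading step: the generated function simply becomes part of the environment the compiler itself runs in.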
It offers reasonable code transparency, providing words to access and view both the source a word was compiled from and the generated code. It also has built-in word description lookup, aiming to lower the threshold for fledgling Forth users.
In fact, I shouldn't call this a "Forth", as it's merely "Forth-like": I didn't bother trying to adhere accurately to one or several of the available standards, partly because the host platform (bash) makes that somewhat hard. Examples: double-cell integers, no "real" memory access, code space separated from (virtual) memory space, and a return stack which doesn't live up to its name. Still, it's up there for inspecting, toying with, and hopefully enjoying.
About half of the words it provides have been described ("documented", sort of) by now.
Requirements: a computer running bash (which probably means some Unix-like system) with coreutils ~~and sed~~ installed. A text editor would be nice too. (The sed dependency has been dropped.)
This is amusing to me because just yesterday I was thinking how funny it would be to implement Forth in BASH. There is no way I would have done it though.
Here's another Forth in bash, but this one is virtual machine based, and therefore a tad slow: bashforth. This is older stuff; it's close to 20 years ago now that I enjoyed coding it. yoda is faster by a factor of about 30 to 50.
About 30 years ago I wrote Forth in PostScript in one page of code.
At one point I attempted to code a Forth in Brainfck :) I got as far as working random memory access, then gave up on the attempt. That was about when I switched to a Perl implementation, taking the lazy route :) That implementation is on my GitHub projects page too, btw, but the completely incomplete Brainfck version I eradicated, no trace left of it.
I was mentoring a high school student a few years ago and gave him a quick lesson in Forth. A couple of weeks later he came to me with an implementation in Haskell. He was having trouble with implementing data space but we figured out a way to do it by forking the workspace on every store operation. Hideously inefficient, but hey.
Speaking of Haskell, I was wondering what the most unlikely language used to implement a Forth might be, thinking of a handful of code-golfing languages like Jelly, Vyxal, and also Husk, which took some inspiration from Haskell. My personal favourite may be a Forth running on a set of pneumatic or hydraulic valves as host CPU.
This is great! Definitely one of the most unusual Forth implementations I've come across.
Here is another implementation challenge: Minecraft Redstone.
Cheers Niclas
On 2022-01-09 03:47, Cat Stevens wrote:
> This is great! Definitely one of the most unusual Forth implementations I've come across.

I got the wiki running and populated a handful of pages, to aid the effort of describing everything related to yoda.
@Bushmills, I have looked at the differences, and I wonder: why not just use another name when a word behaves differently?
For example, you could rename your variants:

- `parse` to `parse$` (since it places a string on the string stack)
- `immediate` to `compile-only` (since your variant actually works as the latter does in many Forth systems)
  - and define `immediate` to copy a header to the compiler vocabulary
- `abort` to `(throw)`
- `?abort` to `?throw` (probably, for compliance, it should not throw code `0` by any means)
- the string pattern `"ccc"` to `"ccc"$` or `$"ccc"` (to indicate that the string is placed on the string stack)
  - and make the string pattern `"ccc"` place `(c-addr u)` on the data stack only
- `<# # #s #>` to `<// // //s //>` or `<⌘ ⌘ ⌘s ⌘>` (for pictured numeric output of single-cell numbers).

@ruv, thank you for your thoughts on standards compliance. This is valuable and very much appreciated input!
Renaming `parse` to `parse$` makes complete sense, and I will follow that suggestion. The current naming is due to parsing predating strings in yoda, and my not reconsidering the name `parse` once strings were implemented.
The general answer to "why not use just another name" is that a system is either compliant or not compliant: an "almost" compliant system is still a non-compliant system. Given that I'm unlikely to achieve "full" compliance, I didn't bother much about eliminating individual aspects of non-compliance, such as using the same name for different behaviour. With some words, such as `throw`, the implementation is still pending, which is why I wouldn't want to use up these names now, not even temporarily. I'd rather change `abort` once `throw` exists, letting it take advantage of it, and consider `abort` a stop-gap measure for the time being.
About changing the string pattern, I'm hesitant, mostly because I want to avoid c-addr u type strings altogether, or as much as possible. Dealing with that type of string (each char stored as its ASCII value in a single array entry) is rather inefficient in bash, so I want yoda to default to the string stack for string-related operations, and only unpack and pack strings when it's inevitable. Using `"these type of string"` just reads more naturally in this context, as I have no plans to implement `s"` for c-addr u type strings. I may possibly think of adding a `."` string pattern for output, but as no storing of strings is involved there, there should be no need to identify by naming what kind of string was output.
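The cost difference can be sketched as follows, under assumptions that are not yoda's actual data layout: a hypothetical string stack holds each string whole in one array entry, while "unpacking" to a c-addr u pair copies one character code per cell of a virtual memory array, and "packing" rebuilds the string one character at a time. The helper names (`spush`, `str_unpack`, `str_pack`) are invented for illustration.

```bash
# Hypothetical layout, not yoda's: strings live whole on a string stack, while
# c-addr u strings need one virtual memory cell per character code.
declare -a sstack=()   # string stack: one complete string per entry
declare -i ssp=-1      # index of the top string
declare -a mem=()      # virtual memory: one character code per cell

spush() { sstack[++ssp]=$1; }

# str_unpack ADDR: move the top string to mem[ADDR...]; REPLY = length u.
str_unpack() {
  local -i addr=$1 i
  local s=${sstack[ssp]}
  ssp=ssp-1
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v "mem[$(( addr + i ))]" '%d' "'${s:i:1}"   # one printf per char
  done
  REPLY=${#s}
}

# str_pack ADDR U: rebuild a string from U cells at ADDR, push it on the string stack.
str_pack() {
  local -i addr=$1 u=$2 i
  local s="" ch
  for (( i = 0; i < u; i++ )); do
    printf -v ch "\\$(printf '%03o' "${mem[addr + i]}")"   # code -> character
    s+=$ch
  done
  spush "$s"
}

spush "hello"
str_unpack 100
echo "u=$REPLY first cell=${mem[100]}"   # u=5 first cell=104
str_pack 100 "$REPLY"
echo "${sstack[ssp]}"                    # hello
```

Every unpacked or packed character costs at least one `printf`, whereas moving a whole string on the string stack is a single array assignment.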
Your `immediate` vs. `compile-only` suggestion I will consider. yoda used a precedence flag before, so it was easy to keep the word `immediate` where it was used while only changing its implementation. I'm undecided about this choice, and that's when I like to listen to the opinions others may have.
Just a quick addition on the `abort` issue: yoda's error handling hasn't been finalized. For proper error handling I want real warm start capability, which I don't have at this point. Currently, errors are partly signalled back through nested words, in the hope of eventually reaching from/evaluate or even quit. This is messy and not to my liking, and it also affects the preliminary status of error/abort/throw and related words. There probably can't be a definitive implementation before I know the best way to deal with the occasional need to warm start the system.
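A rough sketch of what signalling errors back through nested words can look like in bash, assuming (hypothetically, not taken from yoda) that primitives report errors through a nonzero return status: every compiled call has to be followed by `|| return`, so the status travels upward until an interpreter loop, standing in for evaluate or quit, catches it and resets the stacks. The names `w_div`, `w_half`, `w_ratio` and `interpret`, and the error code 10, are made up.

```bash
# Hypothetical sketch, not yoda's code: errors travel up as return statuses.
declare -a ds=()
declare -i sp=-1
push() { ds[++sp]=$1; }
pop()  { REPLY=${ds[sp--]}; }

# A primitive that signals an error through a nonzero status (10 is made up).
w_div() {
  pop; local -i d=$REPLY; pop
  if (( d == 0 )); then
    echo "division by zero" >&2
    return 10
  fi
  push $(( REPLY / d ))
}

# Compiled colon words must test every call and pass a failure upward.
w_half()  { push 10; push 2; w_div || return; }
w_ratio() { push 10; push 0; w_div || return; push 1; }

# Stand-in for evaluate/quit: where a propagated error finally lands.
interpret() {
  local w
  for w in "$@"; do
    if ! "w_$w"; then
      echo "error in $w, resetting stacks" >&2
      ds=(); sp=-1   # crude substitute for a warm start
      return 1
    fi
  done
}

interpret half && echo "top: ${ds[sp]}"               # top: 5
interpret ratio || echo "stacks now empty: sp=$sp"    # stacks now empty: sp=-1
```

Having to append `|| return` after every compiled call illustrates why this approach is messy.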
I too am a big fan of changing the name if the detailed semantics change. Otherwise anyone else trying to read the code can become utterly confused. And if you do decide to try and implement the standard version, you can do so without having to rewrite old code that uses the variant. My mantra: Names Are Cheap
Valid points you have there, @MitchBradley. You guys make so much sense here.
I've followed your valuable suggestions regarding word naming, and renamed `immediate` to `compiled`. For consistency, the former misnomer `interactive` is now `interpreted`.
Thank you!
I am glad that the suggestions made sense to you. It can be difficult to come up with good new names, but reusing an old name that has an established meaning usually causes problems going forward.
As I also use yoda partly as an experimentation vehicle, here's one of the observations gained from it, which may be worth considering in other Forths and Forth-alikes: delayed header creation. It does away with the commonly implemented hide/reveal construct, rendering it unnecessary and removing yet another header flag (if hide/reveal is implemented using one). Delayed header creation therefore has the potential to complement the replacement of the precedence header flag.
What I'm doing to delay header creation is to leave it to semicolon to create the headers of colon words. Should compilation fail, no header is created at all. Should a header of the same name be referenced during compilation, as happens when compiling the former version of a word into a redefinition of it, the new header isn't found, because it hasn't been created yet. The reason for delaying headers in yoda is actually a different one, so avoiding self-reference during compilation is more of a side effect, a byproduct, but one I consider valuable enough to think of using such an approach for this goal alone: getting rid of hide/reveal.
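A minimal sketch of delayed header creation, assuming a toy compiler whose headers are a bash associative array mapping word names to function names (hypothetical names, not yoda's internals): the whole definition is compiled first, and the header is added only at the end, in the role of semicolon. A redefinition therefore still compiles calls to the previous version, and a failed compilation leaves no header behind, without any hide/reveal flag.

```bash
# Toy sketch, not yoda's internals: headers are created only at the very end.
declare -a ds=()
declare -i sp=-1
push() { ds[++sp]=$1; }
pop()  { REPLY=${ds[sp--]}; }

declare -A header=()    # word name -> bash function implementing it
declare -i nextname=0   # counter for unique function names

f_add() { pop; local -i b=$REPLY; pop; push $(( REPLY + b )); }
header['+']=f_add

# colon_def NAME TOKEN...: compile the whole body first; the header comes last.
colon_def() {
  local name=$1 fn="f_$nextname" body="" tok
  shift
  for tok in "$@"; do
    if [[ $tok =~ ^-?[0-9]+$ ]]; then
      body+="push $tok; "
    elif [[ -n ${header[$tok]} ]]; then
      body+="${header[$tok]}; "   # resolved against headers existing right now
    else
      echo "$tok ?" >&2
      return 1                    # compilation failed: no header is created
    fi
  done
  eval "$fn() { ${body}:; }"
  nextname+=1
  header[$name]=$fn               # the ";" step: create (or supersede) the header
}

colon_def five 2 3 +
colon_def five five five +        # redefinition: inner "five"s are the old five
"${header[five]}"; pop; echo "$REPLY"        # 10
colon_def broken five nosuchword || echo "header of broken: ${header[broken]:-none}"
```

Had the new header for `five` existed during its own compilation, the redefinition would have called itself; here it calls the earlier `five` twice.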
On 07/02/2022 at 10:24 AM, "Bushmills" wrote:
> [...]

There was a fig-version called Split-Forth where the headers were kept very separate from the code. The idea was that the headers could be eliminated when the application was complete. Sounds like you are aiming at something similar.
Regards
Paul E. Bennett IEng MIET Systems Engineer Lunar Mission One Ambassador
Hi Paul, no, not really. Separating heads is what I tend to do in just about any of my Forths anyway. With Forths written in interpreted and scripting languages like these, it comes almost automatically, as one is inclined to use existing data structures like arrays for headers, rather than unpacking names into (usually virtualised) memory. Assembly implementations, on the other hand, especially for smaller controllers, benefit from being able to remove all headers, freeing up the space they occupy. I liked the vocabulary growing from the end of memory towards lower addresses with each new header added. In yoda, too, headers are already stored separately, in an array. But delayed creation of headers is an independent scheme, although it probably depends on headers being separated (or placed at a funny location, like behind the body of a word).
yoda is a spin-off of another experimentation platform, which was mostly used for testing ways to do constant expression folding (CEF). That is why words weren't compiled incrementally; instead, code and pseudo-code were buffered for post-processing, triggered by semicolon. yoda inherited this post-processing, but not the CEF optimisation. With this already in place, delegating header creation to the post-processing phase was only a small step.
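As an illustration of why buffering a definition until semicolon enables such post-processing, here is a hedged sketch of a constant-folding pass over a buffered token list. It is neither yoda's nor its predecessor's code: it only folds `+`, `-` and `*` when the two preceding buffered items are integer literals, and `fold_constants` is a made-up name.

```bash
# Hypothetical post-processing pass over a buffered definition (not yoda's code):
# "<lit> <lit> <op>" is replaced by the computed literal, for op in + - *.
fold_constants() {
  local -a out=()
  local -i n=0          # number of valid entries in out
  local tok
  for tok in "$@"; do
    if [[ $tok == [-+*] ]] && (( n >= 2 )) &&
       [[ ${out[n-2]} =~ ^-?[0-9]+$ && ${out[n-1]} =~ ^-?[0-9]+$ ]]; then
      out[n-2]=$(( ${out[n-2]} $tok ${out[n-1]} ))   # fold into one literal
      n=n-1
    else
      out[n]=$tok
      n=n+1
    fi
  done
  REPLY="${out[*]:0:n}"   # the folded token list
}

fold_constants 2 3 4 '*' + dup
echo "$REPLY"   # 14 dup
```

Folding works on the finished buffer because later tokens can turn earlier literals into a foldable expression; `fold_constants x 2 +`, by contrast, is left unchanged because `x` is not a literal.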
> What I'm doing for delaying headers creation is to leave it to semicolon to create the headers of colon words.

I usually do it as well, using the following words:
```forth
relate-wordlist ( xt sd.name wid -- )
naming ( sd.name xt -- )
```

(where `sd.name` is a `c-addr u` pair)

I want to find better names for these words (especially for the latter one).
@ruv, what are your experiences with this approach of reversing the order of compilation and header creation? Any problems you have encountered so far?
My biggest issue with it currently is this: every time a header is created, the file handle and line number of the file it was loaded from are recorded, for easy access to the word's source for editing or viewing. At this point the recorded source location of colon words is the line carrying the semicolon, not the colon. That's not horribly complicated to fix, and more a cosmetic issue than a real problem. Other than that I can't think of anything on the negative side. `recurse` I had to fix, but that took very little effort. `last @` is likely to produce an unexpected result when executed during compilation, but I like to hide such internals from user code anyway. Header creation in my case is a pseudo-op inserted into the instruction stream, with the word name as argument.
> The general answer to "why not use just another name" is that a system is either compliant or not compliant

Actually, a standard system may have different degrees of compliance (see 5.1.1 System compliance).
And it's easy to make a system compliant: it's enough to provide the Core word set only (see 3 Usage requirements). Concerning other standard words, a word should be either provided and compliant, or not provided.
Character strings
> due to wanting to avoid c-addr u type strings altogether, or as much as possible: dealing with that type of strings - each char stored as it's ASCII value in a single array entry - is rather inefficient in bash,

It looks like Forth in Bash is not about efficiency at all ;)
Standard character strings can be provided for compatibility only. A Forth system may use any other representation of strings for its internal use and/or in additional APIs.
> It looks like Forth in Bash is not about efficiency at all ;)

All the more reason not to slow it down substantially further if that can be avoided: it may just make the difference between "usable" and "unusable".
> what are your experiences with this approach of reversing the order of compilation and header creation? Any problem points you have encountered as far?

Usually the lifetime of a parsed string lasts only until the next refill, and in that case you need to save the string containing the name of a word somewhere until compilation of the word ends. Alternatively, a header can be created at once, while appending it to the compilation word list is delayed.
There is no such problem if the lifetime of a parsed string extends until the whole file is translated (as it did in one of my cases).
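The alternative mentioned above, creating the header at once but linking it into the compilation word list only at the end, might be sketched like this. The header records live in parallel bash arrays, and `make_header`, `link_header` and `find_word` are hypothetical names for the steps done at colon, at semicolon, and during lookup. Copying the name into the header record at colon time is what frees it from the lifetime of the input buffer.

```bash
# Hypothetical sketch (names and layout invented): the header record is made at
# ":", but only linked into the compilation word list at ";".
declare -a hname=() hxt=() hlink=()  # header records: name, xt, link to previous
declare -i nheaders=0                # number of header records made so far
declare -i latest=-1                 # newest linked header (-1: empty word list)
declare -i pending=-1                # header made at ":" but not linked yet

# At ":": copy the parsed name out of the transient input buffer right away.
make_header() {
  hname[nheaders]=$1
  hxt[nheaders]=$2
  hlink[nheaders]=-1
  pending=$nheaders
  nheaders+=1
}

# At ";": append the pending header to the compilation word list.
link_header() {
  hlink[pending]=$latest
  latest=$pending
  pending=-1
}

# Search from newest to oldest linked header; REPLY = xt when found.
find_word() {
  local -i h=$latest
  REPLY=""
  while (( h >= 0 )); do
    if [[ ${hname[h]} == "$1" ]]; then
      REPLY=${hxt[h]}
      return 0
    fi
    h=${hlink[h]}
  done
  return 1
}

make_header foo 1; link_header   # an existing word foo with xt 1
make_header foo 2                # ": foo ..." begins a redefinition
find_word foo; echo "while compiling: $REPLY"   # while compiling: 1
link_header                      # ";"
find_word foo; echo "after ;: $REPLY"           # after ;: 2
```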
> My biggest issue with this is currently this: every time a header is created, the file handle and line number of the file it was loaded from is recorded, for easily accessing the word source for editing or viewing. At this point the recorded source location of colon words is the line carrying the semicolon, not the colon.

Then the line number should probably be taken just before compilation of the definition starts. I would associate this line number with the xt, to provide this information for anonymous definitions too.
> `recurse` I had to fix, but that was done needing only very little effort.

`recurse` doesn't depend on the header; it only needs the xt of the current definition.
I use a word `germ` that returns this xt, and `recurse` is as simple as:

```forth
: recurse ( -- ) germ compile, ; immediate
```
See a full example in my gist. It also shows one way how to deal with the name and create the header after end of compilation of the definition.
> Then probably the line number should be taken just before start compilation of the definition.

I'm now doing something similar: defining words inject another pseudo-op into the instruction stream, with the file handle and line number as arguments. Source location information is then saved from these arguments, instead of being produced when the header is created.
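A sketch of this pseudo-op approach, under stated assumptions: colon writes a hypothetical `@source file line` pseudo-op into a buffered instruction stream before any code, semicolon appends an `@header name` pseudo-op, and a post-processing pass turns the stream into a bash function while recording the source location from the pseudo-op's arguments. A file name stands in for yoda's file handle, and all names here are invented.

```bash
# Hypothetical sketch, not yoda's code: pseudo-ops in a buffered instruction stream.
declare -a ibuf=()     # instruction stream of the definition being compiled
declare -A header=()   # word name -> function name
declare -A srcloc=()   # word name -> "file:line" of its ":"
declare -i nextname=0
curname=""

# ":" emits a source-location pseudo-op before any code is compiled.
colon() { curname=$1; ibuf=("@source $2 $3"); }
# Ordinary compiled code is appended to the same stream.
compile() { ibuf+=("$1"); }
# ";" appends a header pseudo-op, then a post-processing pass consumes the stream.
semicolon() {
  ibuf+=("@header $curname")
  local ins tag name="" file="" line="" body="" fn="f_$nextname"
  for ins in "${ibuf[@]}"; do
    case $ins in
      "@source "*) read -r tag file line <<< "$ins" ;;
      "@header "*) read -r tag name <<< "$ins"
                   eval "$fn() { ${body}:; }"
                   header[$name]=$fn
                   srcloc[$name]="$file:$line" ;;
      *)           body+="$ins; " ;;
    esac
  done
  nextname+=1
  ibuf=()
}

colon seven words.fs 12   # hypothetical file name and line number of the ":"
compile "echo 7"
semicolon                 # ";" may sit many lines further down
echo "${header[seven]} from ${srcloc[seven]}"   # f_0 from words.fs:12
```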
> recurse doesn't depend on the header, it only needs the xt of the current definition.

Colon words in yoda don't have bodies. They also don't have execution tokens in the common sense. In fact, there are neither name, code, nor parameter field addresses, and `here` initially returns an address below 10 (for a handful of variables). What yoda uses as an xt is a numeric portion of the function name associated with a word name, which is entirely unrelated to any memory address; words are therefore not referenced internally by xt. xts are only "pretend-xts" that make words like `'`, `execute` and the like functional.
The central reference to a word is actually the word name, often used as a hash key into an array. As a consequence, `recurse` depended on the header.
Chances are that not everything that applies to other Forth and Forth-like systems is directly applicable to yoda. My `recurse` now looks like this: `code "${functionname_prefix}_$((nextname))"`, resulting in code like
> In fact, there are neither name- nor code- nor parameter field addresses

It's OK. All these artifacts are implementation details that are under the hood. The standard is a quite high-level abstraction that hides all such details.
> What those use as xt is a numeric portion of the function name associated with a word name, which is entirely unrelated to any memory address,

An execution token is not an address. It's an unspecified cell that only identifies execution semantics, and nothing more (see also Data types).
`1365` is a perfect execution token in your example above.
As we can see, Tick (`'`) undoubtedly returns an xt, and `execute` performs the corresponding execution semantics:
```forth
: bar 123 . ;
' bar execute \ prints 123
```
What is missing is the `compile,` word. It can be defined as follows:

```forth
primitive 'compile,' 'code "${header_code}_${s[sp--]}"' ;
```
Now `recurse` can be defined at the Forth level as:

```forth
: compileonly immediate ; \ compat
: germ ( -- xt ) last @ ;
: recurse germ compile, ; compileonly
```
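Since `recurse` only needs the xt of the current definition, it works even when headers are created at semicolon, provided the xt (here, the number in the function name) is allocated at colon time. A hypothetical bash sketch: `colon` reserves the function number, `germ` returns it, `compile_xt` plays the role of `compile,`, and `semicolon` creates the function and, only then, the header. None of these names are yoda's.

```bash
# Hypothetical sketch, not yoda's code: the xt is reserved at ":", the header made at ";".
declare -a ds=()
declare -i sp=-1
push() { ds[++sp]=$1; }
pop()  { REPLY=${ds[sp--]}; }

declare -A header=()   # word name -> function name
declare -i nextname=0  # next free function number (our xt)
declare -a code=()     # code of the current definition, buffered until ";"
germ_xt=""             # xt of the current definition, known from ":" on
curname=""

colon()       { curname=$1; germ_xt=$nextname; nextname+=1; code=(); }
germ()        { REPLY=$germ_xt; }               # germ ( -- xt )
compile_xt()  { code+=("fn_$1"); }              # compile, ( xt -- )
compile_raw() { code+=("$1"); }                 # stand-in for other compiled code
recurse()     { germ; compile_xt "$REPLY"; }    # no header lookup needed
semicolon() {
  local body
  printf -v body '%s; ' "${code[@]}"
  eval "fn_$germ_xt() { ${body}:; }"
  header[$curname]=fn_$germ_xt                  # the header appears only now
  germ_xt=""
}

# countdown ( n -- ): prints n, n-1, ..., 1
colon countdown
compile_raw 'pop; (( REPLY > 0 )) || return 0'
compile_raw 'printf "%s " "$REPLY"; push $(( REPLY - 1 ))'
recurse
semicolon
push 3; "${header[countdown]}"   # prints "3 2 1 "
```

The generated `fn_0` calls itself by name, and no lookup of the header `countdown` happens before `semicolon` runs.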
BTW, even the core `s"` can be defined in your Forth as:

```forth
: lit, ( x -- ) ['] literal execute ;
: s" [char] " parse$ here unpack$ here over allot lit, lit, ; compileonly
```
Almost... but this wouldn't have worked as you wrote it in versions which postpone headers. It seems that you're basing this on a version of yoda from before header creation was postponed. `last` was updated by `header`, which was fine as long as headers were created before code was compiled. When the order was reversed, `last` was no longer updated before the code was compiled, and, when read during compilation, pointed to a word other than the most recently defined one.
This has now been changed (version 0.6.2) and put online only a few minutes ago. That quirk was mentioned in an earlier post in this thread, but wasn't deemed important enough to fix quickly. As your code examples rely on the proper contents of `last`, I did the fix and uploaded the (hopefully) corrected version now.
It seems you took a good look at yoda, given how aptly you deal with its peculiarities.
> As your code examples rely on proper contents of `last`, I did the fix and upload of the corrected (hopefully) version now.

I relied on `last` only as an easy solution for illustrative purposes (yes, it worked in an earlier version).
Actually, `germ` should be properly implemented to return the xt of the current definition only (the current definition is the definition whose compilation has been started most recently but not yet ended).
When you provide `literal`, `compile,` (and/or `postpone`), you would probably want to throw an exception if the user tries to compile something via these words when there is no current definition. `last` cannot help with this. Also, if you provide quotations, a definition can be the current definition several times, and the concept of `last` cannot properly reflect this either.
A proper implementation (and handling) of `germ` solves all these problems.
> compile, (and/or postpone), you probably would want to throw an exception

yoda supports forward references. For `compile,` and `postpone`, the call to a function can be compiled while the function doesn't exist yet. When the word is defined later, the name of the compiled, but declared "still missing", function is used for naming the resolved word.
This is possible either through automatically created forward references (disabled by default; see the +f and -f "convenience" switches to turn them on and off), or manually, by declaring a not-yet-existing word as needed with `need word`, which creates the word header and the function name to be assigned when the word is resolved at any later time.
It only makes sense to let `postpone`, `compile,` et al. benefit from this convenience too.
Forward references in compiling are slightly different from immediately resolving words when they are referenced during execution (which is enabled by default; see the +i and -i convenience switches, also indicated in the status line flags by capital vs. lowercase letters). The latter do throw an error if they are not resolvable right away, so calling them "forward references" would actually be a misnomer, even though they share code with compile-time ("real") forward references.