bcompiler

How to generate hex1?

Open sandinmyjoints opened this issue 7 years ago • 12 comments

The hex1 binary is supplied, but how was it generated from the ASCII text of hex1.he? The writeup explains how to verify that hex1.he is a representation of hex1, but I'd love to know how I could start with only hex1.he and generate hex1 from it. It must involve writing bytes directly to a file. Is there a way to do that with just a text editor? Or a tool like dd?

(I know that this is a mirror, but the original site is not loading and I thought someone out there might be able to answer this.)

sandinmyjoints avatar Jan 26 '18 03:01 sandinmyjoints

Here's a way to do it with Python:

#!/usr/bin/env python3

import binascii

with open('hex1.he') as f, open('hex1-new', 'wb') as fout:
    for line in f:
        line = line.strip()

        accum = []
        for c in line:
            if c == "#":
                break
            elif c == ' ' or c == '\t':
                continue
            else:
                accum.append(c)

            if len(accum) == 2:
                byte = ''.join(accum)
                fout.write(binascii.unhexlify(byte))
                accum = []

I guess the key parts there are opening the output file in binary write mode (wb), and using unhexlify to turn each pair of ASCII hex digits into the single byte whose value they spell out, rather than writing the ASCII characters themselves.
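To make that distinction concrete, here is a small sketch. The string "7f" is only an illustrative value, not something taken from hex1.he.

```python
import binascii

text = "7f"  # illustrative hex pair, not taken from hex1.he

as_ascii = text.encode("ascii")     # the two characters '7' and 'f'
as_byte = binascii.unhexlify(text)  # the single byte with value 0x7f

print(len(as_ascii), list(as_ascii))  # 2 [55, 102]
print(len(as_byte), list(as_byte))    # 1 [127]
```

Writing as_ascii to a file would just reproduce the text; writing as_byte produces the binary content.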

Would still like to know of other ways to do it, using common command line tools.

sandinmyjoints avatar Jan 26 '18 04:01 sandinmyjoints

sed 's/#.*//' hex1.he | xxd -r -ps > hex1_
diff hex1_ hex1
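For comparison, roughly the same pipeline can be sketched in Python without a per-character loop. This is only a sketch: the helper name hex_text_to_bytes is made up for illustration, and it assumes the input contains nothing but hex digit pairs, whitespace, and # comments.

```python
def hex_text_to_bytes(text):
    """Drop '#' comments, ignore whitespace, and decode the remaining hex digits."""
    digits = []
    for line in text.splitlines():
        code = line.split("#", 1)[0]          # same effect as sed 's/#.*//'
        digits.append("".join(code.split()))  # remove spaces and tabs
    # Decode all digits at once, similar in spirit to xxd -r -p.
    return bytes.fromhex("".join(digits))


# Hypothetical usage, with file names as in the commands above:
#   with open("hex1.he") as src, open("hex1_", "wb") as dst:
#       dst.write(hex_text_to_bytes(src.read()))
```

Unlike the earlier script, this joins digits across line boundaries before decoding, and bytes.fromhex raises ValueError if the total number of digits is odd.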

lauriro avatar Mar 08 '21 22:03 lauriro

Thanks @lauriro. We should mention this in a comment in the bootstrap script. I don't know if it should be executed by default, as it depends on sed and xxd. Do you want to submit a PR?

@sandinmyjoints my apologies, I totally forgot about your question. Thanks for asking.

Is either of you interested in figuring out how to finish this job and actually bootstrap a simple C compiler?

The only other effort I know of is https://www.bootstrappable.org/, although they seem to start from a VM, not machine code like bcompiler.

certik avatar Mar 08 '21 23:03 certik

It would be fun, but I'm afraid it's too much effort. I have to wait for a vacation when the kids are older (and ready for machine code) :D

lauriro avatar Mar 09 '21 07:03 lauriro

@lauriro same issue here, that's why I haven't done it yet. :) But I have thought a lot about it over the years. One way to do it is to go the other way and bootstrap gcc with tcc (the https://www.bootstrappable.org/ project has done that or will do so). Can you then compile tcc with a simpler C compiler? What is the simplest self-compiling subset of C? I found several interesting candidates:

  • https://github.com/rswier/c4
  • https://github.com/Fedjmike/mini-c
  • https://github.com/rui314/8cc
  • https://github.com/rui314/chibicc
  • https://github.com/aligrudi/neatcc

The idea of using a small subset of C is that you can use any modern C compiler (such as clang or gcc) to develop and debug and people are familiar with the syntax and the language (even if it is just a small subset).

Once you can bootstrap from a very small self-hosting subset of C all the way to tcc, we can think about how to bootstrap this small self-hosting subset of C from bcc, the final language in bcompiler.

certik avatar Mar 09 '21 13:03 certik

I am also in a wait-until-kids-are-older situation. I don't know C or its ecosystem well enough to have an answer to your question, but I'll keep an eye out!

@lauriro Thanks for the answer to my original question. I wasn't aware of xxd!

sandinmyjoints avatar Mar 09 '21 15:03 sandinmyjoints

I posted "What is the simplest self-compiling subset of C?" here: https://news.ycombinator.com/item?id=26399740 If you happen to have accounts there and can upvote it, it may get enough attention to receive some useful answers.

sandinmyjoints avatar Mar 09 '21 15:03 sandinmyjoints

I just did --- I clicked "favorite", is that how you upvote? I also posted some comment there to get started.

certik avatar Mar 09 '21 15:03 certik

Almost -- upvoting is done by clicking the little gray up-pointing triangle icon to the left of the post's title.

sandinmyjoints avatar Mar 09 '21 15:03 sandinmyjoints

Ah ok, I just clicked on it!

certik avatar Mar 09 '21 15:03 certik

https://github.com/8l/cc500 one more

lauriro avatar Mar 09 '21 21:03 lauriro

@lauriro cc500 looks awesome, and it is by the same author who wrote bcompiler! Probably not a coincidence. It is simple and readable, and I am quite confident it could be extended to implement more C features until it can compile tcc. The author even outlines how that would be done in the README.

Going the other direction, simplifying cc500: it looks about as simple as it can get. I only noticed a few minor things that could probably be removed while still keeping it self-compiling (removed both from the source code and from what the compiler supports):

  • comments
  • some operators such as != or >, possibly the bit operators could be equivalently expressed with arithmetic operators and if statements
  • rewrite the compiler to only use int (no char)
  • remove malloc and have a fixed (large enough) pre-allocated memory

I can't think of anything else that could be removed. This is pretty close to minimal. It is much shorter than the bcc code.
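As a rough illustration of the operator idea, here is a sketch in Python rather than in the cc500 subset. The function names are made up, the inputs are assumed to be non-negative integers, and whether cc500's subset actually offers division and remainder is not checked here.

```python
def not_equal(a, b):
    # a != b written with only == and if
    if a == b:
        return 0
    return 1


def greater(a, b):
    # a > b written with only <
    if b < a:
        return 1
    return 0


def bit_and(a, b):
    # a & b for non-negative ints, using only division, remainder, and if
    result = 0
    place = 1
    while 0 < a:
        if a % 2 == 1:
            if b % 2 == 1:
                result = result + place  # both low bits are set
        a = a // 2
        b = b // 2
        place = place * 2
    return result


print(bit_and(12, 10))  # 8
```

The same digit-by-digit loop works for OR and XOR by changing the condition under which place is added.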

To bootstrap it, after the simplifications above, one would have to rewrite each subroutine as equivalent code in bcc. One can first tidy the C code to make it closer to, and easier to express in, bcc. This might not be that hard.

I expect it to be much harder to actually extend cc500 to compile tcc, but I believe that can be done. Conceptually that is believable.

certik avatar Mar 10 '21 00:03 certik