Log
I think I'm going to log here from time to time about the development. Right now I'm a bit upset. This might seem funny to most of you, but I'm really concerned about bloat. A few bytes might not seem like much.
But.. I added a global structure, and suddenly even the tiniest "hello world" bloated up to 4k. And you can feel the difference: it simply takes a few microseconds longer to load and execute. It's noticeable. And if you think a bit bigger and multiply this by a few billion executions on a server, it also counts in hard cash.
So, for the moment, I'm going to think about it. There has to be a compromise. But the last compromise, the global structure initialized at start - I believe it doesn't need to bloat that much.
Maybe something like a lazy initializer would be better - sort of mallocing the global structure and buffer only when needed.
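Roughly what I have in mind, as a minimal sketch - the type and function names here are invented for illustration, not the actual minilib structures:

```c
#include <stdlib.h>

/* Sketch of the lazy-initializer idea: nothing gets allocated until the
 * first function that needs the global state asks for it, so a program
 * that never touches it stays tiny. Names are hypothetical. */
typedef struct {
    char   *buf;        /* e.g. a shared scratch buffer */
    size_t  buf_size;
} mini_globals;

static mini_globals *mini_state;                /* stays NULL until first use */

static mini_globals *mini_get_globals(void) {
    if (!mini_state) {                          /* first call: set up on demand */
        mini_state = malloc(sizeof *mini_state);
        if (!mini_state)
            return NULL;
        mini_state->buf_size = 4096;
        mini_state->buf = malloc(mini_state->buf_size);
    }
    return mini_state;
}
```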
It's also a question of the increasing complexity. Although it's a miniature lib, adding something here already changes something there automatically... You know the game. I've never thought as much about simplicity as I do now. It's just an unusual programming style in today's world of Java and whatever other object-oriented ways. You normally just don't think this way, even when doing systems programming, even at the assembler level. So it's also a lot of fun implementing this minilib, just because it's unusual. And there's a lot of abstraction to do as well - only in a strange way.
Ok. Current state (amd64, Linux): hello-world: 185 bytes. That's Ok again. It has been down to 150 bytes before, but at the moment those 35 bytes are not the highest priority.
A sort of malloc is implemented.
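To illustrate the direction only - this is a simplified sketch, not the actual minilib code, and it leans on the libc sbrk() wrapper for brevity where minilib would issue the brk syscall itself:

```c
#include <unistd.h>

/* "Sort of malloc", sketched: a bump allocator on top of the program break.
 * It grows the heap in 16-byte-aligned steps and never frees. */
static void *mini_malloc(unsigned long n)
{
    n = (n + 15UL) & ~15UL;             /* keep 16-byte alignment */
    void *p = sbrk((long)n);            /* extend the heap by n bytes */
    return p == (void *)-1 ? 0 : p;     /* sbrk failure -> NULL */
}
```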
Darn! I really do need a decent disassembler. elftools and binutils don't even read the binaries. (I guess because I'm stripping all section headers; but what do we need section headers for, when there's only the .text section..)
Ah. ;) There's an online disassembler ( https://onlinedisassembler.com/odaweb/ ). Got my hexdump with: hexdump hello-include | perl -pe 's/^\S*//' - output given below..
```
hexdump hello-include | perl -pe 's/^\S*//'
 457f 464c 0102 0001 0000 0000 0000 0000
 0002 003e 0001 0000 8090 0804 0000 0000
 0040 0000 0000 0000 0000 0000 0000 0000
 0000 0000 0040 0038 0001 0000 0000 0000
 0001 0000 0005 0000 0000 0000 0000 0000
 8000 0804 0000 0000 8000 0804 0000 0000
 00b9 0000 0000 0000 00b9 0000 0000 0000
 0001 0000 0000 0000 01b8 0000 4800 358d
 0027 0000 0dba 0000 8900 0fc7 3105 c3c0
 485f e689 8d48 fe54 e808 ffda ffff 8948
 48c7 c0c7 003c 0000 050f 48c3 6c65 6f6c
 7720 726f 646c 0a21 0000
```
Finally. Compiled with other options, objdump did its job: objdump -D hello-include
```
hello-include:     file format elf64-x86-64

Disassembly of section .text:

0000000008048078 <.text>:
 8048078:  b8 01 00 00 00          mov    $0x1,%eax
 804807d:  48 8d 35 27 00 00 00    lea    0x27(%rip),%rsi    # 0x80480ab
 8048084:  ba 0d 00 00 00          mov    $0xd,%edx
 8048089:  89 c7                   mov    %eax,%edi
 804808b:  0f 05                   syscall
 804808d:  31 c0                   xor    %eax,%eax
 804808f:  c3                      retq
 8048090:  5f                      pop    %rdi
 8048091:  48 89 e6                mov    %rsp,%rsi
 8048094:  48 8d 54 fe 08          lea    0x8(%rsi,%rdi,8),%rdx
 8048099:  e8 da ff ff ff          callq  0x8048078
 804809e:  48 89 c7                mov    %rax,%rdi
 80480a1:  48 c7 c0 3c 00 00 00    mov    $0x3c,%rax
 80480a8:  0f 05                   syscall
 80480aa:  c3                      retq
 80480ab:  48                      rex.W
 80480ac:  65 6c                   gs insb (%dx),%es:(%rdi)
 80480ae:  6c                      insb   (%dx),%es:(%rdi)
 80480af:  6f                      outsl  %ds:(%rsi),(%dx)
 80480b0:  20 77 6f                and    %dh,0x6f(%rdi)
 80480b3:  72 6c                   jb     0x8048121
 80480b5:  64 21 0a                and    %ecx,%fs:(%rdx)
```
Now I'm really wondering why the heck ld places the entry point behind the main function. That's.. hm. (main starts at the top of the listing, at 8048078.) The entry point is at 8048090, so we call main from 8048099, just to return from 804808f to 804809e. This is.. ok. So, finally, we do a call from _start to main. I haven't figured out yet how to avoid this call. Just putting _start before main, and _end after it, would be faster and would again save a few bytes.
An _end function behind main would not only be faster and save a few bytes, it would also be safer. (No crash when someone does too much fiddling within main - just a return to the OS, with a perhaps unusual value.) While writing this: I just need to push the address of _end onto the stack right before entering main. No call to main. So the ret within main will do the jump to _end. Ideally just one byte, if ret is the last instruction of main.
And again, while thinking about it: I'm going to try a jmp to main at the end of _start. Hopefully the compiler will optimize this right.
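A sketch of what I mean, for x86-64 Linux with gcc -nostdlib (the symbol names and details are my illustration, not necessarily what minilib will end up doing): _start pushes the address of an exit stub and jumps to main, so main sees the same stack layout as with the call in the listing above, but its final ret lands directly in the stub.

```c
/* Sketch only (x86-64 Linux, build roughly with: gcc -nostdlib -static). */
__asm__(
    ".global _start\n"
    "_start:\n"
    "    pop  %rdi\n"                  /* argc                                    */
    "    mov  %rsp, %rsi\n"            /* argv                                    */
    "    lea  8(%rsi,%rdi,8), %rdx\n"  /* envp                                    */
    "    lea  exit_stub(%rip), %rax\n"
    "    push %rax\n"                  /* fake return address                     */
    "    jmp  main\n"                  /* no call: main's ret jumps to exit_stub  */
    "exit_stub:\n"
    "    mov  %rax, %rdi\n"            /* main's return value becomes exit code   */
    "    mov  $60, %rax\n"             /* SYS_exit                                */
    "    syscall\n"
);

int main(int argc, char **argv, char **envp)
{
    (void)argv; (void)envp;
    return argc;                       /* just something visible in $?            */
}
```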
Nice reading: https://en.wikibooks.org/wiki/X86_Disassembly
That's typical. The linked wikibook - most things are quite basic. The things I'd need to know aren't explained.
Anyways, there's a nice sentence there: "Computer science professors tell their students to avoid jumps and goto instructions, to avoid the proverbial 'spaghetti code.' Unfortunately, assembly only has jump instructions to control program flow." Grin. Obviously that's right. And, as I've always said: a good programming language shouldn't restrict you. I don't get why most languages have banned the good old goto. If the code I produce is bad, it's about the way I think. A programming language is nothing more than a sort of education for your thoughts. And a restrictive education is not the best education.
..without any problem we could write object-oriented programs in assembly. Or functional ones. It's just hard to go the other way and write low-level code in a high-level functional language, because of its restrictions. I personally really do love Perl because of this - simply because there aren't any imposed restrictions. (Not before 5.10.)
And the biggest Perl project I wrote had around 50,000 LOC. Nicely, there wasn't a real performance hit. Only the startup took a few seconds, since the scripting expressions have to be compiled first. After that, I benchmarked a few critical routines - rewriting them in C mostly didn't give a real performance gain. Rewriting in assembly was even somewhat risky: sometimes there was a gain of around a factor of two, but when not being really careful, there was often a serious penalty. My conclusion was: just keep writing in Perl. It's very seldom that a factor of two gives you a real performance gain, even in a high-performance environment. In nearly all cases the problems are elsewhere.
Ok. Now - back to coding. I guess I really have an important milestone nearly accomplished: having a good base and sorted things out. It might soon be possible to use minilib as a drop-in replacement for most tools, just by changing the gcc compiler switches. No code change needed.
Oh. And for now, I should go shopping. I need something to eat, and the malls are closing within an hour here. Maybe I should even buy some drinks; I somehow have the feeling I can celebrate the development (in its positive sense) of minilib. Why not praise myself ;)
And although I have other important projects, I really do want to accomplish milestone 0.1: a solid basic structure. Getting the whole thing to compile on further architectures and adding more ANSI C functions is not so complicated then, but could mark 0.2. 0.3 would then be a complete ANSI C set. Just to point out a roadmap.
(you might be interested in https://github.com/arsv/minibase which offers a built-in "libc" using similar approach)
> (you might be interested in https://github.com/arsv/minibase which offers a built-in "libc" using similar approach)
Thanks a lot, that really is a good hint. Somehow I didn't find it. ;) Although it even has a similar name.
I have to take a closer look. It seems to me that minibase has a different target, but that's only a feeling yet. I have to think about the similarities and differences. One difference might be my approach of a header-only library (more exactly, the possibility of it via a compile switch). Also, the licenses differ, which perhaps is not only a philosophical question.
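A rough sketch of what I mean by "header only via a compile switch" - the macro and function names are invented for illustration, and the real switch in minilib may look different:

```c
/* Hypothetical stb-style switch between header-only and separate compilation. */
#ifdef MINI_HEADER_ONLY
#  define MINI_API static inline    /* the implementation is pulled into each user */
#else
#  define MINI_API extern            /* implementation is compiled once, elsewhere  */
#endif

MINI_API unsigned long mini_strlen(const char *s);

#if defined(MINI_HEADER_ONLY) || defined(MINI_IMPLEMENTATION)
MINI_API unsigned long mini_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (unsigned long)(p - s);
}
#endif
```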
It's a great reference anyways. And there seem to be further similarities, like a sort of basic Linux system, which I'm also planning - although I'm going to keep that clearly separated from minilib.
Likewise, this seems to me a philosophical question: which way is better, the monolithic or the micro(lithic) approach?
I guess we somehow haven't realized yet what modern information technology could change. And although one might think the monolithic approach is more stable, this doesn't seem to hold up.
@rofl0r, may I ask why you pointed out the ability of hardcore-utils to be built standalone?
This seems to me right and important, but I'm not able to pinpoint the reason exactly. There seem to be many reasons that speak for both sides - the monolithic and the micro approach. As I pointed out in the readme, security seems to me one reason for a micro approach, as well as simplicity. But there are still compromises to be found, so it's hard to define. Other reasons are less complexity (good) and more stability (not everything is broken when something breaks - only the affected micro part). But.. it's hard to sort this out. Possibly because it's hard to define where exactly the border between a monolithic and a microlithic approach lies.
Anyways, thanks for the good hint. Best wishes, Michael
> @rofl0r, may I ask why you pointed out the ability of hardcore-utils to be built standalone?
imo, it's way more convenient for both building and debugging: gcc foo.c -o foo et voila.
i like busybox' approach too, but it's quite hard to get a debuggable build and find a proper entrypoint for debugging (though i guess in all fairness this could be counted as a quirk of the build system). the highly integrated approach of busybox also makes it relatively hard to study the source code, and the wall of ifdefs for minimal size/option tweaks makes it even worse. having the whole set of unix tools in a single ~800KB'ish executable is a really nice property, but otoh i don't really care whether my join program is a 30 KB executable vs adding only 10 KB to busybox, when i need to install 100+MB of libs and binaries for a webbrowser.
> @rofl0r, may I ask why you pointed out the ability of hardcore-utils to be built standalone?

> imo, it's way more convenient for both building and debugging: gcc foo.c -o foo et voila.

That's sort of double-edged. The build itself might be more convenient with a single monolithic source file - IF everything works out. Debugging..
> i like busybox' approach too, but it's quite hard to get a debuggable build and find a proper entrypoint for debugging (though i guess in all fairness this could be counted as a quirk of the build system). the highly integrated approach of busybox also makes it relatively hard to study the source code, and the wall of ifdefs for minimal size/option tweaks makes it even worse.
I believe that's one of the important points. Just today I thought about the Linux kernel: although the sources are there on my hard disk, there is no way I could read through them. Meaning, the argument of more security through open source is more or less hypothetical. Even if I did read through them, I couldn't look up in every case what hides behind this or that macro. Oh, and I'm just reminded of Perl JAPHs... To be fair, afaik kernel development separates the different responsibilities quite clearly. So there, again, is some sort of micro development.
Possibly that's the real point - decreased complexity through clear targets for each single tool. This, on the other hand, increases the chances that others are able to contribute. And it's easier to understand, even if it's your own source code. Who can remember what he did ten years ago without reading the sources? I'm eager to see whether my approach works out: having things separated, but compiling the sources into one single source file.
> having the whole set of unix tools in a single ~800KB'ish executable is a really nice property, but otoh i don't really care whether my join program is a 30 KB executable vs adding only 10 KB to busybox, when i need to install 100+MB of libs and binaries for a webbrowser.
Strangely, statically compiled binaries somehow seem to be more responsive - shorter loading and execution times. Although that's more a sort of feeling. I'm not so sure, but I guess it might have something to do with context switches: when the libraries are loaded somewhere in today's big RAM, and the program into quite another part, it's better to have the whole executable loaded into the processor's cache than having to load this or that part of several libraries from RAM - which might lead to further penalties, like broken cache predictions and so on.
Anyways, thinking about that again: this seems to be a trade-off between abstraction - like an option parser, which might be needed by most tools and so is implemented once - and the single tools.
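For illustration only, the kind of shared piece I mean could be as small as this flag scanner - the helper is invented here, not taken from minilib, busybox or hardcore-utils:

```c
/* Hypothetical shared helper: one tiny flag scanner every tool could reuse
 * instead of each tool parsing argv on its own. */
static int has_flag(int argc, char **argv, char flag)
{
    for (int i = 1; i < argc; i++)
        if (argv[i][0] == '-' && argv[i][1] == flag && argv[i][2] == '\0')
            return 1;
    return 0;
}

/* a tool would then simply do:  int verbose = has_flag(argc, argv, 'v'); */
```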
It feels a bit like I know a good solution - I even believe I have one - but I'm not able to pinpoint it.
Anyways, you are completely right about compiling and debugging. If busybox doesn't compile, I'm most likely going to try something else - just too much work getting through the sources. If one single tool with a single source file makes trouble, I'm most probably going to have a look into its sources.
But it still seems to me there's one important point missing - possibly some sort of conjunction of all the separated parts.
> Strangely, statically compiled binaries somehow seem to be more responsive - shorter loading and execution times.
theoretically static linked binaries should be slightly faster for 2 reasons:
- no delay on startup due to the dynamic linker having to patch jump addresses for the linked library routines
- no overhead due to -fPIC. this can make especially register-starved platforms like i386 a good bit faster, because iirc gcc usually keeps the plt address in a register, however this can now largely be mitigated with -fno-plt (see http://ewontfix.com/18/ for more details)
> Strangely, statically compiled binaries somehow seem to be more responsive - shorter loading and execution times.

> theoretically static linked binaries should be slightly faster for 2 reasons:
> 1. no delay on startup due to the dynamic linker having to patch jump addresses for the linked library routines
> 2. no overhead due to -fPIC. this can make especially register-starved platforms like i386 a good bit faster, because iirc gcc usually keeps the plt address in a register, however this can now largely be mitigated with -fno-plt (see http://ewontfix.com/18/ for more details)
Well, it's not only theoretical. Myself included, I'd say it's sometimes hard to keep an eye on complexity. I remember my assembly experiments, when I tried to improve some basic functions. To my annoyance, it was sometimes really hard to beat the code generated by gcc. Very often the results were counterintuitive.
I guess your first point, combined with the resulting cache misses, can become a bigger problem than one might think.
Your second point, again, is a good hint. Although it's obvious in hindsight, I hadn't been aware of it.
:laughing: Again, this complexity. Like what I did today (tonight).. I'm not really sure what it was, but suddenly the "extremely tiny" editor compiled to a bloated something: instead of 15k, 2MB. (!) There is still something wrong - it was down to 8k before. Just now I'm thinking I should give it a break; instead of implementing this or that, maybe it's time to think about how to get a grip on the complexity.
The "test" system is a good first step. But obviously not good enough. And not complete at all.
First I'm going to check for position-independent code. Afaik gcc doesn't create position-independent code when compiling statically, but you never know. Thanks again for the hint.
> Like what I did today (tonight).. I'm not really sure what it was.
well, this should not happen when you use git which you do. a git diff can always tell you what's been changed since the last checked-in (and thus probably "known good") version.
before git came along, it was really hard to remember everything that has changed recently, when a regression happened...
Yes, that's exactly what I did. ;) I still don't know what exactly managed to bloat a tiny poor executable of 12k up to 2MB. I'm pretty sure it was the linker. :) Someone has to be blamed. But I haven't been able to pinpoint the problem, and since I have to clean up the whole minilib anyway, I'd better do that first.
Obviously, sometimes I don't see the obvious. Header-only implementations marked with "always inline" should be static, else there is going to be trouble. Dunno why I ripped out the static a few days ago.
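In code, the point is roughly this - the function itself is just a made-up example:

```c
/* In a header included by several translation units, the "static" matters:
 * without it, each unit may emit and export its own copy of the function,
 * and the link either fails or the binary grows. With "static" the
 * definition stays local to each unit and simply gets inlined. */
static inline __attribute__((always_inline)) int mini_isdigit(int c)
{
    return c >= '0' && c <= '9';
}
```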
The bloating seems related to the stack. Somehow the global struct forces the stack into a separate program header, and that adds around 50 to 60 bytes (a single 64-bit program header entry is 56 bytes). Quite expensive for just one variable.
I guess I'll leave it this way for now, anyways.
.. seems the extra program header for the stack is only needed when linking several object files. :thinking:
:rage:
Further reading reveals that it may even be possible to drop the .text section and instead write all execution instructions into the stack. Also, my point about security seems to be valid. I just found this site: https://blog.fbkcs.ru/en/elf-in-memory-execution/ - it describes attacks which might be interesting especially on Android. I can only guess, but somehow I'm sure it is not only a theoretical possibility that someone infects e.g. an Android system.

I can't even be completely sure that the sudden bloats I'm sometimes experiencing aren't related to an infection here. (Normally I wouldn't notice, but since I'm counting every single byte..) I'm working with a quite clean and fresh Arch amd64 installation here, but I'm also browsing the net with the same system. I guess there is a 5 percent chance that there is an infection, but I cannot say for sure. The binaries I uploaded might be ok, since I really count every single byte. Besides, it's not so hard to disassemble a 200-byte file and check it exactly - which is what I'm going to do today with the bloated executables.

I'm also getting back to another idea of mine: having a core system where everything is statically linked. I already linked e.g. the shell (zsh) statically, and it's quite a bit more responsive. But possibly I should also link gcc and so on statically. It's also about minimizing the possible problems, even when they are more or less hypothetical.
And again, this seems to me a huge advantage of minilib compared to glibc or even musl: not only is it not hard to read and understand the source of this minilib, it is also possible to disassemble the generated binaries.
.. Disassembling other binaries is obviously possible as well. But who can say what's hidden in, e.g., a "hello world" that shows up at 500k? Unlike a 150-byte executable, where the biggest part is simply the ELF header. One should also keep in mind that disassemblers sometimes miss things, like shifted bytes or executable instructions within binary "data".
I'm still busy with restructuring. I've written a small parser to create the compat headers. First, it works. Second, it's turning out to be useful. But I made a fundamental design flaw: the header files are generated from templates, and the target files are overwritten. This seemed right in this special use case. But since this works out so well, I'd like to use the parser for further jobs, and there's the problem that you can't modify the created header files directly - instead you have to modify one of the templates and rerun "make header", which is annoying. So I'm heavily tempted to write a small interpreter for this job.