zig ar: a drop-in llvm-ar replacement

Open kubkon opened this issue 4 years ago • 19 comments

Since we are putting a lot of effort into implementing our linker for all supported targets (https://github.com/ziglang/zig/issues/8726), we should also put some effort into adding our own implementation of a static archiver to replace llvm-ar. While llvm-ar generally works well on the host platform when targeting the host platform, in cross-compilation settings things can get wonky: the archiver may produce host-native static archive headers for foreign file formats, which can trip up the linker when it tries to use the archive.

This is a great issue for any new contributor as it allows you to create an archiver as a completely standalone program (in its own repo like zld for instance) and then upstream it into Zig once it's ready.

Also, with this issue closed, we will be able to offer zig ar as a subcommand that does not rely on llvm-ar in any way.

This issue does not block 1.0.

kubkon avatar Sep 23 '21 20:09 kubkon

I'm interested in giving this a look-in! Where would be a good place to start in understanding the problem?

moosichu avatar Sep 24 '21 12:09 moosichu

If I were tackling this problem, I would most likely create a fresh repo with the sole purpose of building a drop-in replacement for llvm-ar, much like zld is for lld. You can then build it out-of-tree, which means you don't have to worry about passing Zig tests at this stage and you significantly cut down on build times. I would also focus on just one file format in the beginning, say ELF or Mach-O. The idea here is that if you were to pick any large C/C++ codebase (or whatnot) in the wild, you could pass the Zig archiver as a replacement for the default system one or llvm-ar; that is, you'd tweak the CMake/Make invocation like so:

AR=zig-ar CC=... CXX=... cmake ../

or the same but with make. If the build process succeeds, then success!

Afterwards, you might wanna consider dropping the archiver in as a direct replacement for llvm-ar in the Zig upstream, either by calling directly into a precompiled binary or by putting the sources in-tree (the latter is the end-goal actually). The relevant source where this should/could happen is in src/link.zig#L668:

pub fn linkAsArchive(base: *File, comp: *Compilation) !void {
    //...
    const llvm = @import("codegen/llvm/bindings.zig");
    const os_type = @import("target.zig").osToLLVM(base.options.target.os.tag);
    const bad = llvm.WriteArchive(full_out_path_z, object_files.items.ptr, object_files.items.len, os_type);
    if (bad) return error.UnableToWriteArchive;
    //...
}

kubkon avatar Sep 25 '21 11:09 kubkon

Will give that a crack!

moosichu avatar Sep 25 '21 11:09 moosichu

JFYI, https://github.com/TinyCC/tinycc/blob/mob/tcctools.c has an extremely hacky impl for ELF support on Windows.

wqweto avatar Sep 25 '21 11:09 wqweto

Yeah - I've made a little bit of progress with this. Just still getting comfortable with zig so it's going to be a slightly idiosyncratic start. But it feels like a doable project.

https://github.com/moosichu/zar/

I just have a little program that can parse a very simple archive file and then print out all the "filenames" of the files it contains.

I'm just building things up slowly step-by-step, with a focus on reading archives generated by llvm-ar to begin with.

The goal will be to make it a drop-in replacement, but I will figure out the order in which I do things as I go for now (still going to be very experimental early on).

I think what will probably end up happening is that I will experiment with parsing increasingly interesting archive files. And then I will loop back around and implement the command-line interface for the program. And then just incrementally work on each piece of functionality testing against the results of llvm-ar.

Will then probably make some kind of framework for testing those as well I think.

moosichu avatar Sep 25 '21 16:09 moosichu

You might want to look at https://github.com/SuperAuguste/zarc . It can parse ars, but it cannot create them. So maybe just using that and then adding features to create tars would be good?

g-w1 avatar Sep 25 '21 16:09 g-w1

Yeah, the ultimate goal of zar should be generating static archives. Adding parsing logic is a good first step to figuring out how it works though. One thing to pay particular attention to is the differences in generated ar structure between linux and macos - I believe there is a difference in at least the header format but maybe more. Also, I've been reached out to by multiple people expressing interest in helping out so @moosichu are you fine taking charge on this one and potentially collaborating with others? If so, I'll send them your way (to your fresh repo, etc.).
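For reference, one place where the two flavours are known to diverge is how member names are stored in the fixed 60-byte header. The following is a minimal shell sketch of that difference, not code from zar or llvm-ar: the helper names hdr, hdr_gnu and hdr_bsd are hypothetical, mtime/uid/gid/mode are placeholder values, and real tools may additionally pad the stored BSD name.

```shell
#!/bin/sh
# Sketch only: hdr, hdr_gnu and hdr_bsd are hypothetical helpers.
# Fixed 60-byte header shared by both flavours:
#   name[16] mtime[12] uid[6] gid[6] mode[8] size[10] "`\n"
hdr() { printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' "$1" 0 0 0 644 "$2"; }

# GNU/System V flavour: short names are terminated with "/".
# (Longer names go through a separate "//" string table, not shown here.)
hdr_gnu() { hdr "$1/" "$2"; }

# BSD flavour: short names are space padded. Names longer than 16
# characters, or containing a space, are written as "#1/<len>"; the name
# bytes then follow the header and <len> is counted in the size field.
hdr_bsd() {
    name=$1
    size=$2
    if [ "${#name}" -gt 16 ] || [ "$name" != "${name#* }" ]; then
        hdr "#1/${#name}" "$((size + ${#name}))"
        printf '%s' "$name"
    else
        hdr "$name" "$size"
    fi
}

hdr_gnu foo.o 4
hdr_bsd foo.o 4
hdr_bsd a_very_long_member_name.o 4; echo
```

The length check uses the shell's character count, so the sketch assumes ASCII member names.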

kubkon avatar Sep 25 '21 17:09 kubkon

Yes - very happy to collaborate, and I’ve started reading up and putting sources together on the differences in the formats for different platforms! I’ve linked to some of them in the repo - but will flesh that out properly tomorrow for others hoping to contribute as well.

moosichu avatar Sep 25 '21 17:09 moosichu

I have also been doing my own independent attempt at this issue, here https://github.com/iddev5/zig-ar

My ar can create basic files so far, and it is compatible with llvm-ar and ranlib too.

If it works out, I can try to merge it with zar as discussed above...

iddev5 avatar Sep 26 '21 04:09 iddev5

Great progress! Since you have two repos it might make sense to split the focus a little. For example, @iddev5 could focus on linux and @moosichu on macos, etc., and afterwards merge both together as zar or otherwise. How's that for a plan?

kubkon avatar Sep 26 '21 07:09 kubkon

This sounds great! Just to let you know that I am okay with either plan. On my repo, I have already got reading and writing common-style archives done (without symbol table and string table, ofc)
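To make "common-style archive without symbol table and string table" concrete, here is a rough shell sketch that writes such a file by hand. It is not taken from zig-ar or zar: mk_archive is a hypothetical helper, the mtime/uid/gid/mode values are placeholders, it uses the GNU-style "/" name terminator, and it only handles member names of up to 15 characters.

```shell
#!/bin/sh
# Sketch only: mk_archive is a hypothetical helper, not part of any tool
# mentioned in this thread. Usage: mk_archive OUT.a MEMBER...
mk_archive() {
    out=$1; shift
    printf '!<arch>\n' > "$out"                 # 8-byte global magic
    for f in "$@"; do
        size=$(wc -c < "$f" | tr -d ' ')
        # 60-byte header: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] "`\n"
        printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' \
            "$(basename "$f")/" 0 0 0 644 "$size" >> "$out"
        cat "$f" >> "$out"
        # Headers start on even offsets: pad odd-sized data with one newline.
        if [ $((size % 2)) -eq 1 ]; then printf '\n' >> "$out"; fi
    done
}

demo=$(mktemp -d)
printf 'abc' > "$demo/a.o"      # stand-ins for real object files
printf 'defg' > "$demo/b.o"
mk_archive "$demo/libdemo.a" "$demo/a.o" "$demo/b.o"
```

If a system archiver is installed, running it with the "t" operation on the resulting file should list a.o and b.o, although a linker may still reject the archive for lacking a symbol table.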

iddev5 avatar Sep 26 '21 07:09 iddev5

Hey, I've been in contact with @moosichu and @kubkon and wanted to join in. I'm on Linux, but happy to help out where I can!

ceckertz avatar Sep 26 '21 12:09 ceckertz

Sounds good - I think my repo needs to be fleshed out a bit more before people can start making meaningful contributions (without stepping on each other's toes). I have made some good progress on argument processing though - and will keep chipping away at the problem today.

https://github.com/moosichu/zar/

But with the amount of interest expressed - I will try and fast-track to that point ASAP (hopefully this evening) including having issues that can be tackled by individuals (which I will be open for accepting PRs for). Especially as I will be working during the week so won't have anywhere near as much time to work on this then.

moosichu avatar Sep 26 '21 13:09 moosichu

Thanks for taking charge at organising this @moosichu, it's very much appreciated! If you need any assistance from me, please do let me know!

kubkon avatar Sep 26 '21 13:09 kubkon

The work of @iddev5 has been merged into the https://github.com/moosichu/zar/ repo. Thank you, @iddev5!

I need to properly read through the changes (and some cleaning-up needs to be done to make the two pieces of work consistent with each other). But it's a good step of progress for sure.

In terms of what I have done - I have the "print" and "display contents" ("p" and "t") operations working for both BSD & GNU-style files (although without support for symbol tables at this point).
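As an illustration of what the "t" operation involves for both styles, here is a minimal shell sketch of a member lister. It is not zar's code: ar_list is a hypothetical helper, GNU long names are only reported by their string-table offset rather than resolved, and BSD extended names are assumed to be NUL padded.

```shell
#!/bin/sh
# Sketch only: ar_list is a hypothetical helper. Usage: ar_list ARCHIVE
ar_list() {
    file=$1
    [ "$(head -c 7 "$file")" = '!<arch>' ] || { echo "not an archive" >&2; return 1; }
    total=$(wc -c < "$file" | tr -d ' ')
    off=8                                         # skip the global magic
    while [ "$off" -lt "$total" ]; do
        hdr=$(dd if="$file" bs=1 skip="$off" count=60 2>/dev/null)
        name=$(printf '%s\n' "$hdr" | cut -c1-16 | sed 's/ *$//')
        size=$(printf '%s\n' "$hdr" | cut -c49-58 | tr -d ' ')
        case $size in
            '' | *[!0-9]*) echo "bad header at offset $off" >&2; return 1 ;;
        esac
        data=$((off + 60))
        case $name in
            / | //) ;;                            # GNU symbol/string table: skip
            '#1/'*)                               # BSD: name stored before the data
                len=$(printf '%s\n' "$name" | cut -c4-)
                dd if="$file" bs=1 skip="$data" count="$len" 2>/dev/null | tr -d '\000'
                echo ;;
            /*) echo "(GNU long name at string-table offset ${name#/})" ;;
            *)  printf '%s\n' "${name%/}" ;;      # drop the GNU "/" terminator
        esac
        off=$((data + size + size % 2))           # data is padded to an even size
    done
}
```

Reading byte by byte with dd is slow but keeps the sketch free of any tool beyond the usual POSIX utilities.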

Having looked at the problem - due to the slightly subtle ways the parsing of the different kinds of archives can overlap in functionality, it seems slightly better to structure the code around that & then slowly expand the functionality of which operations can be done on those files vs. completing everything for one kind of file and then adding another.

There's a couple of issues on the repo - mainly around cleaning up the merge & working on testing (something I haven't looked into at all). I've jotted my thoughts on how the latter could work if anyone is interested in that.

Progress has been fairly good so far overall I think! Lots still to do - and my time is going to be a bit more limited for the coming couple of weeks. But I will make sure to at least check the status of things every morning even if I can't work directly on the problem.

I did consider opening up the repo to others with commit access - but I think I might hold off on that as we can each probably work more quickly in our own repos (problems should be orthogonal enough), and I think it might be better if the project stabilises a bit first and things are a bit more coherent, so that we are all on the same page before that happens. So I think we can see how things go with a PR-based model for now? If that doesn't work well I'm very open to reconsidering though.

My next focus for tomorrow morning (unless @iddev5 gets there first!) will be to look through the code that has been merged in and to unify it into the rest of the code base a bit more concretely. But I won't be able to get on that until then, so if stuff is done on that in the meantime I will make sure to take that into consideration. Hopefully my comments (both the TODOs in the code and my write-up on the issue there) help.

Otherwise @iddev5 feel free to just focus on expanding the functionality of what you already have (and if you create any PRs I will happily merge them). I can then sort out the unification side of things in the short term until things have settled.

moosichu avatar Sep 27 '21 08:09 moosichu

It seems there hasn't been much progress on the zig archiver from the looks of the repo and the last comment made on this thread.

I'd like to take over this task with some possible mentorship as I've never written an archiver.

Is that feasible for @andrewrk / @kubkon or any other individual knowledgeable in archivers?

Shinyzenith avatar Sep 10 '23 08:09 Shinyzenith

I've been actively working on it locally. Don't worry it's still going! Just slowly as I've not had a huge amount of free time recently.

moosichu avatar Sep 12 '23 13:09 moosichu

However! If you are keen/interested in joining the effort that would be more than welcome :) do let me know if you are interested and I will spend a couple of weeks getting it back into a contributor-friendly shape.

moosichu avatar Sep 12 '23 17:09 moosichu

I'm a bit confused about where this issue is going, so I don't know how to contribute.

zar was apparently abandoned in favour of a linker-centric approach a year and a half ago, but this issue still talks about making a standalone program.

Was this approach focused on emerald, or something else?

EnronEvolved avatar Jul 26 '25 17:07 EnronEvolved

Neither zar nor emerald are part of the Zig project.

The task is really simple. I see a lot of people here overthinking it. All you have to do is smash some object files together into the standard archive formats. If you end up with more than 2000 lines of code you're probably doing something silly.

andrewrk avatar Jul 26 '25 17:07 andrewrk

It looks like a lot of that overthinking is a result of the "replace llvm-ar" design goal. So, for the sake of clarity:

  • What features of llvm-ar are used by the Zig project? Do we just need to be able to create archives?
  • Does it have to be CLI-compatible with pre-existing tools?

EnronEvolved avatar Jul 26 '25 17:07 EnronEvolved

I was working on the "archive writing" part of zar. I was initially focused on generating just valid archive files and not byte-by-byte compatible ones. But the general focus moved more onto creating identical files in order to make it easy to test the generated files (against other tools)

iddev5 avatar Jul 27 '25 14:07 iddev5

Realised it's not mentioned here so thought I'd give a status summary - I stopped work on zar as at the time Jakub was going to integrate the functionality directly into the linker backends and wrap those through a CLI interface iirc. But thinking about it - as Andrew said, the problem is simple enough that it should be fine to have duplicated functionality in a standalone program, and maybe just having a standalone implementation as its own separate thing is the best way to go.

In terms of approach - we were going for byte-for-byte compatibility with llvm ar to make it a rock-solid drop-in replacement.

I probably won't have time in the short/medium term to pick this up again (will focus on smaller issues), but please do feel free to use zar as a reference/pick up the pieces. I don't think it would take that much to get it over the finish line per se (as we managed to get redis building with it after all).

moosichu avatar Aug 02 '25 06:08 moosichu

Have changed my mind and have decided to try to see this through to completion (with no expectations of it being merged - just as a small self-contained personal side project). Going to focus on simplicity - as Redis was already successfully building using zar as the archiver, hopefully it's not far off. The new fuzzing tools that were added look like they will help with robustness as well.

The long and short of this is - I'm working on this again, but if I'm non-responsive or anything assume I'm too busy to continue work. Feel free to fork what I've done regardless (if that would be helpful), otherwise will keep an eye out for PRs and try to keep progress going forward (although slowly for now!).

moosichu avatar Aug 10 '25 14:08 moosichu

@moosichu fyi I'm willing to give it a shot as well. Creating an archive without indexing, just headers and files, should be simple enough. This POC creates an empty archive which is accepted without errors: https://git.sr.ht/~noneofyourbusiness/zig-ar

@ ec1afc40

zig build && ./zig-out/bin/zig_ar > libempty.a && /usr/bin/ar -t libempty.a

or even with members:

@ ceba8126

zig build && ./zig-out/bin/zig_ar rcs libfoo.a *.o && /usr/bin/ar -t libfoo.a

GNU ld errors out without an index, while tinycc accepts it just fine. An index is definitely something desired for interop with C.
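A quick way to see whether a given archive carries an index is to look at the name field of its first member. The shell sketch below assumes the GNU/System V convention of a first member named "/" and the classic BSD convention of a short name starting with "__.SYMDEF"; has_index is a hypothetical helper, and a BSD index stored under an extended "#1/<len>" name is not detected.

```shell
#!/bin/sh
# Sketch only: has_index is a hypothetical helper. Usage: has_index ARCHIVE
# Succeeds (exit status 0) if the first member looks like a symbol index.
has_index() {
    # The first header starts right after the 8-byte magic; its first
    # 16 bytes are the space-padded member name.
    first=$(dd if="$1" bs=1 skip=8 count=16 2>/dev/null | sed 's/ *$//')
    case $first in
        / | __.SYMDEF*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, "has_index libfoo.a && echo indexed" prints "indexed" only when the index member comes first, which is where linkers conventionally expect it.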

CorruptedVor avatar Aug 11 '25 13:08 CorruptedVor

Done: it now always generates an index. The CLI is a bit of a hack; otherwise it should be fine. clang, gcc and zig can use it to compile successfully.

asciinema demo

CorruptedVor avatar Aug 12 '25 13:08 CorruptedVor

@EnronEvolved kindly did the work to update to build & run https://github.com/moosichu/zar with zig 0.14.1 (which was quite a substantial amount of work), and after refreshing myself with a few bits the CI actions that test building redis with zig ar as a drop-in for llvm ar work again: https://github.com/moosichu/zar/actions/runs/17007232528/job/48218502762

I'm going to try to be a little more pro-active and pay more attention to contributions. There are a fair few eccentricities in the way that llvm ar (in particular) behaves that I've documented in the repo. And once things are closer to potentially shipping it would be good to hash out which behaviours we do/do not want to replicate.

There's a fun bug (if I'm remembering it correctly) where the version of llvm ar that ships with the zig compiler binary for macOS actually defaults to the incorrect behaviour (i.e. gnu host), I think because it's cross-compiled, which differs from what you get if you build zig natively on macOS. I'm going off old comments that I wrote (4!) years ago now though, so I need to refresh myself on that (to validate that this is the case). This caught me out when testing locally, as I normally test with zig built from source, but because I'm sticking to 0.14.1 for now due to writer-gate I was testing with a binary distribution and forgot that a special build flag that I set up was needed for that 😅.

However - the fact that this is in the shipping version of zig (and seemingly hasn't caused any problems?) makes me think that byte-for-byte binary compatibility is probably not a worthwhile goal to strive for - so I will stick to the simpler problem of actually implementing the 'spec' (as much as there is one, due to all the complicated and esoteric ways ar behaves on each platform due to decades of legacy). This should help reduce the code count significantly.

moosichu avatar Aug 16 '25 10:08 moosichu

Neither zar nor emerald are part of the Zig project.

The task is really simple. I see a lot of people here overthinking it. All you have to do is smash some object files together into the standard archive formats. If you end up with more than 2000 lines of code you're probably doing something silly.

@andrewrk can you clarify? 'zig ar' exposes all features of llvm-ar, as far as I can tell. Do we only need a subset? For example, just POSIX? Maybe even less, like creation of archives (with symbol table)?

"All you have to do is smash some object files together into the standard archive formats." sounds like the latter option of just creating archives

CorruptedVor avatar Aug 18 '25 06:08 CorruptedVor

'zig ar' exposes all features of llvm-ar, as far as I can tell. Do we only need a subset? For example, just POSIX? Maybe even less, like creation of archives (with symbol table)?

FWIW - unless Andrew says otherwise, I'm assuming the issue as-written is the problem that needs solving. But some clarity would be appreciated.

All you have to do is smash some object files together into the standard archive formats. If you end up with more than 2000 lines of code you're probably doing something silly.

I mean, I agree that if that is what the problem was, more than 2000 lines of code would be silly. But I disagree that that is the problem this issue describes. I think?

We have started breaking-out all the various TODOs that were squirreled away in the zar code base out into actual issues (https://github.com/moosichu/zar/issues), hopefully something that will start to give a measure of how much work is remaining.

I don't know if it is overkill, but currently the goal here is to be a very robust llvm-ar drop-in replacement, where the behaviour of llvm-ar is matched for a given set of input arguments/modifiers. Byte-for-byte output matching is the strictest form of that - and as it is also the simplest thing to test automatically (zig bundles llvm ar already), that's why zar is going for that.
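The kind of automatic check described above can be sketched in a few lines of shell: run two archivers with identical arguments and compare the outputs byte for byte. This is an illustration rather than zar's actual test harness. same_archive is a hypothetical helper, and it assumes both tools accept the "rcD" operation string (replace, create, deterministic timestamps and IDs), which GNU ar and llvm-ar do.

```shell
#!/bin/sh
# Sketch only: same_archive is a hypothetical helper.
# Usage: same_archive REFERENCE_AR CANDIDATE_AR OBJECT...
# Exit status: 0 identical, 1 different, 2 one of the tools failed.
same_archive() {
    ar_a=$1
    ar_b=$2
    shift 2
    dir=$(mktemp -d) || return 2
    # "D" asks for zeroed timestamps/uids/gids so that runs are comparable.
    if "$ar_a" rcD "$dir/a.a" "$@" && "$ar_b" rcD "$dir/b.a" "$@"; then
        cmp -s "$dir/a.a" "$dir/b.a"
        status=$?
    else
        status=2
    fi
    rm -rf "$dir"
    return "$status"
}
```

A call might look like "same_archive llvm-ar ./zig-out/bin/zar foo.o bar.o", where both tool paths are assumptions about the local setup.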

moosichu avatar Aug 20 '25 09:08 moosichu