elpaca [Feature]: package version lock files

Feature Description

Pinning packages makes configs reproducible. Currently, the only way to pin packages is to get the current hash for every package and add it to the appropriate use-package block.

I suggest implementing a new alist, elpaca-package-hash-alist that holds the hash for every package. Elpaca would be able to generate this alist automatically, so users could effortlessly set elpaca-package-hash-alist to this value upon startup. Elpaca would check these values and act accordingly (pull / reset / do nothing, etc). Every package with :elpaca t would be affected, and there could be a user-option for enabling this behavior.

Also, maybe this alist could be written to a .elpaca-pins file or similar?

Example:

Set PACKAGE-NAME to HASH

(add-to-list 'elpaca-package-hash-alist (cons PACKAGE-NAME HASH))

HASH could be set to t to signal an upgrade.
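For illustration, here is a minimal shell sketch of what such a pins file could contain, assuming every package is a plain git clone directly under one directory (for example elpaca/repos). The function name write_pins and the "NAME SHA" line format are made up for this example and are not part of Elpaca.

```shell
# Hypothetical helper, not part of Elpaca.
# Print one "NAME SHA" line for every git clone directly under REPOS_DIR.
# Usage: write_pins REPOS_DIR > .elpaca-pins
write_pins() {
    for dir in "$1"/*/; do
        # Skip directories that are not git checkouts (and an unmatched glob).
        [ -e "$dir/.git" ] || continue
        name=$(basename "$dir")
        # rev-parse HEAD prints the full hash of the currently checked out commit.
        printf '%s %s\n' "$name" "$(git -C "$dir" rev-parse HEAD)"
    done
}
```

The output is plain text, so it can be checked into the same repository as the config and edited by hand.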

Confirmation

[X] The feature I'm proposing does not already exist in Elpaca

Jun 29 '23 14:06 precompute

Hi. Thanks for the suggestion. What you're suggesting is referred to as a version lock file.

There is currently the elpaca-write-lockfile function which will create an item menu with the current recipes for each package. For example, (elpaca-write-lockfile "/tmp/test.eld") produces the following for data:

((elpaca :source
   "lockfile" :date (25757 57844 206472 188000) :recipe
   (:protocol https :inherit t :depth 1 :repo
              "https://github.com/progfolio/elpaca.git" :ref
              "272966b864db86604535bced55b3dfa3c7ed8532" :pre-build
              ("git" "remote" "set-url" "origin"
               "git@github.com:progfolio/elpaca.git")
              :files (:defaults (:exclude "extensions")) :build
              (:not elpaca--activate-package) :package "elpaca"))
              ;; Other packages omitted
              )

The full recipe is stored with a computed :ref recipe keyword. What's missing is a way to rebuild packages "from scratch" so the package can be reset to that state. It's trickier than storing a commit ref, though. We'd want to disable inheritance for that recipe, possibly override the :depth keyword, etc.

Pinning packages makes configs reproducible.

Lock files work so long as the upstream source still has the commit referenced in the lock file available. However, if the upstream disappears or overwrites history, the ref is useless.

What I've been experimenting with is keeping the entire Elpaca package store in a repository. This has the benefit that the entire source code of each library is available despite what happens upstream. It can also simplify the machinery around restoring package state (by being a thin layer over git). The trade off is that a package store repository is obviously larger on disk than a lock file, but I don't think the difference is significant if you consider that the lock file doesn't do anything on its own (it would have to be used to download all the repos, anyhow).

I've experimented with a few backup strategies and I haven't decided which I'll end up with for Elpaca. I may design it in a way where one could sub out for their own strategy as well.

Related issues: #24 #36

Jun 29 '23 20:06 progfolio

I really like the idea of keeping the entire Elpaca package store in git. Curiously, I think it could even make Elpaca usable in environments without access to git (like the new Android port of Emacs, for example). You would just download main.tar.gz from your package store repository on GitHub or other git forge.

I'm interested in Elpaca and I think it has a lot going for it, but I'm not willing to switch to another package manager until it can match straight in reproducibility. Keeping an eye on this issue.

Jul 15 '23 18:07 axgfn

I'm not a fan of the entire Elpaca store in git.

I absolutely do need a lock file as it would give me a known good config.

I have 4 workstations that I use emacs on with the same git directory holding the config only.

I know some people check their elpa directory in or rsync but I don't like the idea of that for many reasons.

Despite using git and git-annex extensively I've never wanted this solution and resist having another pile to move around.

I'd like a normal Cargo.lock-style text file (very successful for Rust) that I could check into git and easily inspect and edit in emacs or vim (say, in the rare case that someone deletes a remote git repo), without doobedydeeing around in git to fix or alter things.

I don't think archiving other peoples git trees is a good problem for a package manager to solve.

Any binary files will start to explode the size of the elpaca git blob.

Without storing binary artifacts the benefits some people imagine won't actually be there.

Jul 22 '23 20:07 hammerandtongs

Any binary files will start to explode the size of the elpaca git blob.

Not many packages include binary blobs. Are there specific packages which come to mind? It would be good to know so I can build a pessimistic test case.

Without storing binary artifacts the benefits some people imagine won't actually be there.

My hunch is that this scenario is even rarer than git repos disappearing or history being rewritten. A lockfile does not guarantee the presence of any system binaries either, so it's a shared flaw between both approaches.

There are trade offs between both approaches and I plan on making things flexible enough to accommodate either.

Jul 23 '23 16:07 progfolio

I'm not a fan of the entire Elpaca store in git.

I absolutely do need a lock file as it would give me a known good config.

I have 4 workstations that I use emacs on with the same git directory holding the config only.

I know some people check their elpa directory in or rsync but I don't like the idea of that for many reasons.

Despite using git and git-annex extensively I've never wanted this solution and resist having another pile to move around.

I'd like a normal Cargo.lock-style text file (very successful for Rust) that I could check into git and easily inspect and edit in emacs or vim (say, in the rare case that someone deletes a remote git repo), without doobedydeeing around in git to fix or alter things.

I don't think archiving other peoples git trees is a good problem for a package manager to solve.

Any binary files will start to explode the size of the elpaca git blob.

Without storing binary artifacts the benefits some people imagine won't actually be there.

Yes, I do agree. The cargo.toml style package version control system is good enough in 99% of scenarios.

If the upstream package does change, which rarely happens, the user can manually switch the package to a different upstream, or point it at a fork created from their local copy.

From my perspective, maintaining the entire Elpaca store in git is not good practice for source management. I seldom see projects include the source of third-party packages in their own source tree. Besides, the git repository will grow rapidly.

I currently use straight, and the directory ~/.emacs/straight/repos has a size of 391M, and that is just a snapshot. If you put this folder under version control, how large will your .git become?

Using git submodules has the same pitfall: once the upstream changes, you can no longer initialize all the submodules in a fresh install.

Jul 29 '23 17:07 milanglacier

My ~/.config/emacs/straight/repos/ directory is 713M, but ~/.config/emacs/straight/build/ is only 16M when I exclude .elc files. That's more like what I imagined would be tracked in git. I'm also not worried about it ballooning too much in size over time. Git is pretty good at compression, and we'd only need a new snapshot for each time packages are updated, which I expect for most users is only a weekly or monthly chore.

Jul 30 '23 00:07 axgfn

@ajgrf, if I'm not mistaken, the straight/build directory usually just has the compiled .elc files, other binaries and build output, and symlinks to the original source .el files in straight/repos. The symlinks are negligible, and you're excluding the .elc files; that leaves only other miscellaneous binaries in your measurement. Needless to say, it's not enough to just track those if you want working packages.

Aug 20 '23 05:08 roshanshariff

I'll throw my vote for a simple lockfile.

While the idea of having all your packages safely stored in case GitHub blows up sounds tempting, I see it as a solution to a problem I don't have. In the most realistic case of a ref or even a complete repo disappearing, the first thing I'd be looking into is fixing the situation, finding a new package, or otherwise dealing with the problem. I use elpaca to install packages, that is, (more or less) maintained packages; I don't need it to deal with dead code that once was a package. Worst case scenario, I'll dig it out of elpaca/repos and make my own "package". What I want is to be able to say "this doesn't work. I know it worked last week. Please start up the DeLorean." And if it could integrate well with git bisecting my init.el, that would be swell.

As for the size thing.

~/.c/emacs ▶ du -hcs straight/*
0       straight/bootstrap.el
47M     straight/build
516K    straight/build-cache.el
4,0K    straight/modified
2,9G    straight/repos
16K     straight/versions
2,9G    total
~/.c/emacs ▶ du -hcs elpaca/*
32M     elpaca/builds
28M     elpaca/cache
506M    elpaca/repos
565M    total

They're not entirely equal, there's been a bit of package churn since I switched to elpaca, but they're the same ballpark. Obviously elpaca saves quite a bit by doing shallow copies, but it's still 500M to save the source for the build.

I'll admit to being the type that's not afraid to mess around with the source in repos when trying to fix bugs, and then forgetting about getting the changes anywhere. How would storing the packages in git deal with local modifications?

Sep 08 '23 21:09 xendk

How would storing the packages in git deal with local modifications?

The state of the repos and builds directories are stored as is. So if you have local modifications, they would be stored.

Sep 09 '23 14:09 progfolio

So if you have local modifications, they would be stored.

So how does one tell what is local modification? I assume the history of the individual repo directories isn't part of this.

Is this basically the same as adding elpaca/repos and elpaca/builds to one's .config/emacs repository (with some magic to avoid submodules for repos)?

Sep 10 '23 18:09 xendk

Thomas Fini Hansen writes:

So how does one tell what is local modification? I assume the history of the individual repo directories isn't part of this.

The git history of each repository would be preserved as well.

Is this basically the same as adding elpaca/repos and elpaca/builds to one's .config/emacs repository (with some magic to avoid submodules for repos)?

It's a similar approach, but the entire store would be in its own repository instead of added to one's config repository. There would also be a minimal API around it so you don't really need to know how to use git to use it. e.g.,

User executes M-x elpaca-backup. They're prompted to take an optional note for the back up (the commit message). The entire store as is is committed to the package store repository in a way that avoids submodules.
User executes M-x elpaca-restore-backup. They're prompted to choose a backup point (which is just picking a commit). The store is checked out at that state and all packages are rebuilt.

That's the basic gist of it. I'll have to keep experimenting with it to see how it works in practice.

Sep 10 '23 18:09 progfolio

The git history of each repository would be preserved as well.

In my case, that's 3 gigs of data, half a gig if going with shallow checkouts. As with some of the other posters, I'm a bit skeptical...

The entire store as is is committed to the package store repository in a way that avoids submodules.

Oh, care to share your secret sauce? I'm just curious.

There would also be a minimal API around it so you don't really need to know how to use git to use it.

Ah, I think we've got the source of the dissonance in this issue here. You're working on a user-friendly, self-contained solution that can be used by anyone. But those asking for a lock file already have their config in git and are looking for a way to control elpaca from that.

It's two different user-stories, but as you say, they ought to be able to co-exist. It's "just" a matter of someone implementing elpaca-load-lockfile.

Sep 10 '23 19:09 xendk

Thomas Fini Hansen writes:

In my case, that's 3 gigs of data, half a gig if going with shallow checkouts. As with some of the other posters, I'm a bit skeptical...

I thought you showed 565M total in your store earlier? In any case, there may be other tricks to optimize the storage size.

Oh, care to share your secret sauce? I'm just curious.

If I test more and think it will be a viable solution, I'll push it to a feature branch which can be tested. I'll mention it here if that happens.

Ah, I think we've got the source of the dissonance in this issue here. You're working on a user-friendly, self-contained solution that can be used by anyone.

Yes. I believe that should be offered alongside other solutions.

But those asking for a lock file already have their config in git and are looking for a way to control elpaca from that.

It's two different user-stories, but as you say, they ought to be able to co-exist. It's "just" a matter of someone implementing elpaca-load-lockfile.

Some other changes would need to be made, too. For example, you'd need a way to say "rebuild these packages from scratch". There's a naive approach here:

https://github.com/progfolio/elpaca/compare/master...feat/rebuild-from-scratch

but I don't think that will be the final approach. Basically we need a way to get the repo into the declared state prior to rebuilding anything, without losing any possible changes to the repo. It sounds easy until you start implementing it.

Backups are the highest priority feature at the moment, so I'll begin working on them again soon.

Sep 10 '23 19:09 progfolio

I thought you showed 565M total in your store earlier?

Well, that's shallow repos most of it. Looking about, it seems that elpaca will do shallow clones and then fetch new history when updating? I'll admit I've never worked much with shallow clones.

For example, you'd need a way to say "rebuild these packages from scratch".

Why does it need to re-clone? Nuking the build dir seems like a sensible cleanup, but why re-clone if the ref we're updating/downgrading to is in the repo? If you're trying to revert to a working configuration, the needed ref should already be available (unless shallow copies get in the way, of course). I would think that bringing repos to the same version as the lockfile and rebuilding packages that were changed should suffice (well, plus cloning stuff that hasn't yet, to support the "rebuild from scratch" scenario).

without losing any possible changes to the repo. It sounds easy until you start implementing it.

Well yeah... Would it help if the prerequisite for elpaca-write-lockfile was no uncommitted changes? It does open up new ways to shoot oneself in the foot (if one nukes the local repo and had a local branch with changes for instance), but it could work.
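As a rough sketch of what such a prerequisite could check, the two shell functions below test a single clone for uncommitted changes and for a HEAD commit that exists on no remote-tracking branch. Both function names are hypothetical and nothing here is Elpaca's actual behavior. The second check only looks at HEAD, so it would not catch the "other local branch with changes" case mentioned above.

```shell
# Hypothetical pre-flight checks, not part of Elpaca.

# Succeed only if REPO_DIR has no uncommitted changes.
# Usage: repo_is_clean REPO_DIR
repo_is_clean() {
    # --porcelain prints one line per modified, staged, or untracked file,
    # so empty output means the working tree matches HEAD.
    [ -z "$(git -C "$1" status --porcelain)" ]
}

# Succeed only if the current commit is reachable from a remote-tracking branch.
# Usage: repo_head_is_pushed REPO_DIR
repo_head_is_pushed() {
    # Lists remote-tracking branches whose history contains HEAD;
    # empty output means HEAD exists only locally.
    [ -n "$(git -C "$1" branch --remotes --contains HEAD)" ]
}
```

A lockfile writer could refuse to record a package, or at least warn, when either check fails.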

Sep 10 '23 21:09 xendk

Thomas Fini Hansen writes:

Well, that's shallow repos most of it. Looking about, it seems that elpaca will do shallow clones and then fetch new history when updating? I'll admit I've never worked much with shallow clones.

Yes. A shallow clone has a "grafted" root node and will pull in new history.

Why does it need to re-clone?

It needn't in all cases. As I mentioned, that's a naive implementation. A complete solution will not be as simple.

If you're trying to revert to a working configuration, the needed ref should already be available (unless shallow copies get in the way, of course). I would think that bringing repos to the same version as the lockfile and rebuilding packages that were changed should suffice (well, plus cloning stuff that hasn't yet, to support the "rebuild from scratch" scenario).

It sounds easier than it is. There are many corner cases. You have to consider that the package recipe itself may have been altered between backups. The repo may not contain the history to retrieve a given ref. etc.
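To make the "missing history" corner case concrete, here is a minimal shell sketch that moves one existing clone to a pinned commit, fetching the commit first if the clone does not have it. The function name checkout_pinned_ref is hypothetical, the remote is assumed to be called origin, and the sketch ignores the other corner cases listed above (changed recipes, local modifications).

```shell
# Hypothetical helper, not part of Elpaca.
# Move an existing clone to a pinned commit.
# Usage: checkout_pinned_ref REPO_DIR COMMIT_SHA
checkout_pinned_ref() {
    repo=$1
    ref=$2
    # cat-file -e exits non-zero when the object is not in the local repo,
    # which can happen in a shallow clone.
    if ! git -C "$repo" cat-file -e "${ref}^{commit}" 2>/dev/null; then
        # Ask the remote for exactly that commit. This fails if upstream
        # deleted the commit or does not allow fetching commits by hash.
        # (Adding --depth 1 here would keep a shallow clone shallow.)
        git -C "$repo" fetch --quiet origin "$ref" || return 1
    fi
    # Detach HEAD at the pinned commit without moving any branch.
    git -C "$repo" checkout --quiet --detach "$ref"
}
```

When the fetch fails, the function returns non-zero, which is the point where a fallback source for the commit would be needed.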

Would it help if the prerequisite for elpaca-write-lockfile was no uncommitted changes? It does open up new ways to shoot oneself in the foot (if one nukes the local repo and had a local branch with changes for instance), but it could work.

There'd have to be a policy similar to that in place for a lockfile solution. Otherwise, it would be too easy to lose un-pushed work.

Sep 11 '23 00:09 progfolio

Lock files work so long as the upstream source still has the commit referenced in the lock file available. However, if the upstream disappears or overwrites history, the ref is useless.

How about creating a nix profile depending on the git sources for that set? As long as you hold onto the resulting profile as a GC root, all the git sources will remain in the store. Since the Guix store is basically the same implementation, both of these systems can be used similarly for holding onto snapshots of all the packages efficiently.

https://nixos.org/manual/nix/stable/package-management/profiles

To rehydrate, you would just copy the immutable git sources into /repos and rebuild everything and maybe do updates.

Jan 02 '24 15:01 psionic-k

@psionic-k You could probably achieve the same thing without nix by creating git branches in the same repository, one for each upstream repo, pointing at the commit you're using. I suspect this is the approach @progfolio is considering as the "full backup" method? You could check out the individual branches as worktrees to share the git repository and objects between them.

The downside is that git doesn't expect to be used in this way, so it'll be a bit harder to interact with the upstream repos and push patches, etc. But I guess it would work for backups, since you could use the recipe metadata to reconstruct things like upstream URLs and branch names that would normally be in the config of a checked out git repo.
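The branch-per-upstream idea can be sketched in shell roughly as follows. This is only an illustration of the approach described in this comment, not the design being considered for Elpaca. The function names and the pkg/NAME branch naming are invented for the example, and paths should be absolute because git -C changes the working directory.

```shell
# Hypothetical helpers, not part of Elpaca.

# Import the current HEAD of one package clone into a shared bare "store"
# repository as branch pkg/NAME.
# Usage: store_import STORE_DIR NAME SOURCE_REPO_DIR
store_import() {
    store=$1
    name=$2
    src=$3
    # Create the shared store on first use.
    [ -d "$store" ] || git init --quiet --bare "$store"
    # The leading "+" lets the branch move even when the new commit is not
    # a descendant of the old one (e.g. after upstream rewrote history).
    git -C "$store" fetch --quiet "$src" "+HEAD:refs/heads/pkg/$name"
}

# Check a stored package out into its own directory as a worktree,
# sharing the object database with the store.
# Usage: store_checkout STORE_DIR NAME TARGET_DIR
store_checkout() {
    git -C "$1" worktree add "$3" "pkg/$2"
}
```

One limitation of this sketch: git refuses to fetch into a branch that is currently checked out in a worktree, so a later store_import for the same package would need the worktree removed or the commit imported under a different branch name.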

Jan 02 '24 20:01 roshanshariff

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/. There's nothing wrong with giving people the option to create full git backups as described, but frankly I think falling back to software heritage on clone failure is simple and robust. Guix, a project which takes source reproducibility very seriously, takes this approach.

Jul 03 '24 04:07 dominicm00

Thanks for chiming in.

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/.

Cool project. However, after kicking the tires, it looks like it's missing quite a few of my github repositories.

There's nothing wrong with giving people the option to create full git backups as described, but frankly I think falling back to software heritage on clone failure is simple and robust. Guix, a project which takes source reproducibility very seriously, takes this approach.

I have an idea for how to implement simple lockfiles which will be at least on par with what straight.el offers (with a better UI). The main hurdle now is time. Money is tight for me right now (unfortunately, I don't pay my bills by writing software) so I've had to pick up two jobs and am working long hours most days. When I get some time I will implement the idea I have.

Jul 03 '24 11:07 progfolio

I will give my 2¢ that this problem is the entire purpose of https://archive.softwareheritage.org/.

Cool project. However, after kicking the tires, it looks like it's missing quite a few of my github repositories.

I'm surprised! Usually anything on GitHub is on there. Maybe I'll look into creating a (M)ELPA lister so that published emacs packages are indexed more regularly. It's also possible I can make a submission tool within emacs...will take a look.

I have an idea for how to implement simple lockfiles which will be at least on par with what straight.el offers (with a better UI). The main hurdle now is time. Money is tight for me right now (unfortunately, I don't pay my bills by writing software) so I've had to pick up two jobs and am working long hours most days. When I get some time I will implement the idea I have.

Of course; you've already created more than enough incredible software for free! Thank you so much for what you've done already! IMO elpaca is basically as close to perfect as we have in a package manager ❤️

Jul 03 '24 13:07 dominicm00
