git-lfs icon indicating copy to clipboard operation
git-lfs copied to clipboard

gitattributes(5) should control paths' desired Git LFS extensions

Open bb010g opened this issue 3 years ago • 8 comments

Currently, all extensions declared/registered in git-config(1) under lfs.extension.* are invoked when the LFS filter driver cleans.

https://github.com/git-lfs/git-lfs/blob/c6fc8813fd73bdfb392e4ae74cf061e48688b1f2/lfs/gitfilter_clean.go#L19-L30

There is no way to specify per-file which extensions should be invoked, despite extensions being named. Additionally, the mere presence of extension declarations causes their invocation, which may be confusing, and you cannot opt-out of this behavior. (Inability to at least opt-out makes global or system LFS extension declarations impractical, even if they're shared between repositories.)


I propose that extensions should be opt-in via gitattributes(5) attributes. They could be handled as follows:

lfs-extension
Set

Use the value of the lfs.extensions configuration variable to decide what LFS extensions to Invoke on a path.

Unset

Do not invoke any LFS extensions on a path.

Unspecified

Use the value of the lfs.extensions configuration variable to decide what LFS extensions to invoke on a path, except when the variable is not set.

For backwards compatibility, when the lfs.extensions variable is not set, all LFS extensions known to LFS are invoked on a path.

String

Specify a comma separated list of LFS extensions to invoke on this path in the same format as the lfs.extensions configuration variable.

I also propose that default LFS extension behavior should be controlled by configuration variables. These could operate as follows:

lfs.extensions

Specify the default LFS extensions to invoke on paths. Defaults to the empty string (""), except for the case of an unspecified lfs-extensions attribute (for backwards compatibility).

This setting should be set to a comma-separated list of LFS extension names. Order is not significant.

Finally, I propose that new Git LFS installations should default lfs.extensions to the empty string system-wide, whenever possible.


These proposals allow LFS extensions to continue to operate without breakage today, but also set up for much more robust & transparent configuration of LFS extensions going forward. Paths in the same repository can utilize different extension sets, users can declare reused extensions globally without ruining other repositories (especially useful due to lfs.extension.* keys' unsafety in .lfsconfig (#4044)), and repositories for which always using all local LFS extensions on paths works can continue to do so more reliably via lfs.extensions.


I do have some open questions, though.

  • Is lfs.extensions lfsconfig-safe?

    On the bright side, repositories could only specify string names of LFS extensions that the user would already need installed, but these extension names have no uniqueness guarantees, and a malicious repository that knows the (user-decided) name of a user's extension that leaks sensitive data when smudging could pose enough of a security risk. Would it be enough to document against this concern and encourage only globally declare extensions with both names unlikely to collide & semantics that don't leak data until configured with the repository?

  • Do we want to warn about legacy lfs.extensions behavior?

    Using all known extensions by default feels unsecure to me, even if users normally don't register extensions globally or system-wide.

    Also, this would push existing repositories to explicitly specify their LFS extensions (in .lfsconfig or post-clone documentation), meaning new Git LFS installations (with git config --system lfs.extensions "") will just work.

  • Should extensions documentation continue to use "register" for the act of placing extensions in git-config(1)?

    I feel like this act is much more "declare" now, with "register" being probably best for placing extension names in lfs.extensions or lfs-extensions to be invoked in the future. Extensions would be declared (system-wide, globally, or locally) to allow registration, then registered (normally locally) to allow invoking, then invoked (with a path) when Git invokes the LFS filter driver.


Note that when the LFS filter driver smudges, a blob's extension set from its previous clean are read from its LFS pointer, avoiding this problem.

https://github.com/git-lfs/git-lfs/blob/c6fc8813fd73bdfb392e4ae74cf061e48688b1f2/lfs/gitfilter_smudge.go#L122 https://github.com/git-lfs/git-lfs/blob/c6fc8813fd73bdfb392e4ae74cf061e48688b1f2/lfs/gitfilter_smudge.go#L135-L147

bb010g avatar Oct 20 '20 02:10 bb010g

Hey,

Thanks for the issue, and sorry it's taken me so long to get back to you. I'm definitely open to seeing a difference in how extensions operate; the feature is considered experimental and I don't get the impression it gets a lot of use.

I think your proposal for using gitattributes is fine, except that I'm not sure we can rely on a variable being unset. If a user has lfs.extensions set globally, it's not possible to unset it with Git. We could add an additional key in addition to clean, smudge, and priority called enabled or default-enabled and default it to true if unset, which would provide backwards compatible behavior.

I think if the extension names in .gitattributes are safe, then they're also safe in .lfsconfig, because both are stored in the repository. Git already does that for diff and merge drivers, and that's not a security problem.

I don't think I want to warn about old behavior because it means every existing repository gets a warning, which is irksome. I also think that “register” is fine; I'm not particular about the terminology, and I don't think “register” versus “declare” provides a meaningful semantic difference.

bk2204 avatar Oct 21 '20 21:10 bk2204

I also don't like the unset variable subtlety in this proposal, but I don't know a cleaner way to deal with backwards compat while not messing up a potential future breaking change (when people have switched off) to make unspecified lfs.extensions the same as specified empty string. My worry about breaking an (admittedly experimental & uncommon) feature like this is that places that depend on Git LFS triggering for encryption or other security purposes could upgrade their LFS installs, not know that their existing extensions aren't being respected anymore, and then publicly leaking data. On that note, an upgrade through a *nix package manager wouldn't probably have a great time managing only setting system-wide lfs.extensions to the empty string for new installs, so it'd be safer to just warn on lfs.extensions not being set for a few releases until we're confident that everyone has set it and lfs.extensions can finally truly default to the empty string, no system-wide config change weirdness necessary.

The idea is that anyone who sets lfs.extensions to the empty string for their install will also know to upgrade all existing LFS-extension–using repositories to more explicit semantics.

Ignoring backwards-compat, (initial) unspecified lfs.extensions emulates the unsafe (IMO) behavior of invoking all known extensions all the time; repositories can emulate the safe behavior by setting lfs.extensions in .lfsconfig with all the extensions they use. This could be clearer in the docs?

Glad to know that lfs.extensions would be safe in .lfsconfig; I think that makes this transition a lot easier, and allows LFS to do things like warn when you're missing extensions on new blobs, potentially failing the operation until you set up the extension. (Config variable for that?)

"Register" semantics you describe are fine by me. (Declare vs. register is a nit, and bikeshedding it is ultimately unimportant.)

We wouldn't be warning on every repository, but rather only those invoking extensions? An extension-avoiding install never touches any of these semantics, or their changes.

I thinik lfs-extensions is also a better attribute name than lfs-extension, since it's controlling a set (like the lfs.extensions configuration variable is). Going to switch to that now.


I don't quite understand what you mean by attributes lfs-enabled/lfs-default-enabled? Does its unspecification for a path enable all known extensions (from everywhere), as current?

If so, I think a much clearer name would be lfs-extensions-default-all-known, which is a painfully long name, but that's the point. With that, we could have attributes:

lfs-extensions
Set

Use the value of the lfs.extensions configuration variable to decide what LFS extensions to Invoke on a path.

Unset

Do not invoke any LFS extensions on a path.

Unspecified

Use the value of the lfs.extensions configuration variable to decide what LFS extensions to invoke on a path, except when a path's lfs-extensions-default-all-known attribute is unspecified or set.

lfs-extensions-default-all-known is an attribute that should be unset for all paths whenever possible, but for backwards compatibility that may not be the case. See that attribute's documentation for more details.

String

Specify a comma separated list of LFS extensions to invoke on this path in the same format as the lfs.extensions configuration variable.

lfs-extensions-default-all-known

Deprecated. The lfs-extensions-default-all-known attribute allows subtle & backwards-compatible behavior with older versions of LFS, and should be unset for all paths whenever possible.

Previously, all known LFS extensions were invoked by default on paths. Not only would LFS invoke only extensions a repository locally desires, but all known LFS extensions (including those registered system-wide or globally) would be invoked. This attribute exists to support transitioning to safer, opt-in behavior, e.g. portably setting the lfs.extensions configuration variable in .lfsconfig.

If lfs-extensions-default-all-known is set, Git prints a warning when running the LFS filter driver.

Set

Deprecated. When a path leaves its lfs-extensions attribute unspecified, ignore the configuration variable lfs.extensions entirely and invoke all LFS extensions known on that path, including extensions registered system-wide or globally. This attribute should never be set.

Unset

Rely on normal lfs.extensions behavior for this path. When the lfs.extensions configuration variable is unspecified, it will default to the empty string and no LFS extensions will be invoked by default, even when registered (system-wide, globally, or locally), making extension use opt-in.

Unspecified

Deprecated. When a path leaves its lfs-extensions attribute unspecified, ignore the configuration variable lfs.extensions entirely and invoke all LFS extensions known on that path, including extensions registered system-wide or globally. This attribute should be unset as soon as possible, after migrating the repository over to safer, opt-in LFS extension behavior.

I'd like to not support setting lfs-extensions-default-all-known if we can get away with it. Could we alternatively completely error if a path has explicitly set that attribute? (It's inconsistent with other Git attributes, but this seems sane, and prevents our old, unsafe behavior from hanging around longer.)

And our configuration variables become simpler:

lfs.extensions

Specify the default LFS extensions to invoke on paths. Defaults to the empty string ("").

This setting should be set to a comma-separated list of LFS extension names. Order is not significant.

bb010g avatar Oct 22 '20 20:10 bb010g

My proposal for enabled or default-enabled was in the Git configuration:

[lfs "extension.foo"]
  clean = foo clean %f
  smudge = foo smudge %f
  priority = 0
  default-enabled = false

I don't think we need an additional .gitattributes attribute for this, especially one that we're immediately deprecating. My proposal lets people control whether their extension are enabled or not when lfs-extensions and lfs.extension are unspecified; the default is to be backwards compatible (and thus enabled), but folks can then disable the extensions by default once they no longer need them and have them off.

My concern with adding a warning is that some people never see warnings because they use tools in a non-interactive way, so it's not effective. It's also annoying once you've seen the warning once or twice. A lot of Git LFS users work on things like games or on Windows, where they may interact entirely through a UI and never see the command line tool and no matter how much we scream, they'll never notice. We therefore need to provide something backwards compatible at least initially.

While I am sympathetic to the fact that there are some unpleasant implications for encrypted blobs, I'm not in general convinced that encrypting all data in a Git repository is a good idea, and degrading gracefully is the way that Git works, so absent a compelling reason for us to differ from Git's behavior, I'd like to do the same thing Git does.

I'd also like to ask you, once we have a final design, to open a PR to add this to the docs/proposals directory. Do note that, like most major design proposals, this will require someone to implement it; while we are definitely open to seeing improvements here, the core team is small and we probably won't have sufficient time to get to a major redesign.

bk2204 avatar Oct 22 '20 21:10 bk2204

Ah, a per-extension configuration variable makes sense and is a lot less obtrusive for repositories.

However, I don't like how new extensions could sneak back on by default if registered through something like a system package (which hopefully wouldn't set default-enabled = false to maintain backwards-compat for their users). Is there value in allowing specific extensions to use legacy behavior, where a user would set default-enabled = false for a few extensions but leave it on for others, instead of modifying their relevant .git/configs for repositories?

If a new LFS extension is packaged, then it could reasonably set default-enabled = false for itself, forcing users to debug why one extension isn't working the same as the rest of their extensions.

I think this ultimately has the same issues as a global configuration variable lfs.extensions-default-enabled, i.e.

[lfs]
	extensions-default-enabled = false

which would be a lot simpler to check, and would only likely be system-assigned by Git LFS packagers (meaning extension behavior breaking by default would be much more unlikely.)

A lot of Git LFS users work on things like games or on Windows, where they may interact entirely through a UI and never see the command line tool and no matter how much we scream, they'll never notice.

Okay, that makes sense. I was mainly basing off of the current unsafe .lfsconfig key warning behavior.

We therefore need to provide something backwards compatible at least initially.

The options I've described are backwards-compatible, but don't require a new configuration variable under every registered extension. The lfs.extensions-default-enabled configuration variable would likely be globally scoped, and the lfs-extensions-default-all-known would effectively be locally-scoped (* lfs-extensions-default-all-known in .gitattributes, checked in).

While I am sympathetic to the fact that there are some unpleasant implications for encrypted blobs, I'm not in general convinced that encrypting all data in a Git repository is a good idea, and degrading gracefully is the way that Git works, so absent a compelling reason for us to differ from Git's behavior, I'd like to do the same thing Git does.

I'm not sure what you mean here?

I'd also like to ask you, once we have a final design, to open a PR to add this to the docs/proposals directory. Do note that, like most major design proposals, this will require someone to implement it; while we are definitely open to seeing improvements here, the core team is small and we probably won't have sufficient time to get to a major redesign.

Sure thing. I'd be willing to try my hand at a patch for this too.

bb010g avatar Oct 23 '20 02:10 bb010g

I think this ultimately has the same issues as a global configuration variable lfs.extensions-default-enabled, i.e.

[lfs]
	extensions-default-enabled = false

which would be a lot simpler to check, and would only likely be system-assigned by Git LFS packagers (meaning extension behavior breaking by default would be much more unlikely.)

Sure, I can go for that. Maybe lfs.extensions.default-enabled?

We therefore need to provide something backwards compatible at least initially.

The options I've described are backwards-compatible, but don't require a new configuration variable under every registered extension. The lfs.extensions-default-enabled configuration variable would likely be globally scoped, and the lfs-extensions-default-all-known would effectively be locally-scoped (* lfs-extensions-default-all-known in .gitattributes, checked in).

I'd rather have lfs.extensions.default-enabled be the knob and not have a .gitatributes entry for enable or disabling legacy behavior. It's a lot easier to make changes that involve backwards incompatible behavior (like changing the default of whether extensions are enabled) when it's only in the config. When we put incompatible changes into the repository by storing them in .gitattributes, we're guaranteeing to essentially support them forever.

While I am sympathetic to the fact that there are some unpleasant implications for encrypted blobs, I'm not in general convinced that encrypting all data in a Git repository is a good idea, and degrading gracefully is the way that Git works, so absent a compelling reason for us to differ from Git's behavior, I'd like to do the same thing Git does.

I'm not sure what you mean here?

I was referring to your comment about using encrypted blobs as extensions. I agree we should have a default behavior that's backwards compatible at first. We also need to consider what happens if an extension requested in lfs.extensions doesn't exist; in that case, we need to ignore it as if it were not there, since that's what Git does for things like filters.

Sure thing. I'd be willing to try my hand at a patch for this too.

Great. We're happy to help you with any questions or code issues you might have.

bk2204 avatar Oct 26 '20 21:10 bk2204

Hey,

We're considering doing a 3.0 release relatively soon, and if you were interested in including this, that would be great. That would also be a great time to change extensions to default to off, since it would be a breaking change. Then we wouldn't need lfs.extensions.default-enabled here.

Let us know if that's a thing you're still interested in.

bk2204 avatar Feb 03 '21 22:02 bk2204

I'm interested in this as well.

mhwombat avatar Jul 03 '21 19:07 mhwombat

We're happy to accept a patch for this, but v3.0 is upcoming soon, so if someone wanted to make an incompatible change, that would be the time.

bk2204 avatar Jul 05 '21 20:07 bk2204

+1

theoryshaw avatar Nov 15 '22 13:11 theoryshaw

I started a bounty for this issue, if anyone would like to contribute to it: https://app.bountysource.com/issues/93535439-gitattributes-5-should-control-paths-desired-git-lfs-extensions

theoryshaw avatar Nov 16 '22 17:11 theoryshaw