Blob creation ignores line-ending policies
If I read the code correctly, dulwich.index.blob_from_path_and_stat() pays no attention to the core.autocrlf setting and pays no attention to anything in the .gitattributes file.
This is unexpected, and on Windows it is a particular problem because code staged and committed using Dulwich will have line endings that are inconsistent with commits by other tools that do follow Git line-ending policies.
On Wed, Oct 03, 2018 at 02:58:26AM -0700, BoarGules wrote:
If I read the code correctly,
dulwich.index.blob_from_path_and_stat()pays no attention to thecore.autocrlfsetting and pays no attention to anything in the.gitattributesfile.This is unexpected, and on Windows it is a particular problem because code staged and committed using Dulwich will have line endings that are inconsistent with commits by other tools that do follow Git line-ending policies. Unfortunately there are a number of places where Dulwich doesn't support C Git options like this.
A pull request to add support for autocrlf would be great.
-- Jelmer Vernooij [email protected] PGP Key: https://www.jelmer.uk/D729A457.asc
I'm working on a patch. Implementing the line-ending policy isn't hard. But first I have to read the configuration to discover what the policy is. I am running into NotImplementedErrors in dulwich.config and filling those gaps is taking longer than I would like.
On Fri, Oct 05, 2018 at 03:41:28AM -0700, BoarGules wrote:
I'm working on a patch. Implementing the line-ending policy isn't hard. But first I have to read the configuration to discover what the policy is. I am running into
NotImplementedErrors indulwich.configand filling those gaps is taking longer than I would like. What method exactly is raising NotImplementedError?
You probably want to add a autocrlf argumen to Blob.from_path_and_stat() and then set that argument based on the config in Repo.stage().
You should be able to get a config object from a repository - by calling:
self.get_config_stack().get_boolean(("core", ), "autocrlf")
According to my reference, the place to start is core.eol, which is almost never defined, and so raises an unavoidable KeyError, because the .get() method lacks a default parameter. It's hard to discover what is there because the discovery methods itersections(), iteritems(), has_section() all raise NotImplementedErrors. The module config needs attention in any case because it doesn't know to look in common Windows config file locations. That is what is taking longer than I would like.
On Mon, Oct 08, 2018 at 12:21:31AM -0700, BoarGules wrote:
According to my reference, the place to start is
core.eol, which is almost never defined, and so raises an unavoidableKeyError, because the.get()method lacks adefaultparameter. It's hard to discover what is there because the discovery methodsitersections(),iteritems(),has_section()all raiseNotImplementedErrors. The moduleconfigneeds attention in any case because it doesn't know to look in common Windows config file locations. That is what is taking longer than I would like. There shouldn't be a need to iterate over sections or items here - just calling .get() and catching the KeyError is what I'd expect to happen here.
Jelmer
-- Jelmer Vernooij [email protected] PGP Key: https://www.jelmer.uk/D729A457.asc
I have a lot of lookups to do. I don't want to litter the code with try...except blocks.
I've started to work on this issue in https://github.com/dulwich/dulwich/pull/674
As @BoarGules already started from the configuration point of view, I started from the line-ending conversion point of view.
Here is an example of how to clone a repository with pre-defined autocrlf settings using the low-level "plumbing". I looked at the examples provided in #674 for inspiration - though all those examples were for initializing and committing to a bare repository. I was not able to find a way of using the porcelain (without monkey patching) as it handles the repository initialization for you with assumptions.
from dulwich import client, index
from dulwich.repo import Repo
repo_path = <some_repo>
target = "./local"
c, path = client.get_transport_and_path(repo_path)
repo = Repo.init(target)
config = repo.get_config()
config.set("core", "autocrlf", False") # or True, or auto etc.
config.write_to_path()
# At this point, you'll get a bare repository with a `.git/config` autocrlf entry
remote_refs = client.fetch(path, repo, determine_wants=repo.ojbect_store.determine_wants_All)
repo[b"HEAD"] = remote_refs.refs[b"HEAD"]
tree = repo[b"HEAD"].tree
index_file = repo.index_path()
index.build_index_from_tree(repo.path, index_file, repo.object_store, tree)