Monticello metadata solved?

Open dalehenrich opened this issue 8 years ago • 11 comments

As described in this post, it seems that a reasonable compromise for dealing with Monticello metadata is that the *.package/monticello.meta/version file be a copy of the Monticello version history when a package is copied into a FileTree repo. Newly created packages (not via copy) would contain an empty version file. The Monticello metadata would not be updated on every commit ... When a package is copied, it then becomes necessary to fabricate a revision history to bridge the gap between the original Monticello version history and the present commit - this fabricated version history may account for every git commit since the version file was last changed, or simply be a new commit referencing the current SHA ...

This solution could involve adding some new MCAncestry classes that would record the URL and SHA of git commits ... making it possible to merge two Monticello packages with a common ancestor somewhere in the git history ... or we could treat the git world as a series of lost mcz files ... If we do support merges involving a specific SHA in github, then new MC classes and support would be required, which affects older versions of the images trying to do the merge or even read the packages ... adding fields to the commit comment, using JSON or STON to hold the fields of interest, would probably solve the backward-compatibility problem while allowing newer tools to support such a merge ...
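
To make the commit-comment idea concrete, here is a minimal sketch in Python (not Smalltalk, and not code from FileTree). The `MC-Metadata:` marker and the field names are hypothetical, chosen only for illustration; the point is that older tools would see an ordinary commit message while newer tools could parse the extra line.

```python
import json

# Hypothetical marker line; this is an assumption, not an existing convention.
MARKER = "MC-Metadata:"


def embed_metadata(message, fields):
    """Append the fields of interest as one JSON line after the human-readable message."""
    return f"{message.rstrip()}\n\n{MARKER} {json.dumps(fields, sort_keys=True)}\n"


def extract_metadata(commit_message):
    """Return the embedded fields, or None when the commit carries no metadata line."""
    for line in commit_message.splitlines():
        if line.startswith(MARKER):
            return json.loads(line[len(MARKER):])
    return None
```

A tool that does not know the marker simply shows the extra line as part of the comment, which is the backward-compatibility property discussed above.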

Finally we need to resolve the handling of the methodProperties files ... those files are there only for the Monticello metadata, and we could drop these files altogether if we fabricated method definitions that picked this information from the latest commit that involved the file ... for performance reasons, I would be inclined to not include this information unless a package was being prepared for export via copy to repository ...

I think we should have a bit of free form discussion (i.e., what code can be leveraged from @ThierryGoubier and GitFileTree? and what does @ottobehrens do in his current FileTree implementation?)

One of the issues I'm curious about is how to get past the Monticello/gofer/Metacello? feature that skips loading a version of a Monticello package with a lesser version number ... I know that I have code that addresses this for Metacello (in a separate class for Cypress packages), but we might need to jigger the loader or whatever to make this work but I think that both of you (@ThierryGoubier and @ottobehrens) have dealt with this (and I've forgotten the details)?

I'm not in a hurry to implement this, but I think that it should bump up in priority because it will eliminate the need for the GitMergeTool and make life for the folks working purely in git much simpler, while making it possible for folks choosing not to use git to work with and contribute to projects that have been moved to git/github ...

dalehenrich avatar Feb 03 '16 00:02 dalehenrich

> I think we should have a bit of free form discussion (i.e., what code can be leveraged from @ThierryGoubier and GitFileTree? and what does @ottobehrens do in his current FileTree implementation?)

We currently .gitignore the version files and generate them just before we load into Pharo / GS. We use the SHA1 of the package directory to generate the UUID, with an arbitrary commit message, date and time. Ancestors and stepChildren are empty. We derive the version number from the SHA1 (between 1 and 10000), which ensures that the version number changes.
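
A rough Python sketch of this scheme, under stated assumptions: how the package directory SHA1 is actually obtained is not described here, so `package_digest` below is a hypothetical stand-in that hashes file paths and contents, and the modulo mapping is only one plausible way to land between 1 and 10000.

```python
import hashlib
import uuid


def package_digest(files):
    """SHA1 over sorted (relative path, content) pairs.

    Hypothetical stand-in for "the SHA1 of the package directory".
    `files` maps relative paths to bytes.
    """
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(files[path])
        h.update(b"\0")
    return h.hexdigest()


def version_uuid(sha1_hex):
    """Use the first 128 bits of the 160-bit SHA1 as the version UUID."""
    return uuid.UUID(hex=sha1_hex[:32])


def version_number(sha1_hex):
    """Map the SHA1 onto the range 1..10000 (assumed mapping)."""
    return int(sha1_hex, 16) % 10000 + 1
```

Note that a number derived this way changes whenever the content changes (barring a collision in the 1..10000 range) but does not increase monotonically, which is relevant to the "lesser version number is skipped" behaviour discussed in this thread.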

> One of the issues I'm curious about is how to get past the Monticello/gofer/Metacello? feature that skips loading a version of a Monticello package with a lesser version number ... I know that I have code that addresses this for Metacello (in a separate class for Cypress packages), but we might need to jigger the loader or whatever to make this work but I think that both of you (@ThierryGoubier and @ottobehrens) have dealt with this (and I've forgotten the details)?

This feature is a surprise! I did not know that a lesser version number was skipped! So we reload all the packages after an initial load, to make really sure that everything is loaded. We also found that unresolved dependencies are not properly resolved. Yes, our package structure sucks and we have circular dependencies.

What would be nice is that if I give Gofer a list of packages to load, it would load them without trying to work out whether the version number is more recent or not.

> I'm not in a hurry to implement this, but I think that it should bump up in priority because it will eliminate the need for the GitMergeTool and make life for the folks working purely in git much simpler, while making it possible for folks choosing not to use git to work with and contribute to projects that have been moved to git/github ...

I agree, this is a basic problem to resolve. We use the git command line to merge.

We had a better solution for generating the version files. We counted the number of commits for a package directory in the git repository, which ensured that the version number increased. We also took the comment from the most recent commit of that package, with the date and time. The problem was that this is slow. Searching through the git repo for ancestors of each of our packages took a good 20-30 seconds, which we needed every time we loaded something.
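
For reference, the commit-count approach can be sketched in Python roughly as follows. This is an illustration, not the code we ran: it assumes the `git` command line is available, and the function names are made up for the example. The `run` parameter exists only so the parsing can be exercised without a repository.

```python
import subprocess


def commit_count_command(package_dir):
    """Count the commits reachable from HEAD that touched the package directory."""
    return ["git", "rev-list", "--count", "HEAD", "--", package_dir]


def last_commit_command(package_dir):
    """Ask for the most recent commit touching the package directory.

    %H = hash, %an = author name, %aI = author date (ISO 8601), %s = subject,
    %x00 = NUL byte used as a field separator.
    """
    return ["git", "log", "-1", "--format=%H%x00%an%x00%aI%x00%s",
            "HEAD", "--", package_dir]


def parse_last_commit(output):
    """Split the NUL-separated output of last_commit_command into named fields."""
    sha, author, date, subject = output.rstrip("\n").split("\x00", 3)
    return {"sha": sha, "author": author, "date": date, "message": subject}


def package_version_info(repo, package_dir, run=subprocess.check_output):
    """Build version info whose number is the package's commit count."""
    count = int(run(commit_count_command(package_dir), cwd=repo, text=True).strip())
    info = parse_last_commit(run(last_commit_command(package_dir), cwd=repo, text=True))
    info["number"] = count
    return info
```

Both queries are path-limited, so git has to walk the history for every package, which fits the load times described above.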

ottobehrens avatar Feb 03 '16 04:02 ottobehrens

Hi @ottobehrens , @dalehenrich

> this solution could involve adding some new MCAncestry classes that would record the url and SHA of git commits ...making it possible to merge two Monticello packages with a common ancestor somewhere in the git history ... or we could treat the git world as a series of lost mcz files .... If we do support merges involving a specific SHA in github, then new MC classes and support would be required which has impacts on older versions of the images trying to do the merge or even read the packages ... adding fields in the commit comment using JSON or STON containing the fields of interest would probably solve the backward compat problem while allowing newer tools to support such a merge ...

I'd say it is doable, when recreating an MC version from git, to make a pre-existing version an ancestor of the git-recreated MC version history. The current GitFileTree code would need very little to do that, since it recreates a true, perfect MC version history from the git data.

What is a bit harder is to tell the difference between a newly MC-imported package (one which will erase the current git history record), for which one needs to update the metadata version file, and a normal package save, for which we should keep the version record intact. And I wonder how many times people doing this will later say their action was a mistake...

> Finally we need to resolve the handling of the methodProperties files .. those files are there only for the Monticello meta data and we could drop these files altogether if we fabricated method definitions that picked this information from the latest commit that involved the file ... for performance reasons, I would be inclined to not include this information unless a package was being prepared for export via copy to repository ...

This is the case in GitFileTree already (no use and no need of those files... need to check if I did it right, still). Again, I think it can be kept as a static record (import of a mc package with pre-existing history) and never overwritten by normal package saves on the git repo.

An analogy would be to save a first time with filetree, write the properties and version metadata, commit that, and never touch or overwrite that again.

> I think we should have a bit of free form discussion (i.e., what code can be leveraged from @ThierryGoubier and GitFileTree? and what does @ottobehrens do in his current FileTree implementation?)
>
> We currently .gitignore the version files and generate them just before we load into Pharo / GS. We use the SHA1 of the package directory to generate the UUID, with an arbitrary commit message, date and time. Ancestors and stepChildren are empty. We derive the version number from the SHA1 (between 1 and 10000), which ensures that the version number changes.

Then GitFileTree has the code you need. GitFileTree recreates the complete history of the package out of the git log in a reasonable timeframe (two git queries), generates UUIDs from the SHA of each commit, properly traces all ancestors, and renumbers and names all versions present in git based on the git ordering and author.
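
As an illustration of the idea only (this is Python, not the GitFileTree implementation, and the input format is an assumption), rebuilding a version history from commit data could look like the sketch below. It assumes the commits have already been limited to those touching the package, are listed oldest first, and carry parent SHAs that refer to other commits in the same list.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Version:
    name: str                 # Monticello-style name: Package-author.number
    id: uuid.UUID             # derived from the commit SHA
    ancestors: list = field(default_factory=list)


def rebuild_history(package, commits):
    """Return a dict mapping commit SHA to a Version.

    `commits` is an oldest-first list of (sha, author, parent_shas) tuples
    (assumed input format), with 40-character hexadecimal SHAs.
    """
    by_sha = {}
    for number, (sha, author, parents) in enumerate(commits, start=1):
        by_sha[sha] = Version(
            name=f"{package}-{author}.{number}",   # renumbered from git ordering
            id=uuid.UUID(hex=sha[:32]),            # first 128 bits of the SHA
            ancestors=[by_sha[p] for p in parents if p in by_sha],
        )
    return by_sha
```

Because the numbers follow the git ordering, a later commit always gets a higher version number than any of its ancestors.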

Metadata-less GitFileTree just never writes the metadata at all (no need for .gitignore), and for metadata mode, the GitFileTree merge driver takes care of git command line merging.

> One of the issues I'm curious about is how to get past the Monticello/gofer/Metacello? feature that skips loading a version of a Monticello package with a lesser version number ... I know that I have code that addresses this for Metacello (in a separate class for Cypress packages), but we might need to jigger the loader or whatever to make this work but I think that both of you (@ThierryGoubier and @ottobehrens) have dealt with this (and I've forgotten the details)?

Not yet, but David Allouche on the pharo mailing list pointed out a solution: resolving "newer than" by a partial ordering on the "is an ancestor of" property.
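
A small Python sketch of what such a partial ordering could look like; this is my reading of the suggestion, not code from any loader, and the `parents` mapping (version name to list of direct ancestor names) is an assumed data structure.

```python
def is_ancestor(candidate, version, parents):
    """True when `candidate` is reachable from `version` through parent links."""
    seen = set()
    stack = list(parents.get(version, ()))
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


def compare(loaded, incoming, parents):
    """Classify `incoming` relative to `loaded` using ancestry, not version numbers."""
    if loaded == incoming:
        return "same"
    if is_ancestor(loaded, incoming, parents):
        return "newer"      # the loaded version is in the incoming version's history
    if is_ancestor(incoming, loaded, parents):
        return "older"
    return "diverged"       # neither is an ancestor of the other: a merge is needed
```

With this kind of comparison, a version with a smaller number is still recognised as newer when the loaded version appears in its ancestry, and unrelated versions are reported as diverged instead of being silently skipped.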

> I'm not in a hurry to implement this, but I think that it should bump up in priority because it will eliminate the need for the GitMergeTool and make life for the folks working purely in git much simpler, while making it possible for folks choosing not to use git to work with and contribute to projects that have been moved to git/github ...

Yes.

> I agree, this is a basic problem to resolve. We use the git command line to merge.
>
> We had a better solution for generating the version files. We counted the number of commits for a package directory in the git repository, which ensured that the version number increased. We also took the comment from the most recent commit of that package, with the date and time. The problem was that this is slow. Searching through the git repo for ancestors of each of our packages took a good 20-30 seconds, which we needed every time we loaded something.

Ok, then you want to have a look at GitFileTree, which does that (and also properly writes a complete ancestry chain in MC).

Thierry

ThierryGoubier avatar Feb 03 '16 08:02 ThierryGoubier

Thanks Thierry,

> Then GitFileTree has the code you need. GitFileTree recreates the complete history of the package out of the git log in a reasonable timeframe (two git queries), generates UUIDs from the SHA of each commit, properly traces all ancestors, and renumbers and names all versions present in git based on the git ordering and author.
>
> Ok, then you want to have a look at GitFileTree, which does that (and also properly writes a complete ancestry chain in MC).

We're running Pharo 3.0. Will GitFileTree work properly on Pharo 3.0 or do we have to upgrade?

ottobehrens avatar Feb 03 '16 09:02 ottobehrens

> We're running Pharo 3.0. Will GitFileTree work properly on Pharo 3.0 or do we have to upgrade?

You'll have a working version of GitFileTree (I've used it for internal development). I may not have backported some of the most recent changes (the metadata-less mode), which are still a bit of a work in progress (just to see if others would see strange things happening with it).

What is missing from the Pharo 3 version is:

  • Windows support
  • github://-like url syntax for gitfiletree:// urls
  • metadata-less mode
  • some refactoring

It is available in the configuration browser of Pharo 3.

At the moment, development happens on the pharo4.0_dev and pharo5.0_dev branches of filetree; I use in production the pharo4.0_dev version. I can backport things to the pharo3.0_dev branch if you need the new features, but I think testing those in a newer version of Pharo (4 or 5; beware, 5 can be almost unusable at times) is a good way to go.

Thierry

ThierryGoubier avatar Feb 03 '16 10:02 ThierryGoubier

Hi Thierry,

I installed Pharo 4 today and got our code loaded. Still some pain to get things going here. But I took a stab at using gitfiletree.

I tried to open our repository with a monticello browser. It would not create the version history and especially the ancestry from git log at all. Our git repository has 37000 commits. We have 94 packages. Our package structure is in desperate need of a revamp. We have a package that contains a big proportion of our code. That single package has 9832 versions! (Commits in the git repo that affected that tree) So as you can imagine, building the ancestry for each version from git does not work at all.

So I optimised MCFileTreeGitRepository and GitFileTreePackageEntry quite a bit and could actually refresh the repository in a few minutes. I optimised the way the parents of a package directory are retrieved from git log. I also then made the MCVersionInfo (a new subclass) retrieve the ancestors lazily.
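
The lazy part, sketched in Python for illustration (the real change is a Smalltalk subclass of MCVersionInfo; the class and parameter names below are hypothetical): ancestors are only looked up the first time they are asked for, and the result is cached.

```python
class LazyVersionInfo:
    """Version info that resolves its ancestors on first access."""

    def __init__(self, sha, load_parents):
        self.sha = sha
        # Callable taking a SHA and returning a list of parent SHAs,
        # for example backed by a git log query (assumption).
        self._load_parents = load_parents
        self._ancestors = None

    @property
    def ancestors(self):
        # Only hit the backing store once per version, and only when needed.
        if self._ancestors is None:
            self._ancestors = [
                LazyVersionInfo(parent, self._load_parents)
                for parent in self._load_parents(self.sha)
            ]
        return self._ancestors
```

With thousands of versions per package, listing the versions then costs nothing in ancestry lookups; the cost is paid only for the versions whose history is actually browsed or merged.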

I would like you to have a look at the changes. I attach the package. Please let me know if there's something you'd like me to fix or if I should push it to the smalltalkhub repo (if I have permission) or what you'd like me to do.

Cheers Otto

ottobehrens avatar Feb 19 '16 20:02 ottobehrens

Oops, I created way too much ancestry; simplified a bit.

ottobehrens avatar Feb 19 '16 21:02 ottobehrens

Hi @ottobehrens ,

This sounds really great! I'm really interested in what you've done, but I think your attachment was eaten by github on the reply.

I can give you access to whichever GitFileTree smalltalkhub repository you'd like (the Pharo4Dev one?) or, if you prefer, you can send it as a pull request here.

Thierry

ThierryGoubier avatar Feb 19 '16 21:02 ThierryGoubier

Yes, the Pharo4Dev one is fine.

ottobehrens avatar Feb 19 '16 21:02 ottobehrens

I just registered. OttoBehrens

ottobehrens avatar Feb 19 '16 21:02 ottobehrens

I worked from stable, so I see a lot has changed in the development branch. Looking at the merge now.

ottobehrens avatar Feb 19 '16 21:02 ottobehrens

Done!

You are a contributor on Pharo4Dev now.

I'll be away with limited access for around a week, so it will take a little while to answer.

ThierryGoubier avatar Feb 19 '16 21:02 ThierryGoubier