[Request for comments]: Site Versioning
What is the purpose of this pull request? MVP for #1009
- [ ] Documentation update
- [ ] Bug fix
- [X] Feature addition or enhancement
- [ ] Code maintenance
- [ ] Others, please explain:
Overview of changes:
- Add a command to archive a version under a given name. Archived versions are stored in HTML format, in a folder with the version name, which is inside the version folder (default name for the version folder is "version"). All links in the versioned site will link to other pages in the versioned site.
To summarize the current solution, what it does is build the website and place it into a folder within the repository for it to be versioned (default location is version/, but you can customise this). When the site is built, by default the build action copies the versioned site over into _site, and it is deployed. If the user decides to store versions in different folders, then the URL to the version will simply be different.
TODO:
- [X] Amend the way that links have their baseUrl changed such that it works even if baseUrl != ''
- [X] Need to test for performance (only a small difference: on MarkBind it's about 1/4 of a second, from ~4.75s to 5s).
- [x] Multiple versions -- saving a subsequent version should not keep the version folders
- [x] Clean up code
- [x] Add documentation
- [x] Add tests (?) How to test.
Anything you'd like to highlight / discuss:
Benefits of the solution proposed in this PR
- Simple & customisable
- Since .html and asset files are kept, past changes in markbind version, assets used, etc will be no issue --> each version contains everything it needs to deploy as a site
- (Relatively) performant. I don't have proof of this yet, but I believe copying the files of html folders/asset folders should be a relatively small cost, as compared to generating from markbind files.
Potential Drawbacks
- The user does need to remember which folder they store their versions in, since they can change the name of the folder, etc.
- Additional manual(?) cost in implementing frontend components and creating backups of their versions in branches/repositories if they happen to use git (could be mitigated by future PRs targeting this)
Thoughts
- Is the site.json file copied over in the versioned files still being used? ETA: with further investigation, site.json is not copied over, a siteData.json is being created instead.
CAVEATS:
- If the baseURL is changed, the baseURL of the versioned site does not change.
Testing instructions:
- `markbind archive <versionName> [versionFolderName]` --> builds and saves the site under the given version name and folder name.
- `markbind archive <versionName>` --> builds and saves the site with the given version name in a folder called version
Access the site by manually appending <versionFolderName>/<version name> after the baseURL in the deployed site.
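To make the resulting URL concrete, here is a minimal sketch of how the path follows from those arguments. `archivedUrl` is a hypothetical helper written for illustration, not part of markbind-cli; it only assumes the `<baseUrl>/<versionFolderName>/<versionName>` layout described above, with `version` as the default folder.

```javascript
// Hypothetical helper (not a MarkBind API): builds the URL path of an archived version.
function archivedUrl(baseUrl, versionName, versionFolderName = 'version') {
  // Strip leading/trailing slashes from each segment so joining never doubles them.
  const trim = (s) => s.replace(/^\/+|\/+$/g, '');
  const segments = [baseUrl, versionFolderName, versionName].map(trim).filter(Boolean);
  return `/${segments.join('/')}/`;
}

console.log(archivedUrl('/markbind', 'v1.1.1'));  // /markbind/version/v1.1.1/
console.log(archivedUrl('', 'v2', 'archive'));    // /archive/v2/
```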
Proposed commit message: (wrap lines at 72 characters)
Implement a basic versioning CLI command
Site versioning is key for documentation use, and education websites may want to keep past versions for archival purposes as well.
Let's implement a markbind-cli command, markbind archive, to allow users to easily version their website. When markbind build/serve is run, all links within the versioned site will point to their versioned equivalent.
Checklist: ☑️
- [x] Updated the documentation for feature additions and enhancements
- [x] Added tests for bug fixes or features
- [x] Linked all related issues
- [ ] No unrelated changes
Stretch goals/future improvements
- [ ] Additional CLI commands to support renaming, moving, and deleting versions, so that users do not ever need to modify `versions.json` manually.
- [ ] Support live preview updating when the versions property of site.json is changed.
- [ ] Editing of previous versions. My preferred solution is saving versions in separate branches, holding all files except previously archived versions. Update the deployed files by navigating to a separate branch (and take advantage of git; for example, if you fix a bug which affects all versions, you can cherry-pick the commit over to the versioned branch). This will also support reverting to previous versions (working and deploying off of the versioned branch...?). Caveat: this is dependent on git.
  - Further support updating gh-pages from a versioned branch to update past versions. I think this should be possible, as the ghpages extension supports an "add" which would not remove existing files. We could remove the existing "version" directory and import the files from the branch back into main/the current branch, from which you can deploy as normal.
  - I believe this solution can account for MarkBind versions changing over time, as the master only holds the HTML files of each versioned site. Might require building and deploying to be done using NPM to ensure appropriate version control.
- [ ] Front end component to easily navigate between versioned branches? Personally leaning towards a banner, like in the 3281 site (note: banner component does not exist), but a component in the footer with a default message that you can change (like the "generated by markbind" message) might be sufficiently versatile to work as a default implementation. We could also have an auto-generated dropdown component which users can place in their navbar as desired. Key difficulty: making it versatile and customisable for all types of sites while also working "out of the box". Since websites don't need to follow any set format, this may be difficult.
Nice work @kaixin-hc 👍
Just some initial thoughts.
Is the site.json file copied over in the versioned files still being used?
Can the user edit the archived site? From what I can see now, it only stores the built site (the .html files) so I am assuming once a site is archived, the user can't edit the site anymore? If that is the case, then I don't think the site.json needs to be copied over.
If the baseURL is changed, the baseURL of the versioned site does not change.
Will this cause any issues? If the versioned sites are contained within the main site, then I suspect this would cause some issues with the links within the versioned sites.
Archived versions are stored in HTML format, in a folder with the version name, which is inside the version folder(default name for the version folder is "version").
What happens when the user decides to store versions in different folders? We will need to keep track of where each versioned site is located. Maybe within the main site.json file? This will also help with the implementation of the frontend component to navigate between the versioned sites.
Thanks for the review @jonahtanjz ! (also, there's been a lot of discussion in issue #1009, but I'm not sure if I should reply mainly here or there) Some replies:
Will this cause any issues? If the versioned sites are contained within the main site, then I suspect this would cause some issues with the links within the versioned sites.
It does cause issues in terms of deployment - if the baseUrl of the current website changes (for example, it being moved to a repository with a different name and deployed with github pages), the links in the previous website do not change accordingly. I can't see any clean workaround besides navigating back to a past commit of that version, or the person keeping another copy of that version, and having the person rebuild the files. Otherwise you'd need to parse the HTML intelligently to figure out which URLs to change the baseUrl of (intra-links), and which to keep static (external links), which is not trivial. However, changing the baseUrl is also not a frequently expected use-case.
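To illustrate the intra-link vs external-link distinction, here is a rough sketch of the per-link decision only. `rebaseHref` is a hypothetical function, not something in this PR, and it deliberately skips the hard part (finding every URL in the archived HTML, scripts and assets), which is why I don't consider this a clean workaround.

```javascript
// Hypothetical sketch: rewrite a single href only if it sits under the old baseUrl.
function rebaseHref(href, oldBaseUrl, newBaseUrl) {
  // Anything with a scheme (https:, mailto:) or starting with // is treated as external.
  const isExternal = /^[a-z][a-z0-9+.-]*:|^\/\//i.test(href);
  if (isExternal) return href;
  // Match the baseUrl as a whole path prefix, so "/old" does not match "/older".
  if (href === oldBaseUrl || href.startsWith(`${oldBaseUrl}/`)) {
    return newBaseUrl + href.slice(oldBaseUrl.length);
  }
  return href; // relative links and unrelated paths stay as they are
}

console.log(rebaseHref('/old/page.html', '/old', '/new'));            // /new/page.html
console.log(rebaseHref('https://example.com/old/x', '/old', '/new')); // https://example.com/old/x
```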
We will need to keep track of where each versioned site is located.
I think we can track which folders are "version folders" in site.json - it will be necessary in order to ignore certain existing files when creating a new version. This can be done in two ways as @ang-zeyu suggested, a dedicated versions.json file like in docusaurus (simpler) or tracking the source and baseUrls for each version, if we need to change it in the future (developing it to allow versioned sites to point to other repos, etc -- cross origined versioning).
What happens when the user decides to store versions in different folders?
To summarize the current solution, what it does is build the website and place it into a folder within the repository for it to be versioned (default location is version/<versionName>, but you can customise this). When the site is built, by default the build action copies the versioned site over into _site, and it is deployed. If the user decides to store versions in different folders, then the URL to the version will simply be different.
From my discussion with @damithc today, the following features are features to discuss in future commits
- Front end UI component pointing to the version --> he prefers to leave it off for now, and leave it up to users to customize
- Using git branches to store previous versions (my proposal in the PR description to "Saving versions in separate branches"). To avoid a reliance on git (as users might not be using git to version control or deploy), this is not desirable as a built in feature and may be considered as an addon
- Because of this, in response to your question "Can we edit the versioned site", the answer is not conveniently/through MarkBind. It is of course possible to edit the past versions manually (navigate to desired commit, re-archive the site under the desired version name, moving the created folder to replace the past copy of the past version files). But I also realised that site.json isn't kept, exactly; it's transformed into siteData.json, which is used to generate the titles and index the pages for search, for example. So I think all those files need to be created.
@ang-zeyu @ryoarmanda, while I still need to write tests, the basic implementation is done - would appreciate a review if you have time so that I can check I'm on the right track!
Wonder if we should have a root level attribute "archivePath" to keep track of the output instead of duplicating it in individual "output" (assuming that all versions will share the same root path)
@tlylt I think since we may want to extend this to allow having versions in different repos or branches, it may be better to have them as separate entries.
In that case would a separate attribute "archivePath" within the individual object be cleaner? (instead of composing the output, list out the ingredients of the output path)
So something like:
{
  "versions": [
    {
      "version_name": "v1.2.2",
      "build_ver": "3.1.1",
      "archivePath": "version"
    },
    {
      "version_name": "v1.2.3",
      "build_ver": "3.1.1",
      "archivePath": "version"
    }
  ]
}
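As a quick sketch of what "listing the ingredients" would mean for consumers, the composed output can still be derived when needed. `outputPathOf` is a hypothetical helper, and the field names are just the ones proposed in the JSON above, not a final schema.

```javascript
// Hypothetical helper: compose a version's output folder from its separate fields.
function outputPathOf(entry) {
  // Forward slashes are used so the result can also serve as a URL fragment.
  return `${entry.archivePath}/${entry.version_name}`;
}

const versionsJson = {
  versions: [
    { version_name: 'v1.2.2', build_ver: '3.1.1', archivePath: 'version' },
    { version_name: 'v1.2.3', build_ver: '3.1.1', archivePath: 'version' },
  ],
};

console.log(versionsJson.versions.map(outputPathOf)); // [ 'version/v1.2.2', 'version/v1.2.3' ]
```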
In that case would a separate attribute "archivePath" within the individual object be cleaner?
Okay, I can do this! I was considering it originally, but thought output might be better as I think most of the time the user would use the composed output rather than the separate elements. Having the additional versatility is probably a good thing though.
@kaixin-hc minor:
- Is `versions` (instead of `version`) a better name for the folder?
- Is the `versions.json` feature already supported in this PR? If so, I assume `versions.json` is generated automatically? I didn't see any mention of it in the documentation pages.
- Is `versions` (instead of `version`) a better name for the folder?
I decided on version because this is used to auto-generate the URL, and I think having a url that is version/v1.1.1/ is nicer than versions/v1.1.1/
- Is the `versions.json` feature already supported in this PR? If so, I assume `versions.json` is generated automatically? I didn't see any mention of it in the documentation pages.
Yep, it is automatically generated and updated when new versions are created! You can find the reference to it in the documentation, but I'll be updating it with an example and perhaps a bit of information about the potential issues in modifying versions.json.
Updates + Done
- Have handled adding the subsite(s)' version(s) to siteConfig.ignore
- Resolved all @tlylt 's comments
- Saves the baseUrl at the point of archiving the site
Current todo list:
- [x] Independent copying mechanism for archived files
- [x] Correctly exclude copying over of versions with the same baseUrl
- [ ] Tests
- [x] Update versioning documentation with new changes + what happens if you manually edit `versions.json`
Notes/preliminary thoughts:
- For build and serve, one way of implementing only 'building' or copying certain versions might be passing the cli command an option `--versions`. If passed this flag, it will copy over/build all versions (Might help with the issue of copying over large site), or maybe you can specify which options to build, otherwise it might just only serve the current site.
  - That means I need to exclude the copying over of versioned files during build and serve as well. This is just a preliminary idea though ... I think I need to check it against the implementations of build and serve to see if it will work.
- Might want to implement the version-related functions in a VersionManager class. I think having a PageManager class or similar might be the way to go to refactor the `markbind/packages/core/src/Site/index.js` file, since it already has a layout, plugin and sitelinkmanager.
- Do we want to support only building/deploying selected versions? This makes sense to me, and implies to me that we might want to be able to "name" versions to refer to them later, necessitating that these names be unique: e.g. "current", "long term support". (Current "versionName" does not have to be unique, just a unique combination of archiveName and versionName)
  - We could implement this by changing the structure of versions.json so that the versions have a key which is their unique name
  - (Easier, potentially more practical?) We could introduce another parameter into site.json, specifying the version file to use (default versions.json), and deploy only the versions in that version file. But we still require versions.json to exist and to include all versions, otherwise the exclusion of version files won't work properly.
- For build and serve, one way of implementing only 'building' or copying certain versions might be passing the cli command an option `--versions`. If passed this flag, it will copy over/build all versions (Might help with the issue of copying over large site), or maybe you can specify which options to build, otherwise it might just only serve the current site.
- That means I need to exclude the copying over of versioned files during build and serve as well. This is just a preliminary idea though ... I think I need to check it against the implementations of build and serve to see if it will work.
This sounds interesting, but also sounds like more work 👀 (in view of the timeline). You are welcome to experiment with this direction of course, if you have the time.
But otherwise, I would be ok with copying all versions for this PR. You can use a simple flag (e.g. isFirstGeneration) to tackle the issue with copying multiple times in markbind serve.
Or, iirc (do double check), you might not even have to use a flag, as the function build/serve calls on first site build is the same one. Subsequent builds in markbind serve should be using separate functions, so you could take a look at implementing your copying at the end of that function.
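A minimal sketch of the flag approach, to show what I mean. `SiteSketch`, `generate` and `copyVersions` are made-up names for illustration and do not mirror the actual Site class; the point is only that rebuilds during serve skip the copy.

```javascript
// Hypothetical sketch of copying archived versions only on the first generation.
class SiteSketch {
  constructor(copyVersions) {
    this.copyVersions = copyVersions; // injected so the sketch stays runnable
    this.isFirstGeneration = true;
  }

  generate() {
    // ...building the current site's pages would happen here...
    if (this.isFirstGeneration) {
      this.copyVersions();
      this.isFirstGeneration = false;
    }
  }
}

let copies = 0;
const site = new SiteSketch(() => { copies += 1; });
site.generate(); // initial build
site.generate(); // rebuild triggered by a file change during serve
console.log(copies); // 1
```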
- Do we want to support only building/deploying selected versions? This makes sense to me, and implies to me that we might want to be able to "name" versions to refer to them later, necessitating that these names be unique: e.g. "current", "long term support". (Current "versionName" does not have to be unique, just a unique combination of archiveName and versionName)
Possibly, in view of the discussion on cross-origin support.
In this PR, we assume all sites are singly deployed for now (and we only support this), so I don't think we'll have to be worried too much.
But good to think ahead nonetheless:
- We could implement this by changing the structure of versions.json so that the versions have a key which is their unique name
Agreed version_name could be unique.
It shouldn't be too much of a limitation for the author as we expect sites to be only periodically versioned, and should simplify future implementation greatly. (e.g. if we have 2 identical version names but one is stored in a local folder, one in a repo, we'd have to construct a composite key -- but this shouldn't be too difficult either.)
The downside is that the author loses some flexibility (no /archivePath1/v1 can exist with /archivePath2/v1).
How about, instead of ⬇️
All archived versions are stored in the folder
<archivePath>/<versionName>
We let archivePath be the entire <archivePath>/<versionName> directly?
So the default value of archivePath would be version/${versionName} instead.
This way we get the benefits of both (url flexibility + simple, unique version name to specify in --versions or for our internal verification)
- (Easier, potentially more practical?) We could introduce another parameter into site.json, specifying the version file to use (default versions.json), and deploy only the versions in that version file. But we still require versions.json to exist and to include all versions, otherwise the exclusion of version files won't work properly.
I'm favouring something like the --versions flag idea above to deploy/build specific versions, to keep the clunky lower level details of versions.json away from the user. Would also prevent any issues from incorrectly manually maintaining 2 versions.json versions.
Tests
We could use a functional test with some "special" procedures for this (much like the special procedures for testing markbind convert vs other sites)
We let `archivePath` be the entire `<archivePath>/<versionName>` directly? So the default value of `archivePath` would be `version/${versionName}` instead. This way we get the benefits of both (url flexibility + simple, unique version name to specify in `--versions` or for our internal verification)
I do like this idea! So just to double check, archivePath becomes the URL, and both archivePath and versionName must be unique in versions.json. Hence for /archivePath1/v1 and /archivePath2/v1, they could be specified with
markbind archive v1_take1 "archivePath1/v1" # version named v1_take1 at url {baseUrl}/archivePath1/v1
markbind archive v1_take2 "archivePath2/v1" # version named v1_take2 at url {baseUrl}/archivePath2/v1
I do like this idea! So just to double check, archivePath becomes the URL, and both archivePath and versionName must be unique in `versions.json`. Hence for `/archivePath1/v1` and `/archivePath2/v1`, they could be specified with
markbind archive v1_take1 "archivePath1/v1" # version named v1_take1 at url {baseUrl}/archivePath1/v1
markbind archive v1_take2 "archivePath2/v1" # version named v1_take2 at url {baseUrl}/archivePath2/v1
yup
archivePath and versionName must be unique in versions.json
~(about archivePath particularly) for now at least, since we only support a single deployment / baseurl. (should be no need to verify this in your implementation)~ (filepath conflicts)
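For reference, the uniqueness rule could be checked with something like the sketch below before a new entry is written. `findConflict` and the `versionName` / `archivePath` field names are assumptions for illustration, not the implementation in this PR.

```javascript
// Hypothetical validation: report why a new entry clashes with existing ones, or null.
function findConflict(versions, candidate) {
  for (const v of versions) {
    if (v.versionName === candidate.versionName) {
      return `versionName "${v.versionName}" already exists`;
    }
    if (v.archivePath === candidate.archivePath) {
      return `archivePath "${v.archivePath}" already exists`;
    }
  }
  return null;
}

const existing = [{ versionName: 'v1_take1', archivePath: 'archivePath1/v1' }];
console.log(findConflict(existing, { versionName: 'v1_take2', archivePath: 'archivePath2/v1' })); // null
console.log(findConflict(existing, { versionName: 'v1_take1', archivePath: 'archivePath3/v1' }));
```

Checking both fields separately keeps the version name usable as a simple key while still catching two versions that would overwrite each other's files.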
@ang-zeyu @MarkBind/active-devs I still need to work on tests, however I think this is an MVP of everything discussed!
- See documentation: cli Commands, site versioning, site.json file
- build and serve now take an additional `--versions` or `-v` flag to specify version names. If you specify the versions, only those versions will be deployed. If you just use the flag, all versions are deployed. If you don't use the flag, it defaults to the `versions` property in `site.json` (which is set to an empty array by default when the SiteConfig is generated if it does not exist).
  - Have added the versions property to site.json because it may become tiring to repetitively specify the site to deploy - but it is optional.
  - This reflects the independent file copying of version files - which just copies the entire folder over into `_site`. I think this implementation should be fairly efficient as files are only copied once.
  - When archiving, the archived site is stored as described
  - Sites are not copied over if baseUrls do not match
- Archive command
  - Archives the current site and not past versions or subsite versions, saving the baseUrl, markbind build, name, and path the site is stored in, in `versions.json`. The current site is built, and its HTML files and assets are stored at archivePath.
  - `versions.json` is automatically updated and shouldn't need to be edited by the user. However, since delete archive / change names or location of archive commands are not implemented yet, for now the user can do so and then manually edit versions.json to ensure correct behaviour.
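The flag behaviour described above can be sketched roughly as follows. `versionsToDeploy` is a hypothetical function, and representing a bare flag as `true` and a missing flag as `undefined` is an assumption made for the sketch, not necessarily how the CLI passes it along.

```javascript
// Hypothetical sketch of choosing which archived versions get copied into _site.
// flag: undefined (not passed), true (passed without names), or an array of names.
function versionsToDeploy(flag, siteJsonVersions, archived, baseUrl) {
  // Versions archived under a different baseUrl are never copied.
  const sameBase = archived.filter((v) => v.baseUrl === baseUrl);
  if (flag === true) return sameBase; // bare flag: every version
  // Names from the flag win; otherwise fall back to site.json (empty by default).
  const wanted = Array.isArray(flag) ? flag : (siteJsonVersions || []);
  return sameBase.filter((v) => wanted.includes(v.versionName));
}

const archived = [
  { versionName: 'v1', baseUrl: '/docs' },
  { versionName: 'v2', baseUrl: '/docs' },
  { versionName: 'v0', baseUrl: '/old' },
];
const names = (list) => list.map((v) => v.versionName);

console.log(names(versionsToDeploy(true, [], archived, '/docs')));         // [ 'v1', 'v2' ]
console.log(names(versionsToDeploy(['v2'], ['v1'], archived, '/docs')));   // [ 'v2' ]
console.log(names(versionsToDeploy(undefined, ['v1'], archived, '/docs'))); // [ 'v1' ]
```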
Note on subsites (taken from warning I wrote here):
At present, when a site is archived and includes a subsite, it archives the subsite as it was at that point in time. Navigating to previous or future versions of the subsite from the parent site is not supported, though you can archive the subsite.
I decided not to look into supporting past versions of subsites within the parent subsite at present because
- I think the use-cases are quite niche
- At present, my only idea for implementing it is "injecting" the past versions of the subsite when every site is versioned - this will increase the time taken to archive and to build for limited utility, especially if there are many subsite versions or many parent site versions.
I believe that the archived sites are not rebuilt, recopied, or overwritten now when other files are changed (because the "generate" command and the other commands to rebuild site files seem to be separate). If this isn't true, please let me know! Or if there is anything that can be changed for usability!
(after I work on tests I will probably leave this PR be until after finals week)
Hmm I'm actually thinking we copy all versions over by default (since if the user archives a site, they probably always want to include it, given this PR assumes single deployment). The use case for -v might be to retroactively "hide" a version of the site. (though something like a future markbind archive delete command might also work). There's also this quote "Might help with the issue of copying over large site", which assumes all archived sites are copied over by default.
Could / should we modify this slightly to do so? from @ang-zeyu 's comment
I've taken a bit to think about this, and I think I prefer the current implementation of 1) Specify desired versions in site.json 2) Override as specifically you want with the versions flag. Reasons being:
- Site.json holds most of the "global" site information. But some of these/defaults can be overridden already (like baseUrl, or specifying which siteconfig to use). Following this structure for versions just makes sense to me.
- If all archived sites are copied over by default, in most cases you will need to do nothing. If you want a specific version and to exclude others, you can just specify the versions that you want and leave out the version you don't want? (I think this number of versions you need to type in will be relatively small, because how many versions will you check at once? Probably either all versions, or a specific version that you're testing).
- I think the main question here is "are there usecases where you might want to override the data in site.json"? and I think we have already thought of some, although they aren't the most common or typical
- multiple domains
- large sites
- Any case where a site is archived but not intended to be deployed. I don't think this will be that common for documentation, but I could see people doing so for their own records, teachers doing so for versions of courses they are no longer teaching but think they might want to reference one day, read-only copies of notes
- Prof raised that we shouldn't assume that our audience is necessarily using git - so I think in rare cases where one isn't using a version control tool, having a snapshot of the website might be useful. A bit like wayback machine...?
- I don't think keeping this feature makes the overall implementation that much more complex.
- The default use would just be to specify the desired version in site.json, and not think about it after that
- So it can just be an additional feature that is there if you need it.
- I'm not convinced implementing the reverse leads to that many convenience benefits
- I don't think there are that many uses where you'd deliberately want to hide versions when deploying
- Also this is a bit nitpicky but I'm not sure what to call the flag? Specifying -v definitely implies to me that that's a version I want to include, rather than a version i want to exclude.
In summary: I think there's no harm, potentially a benefit for a niche group of users, and I'm not sure -v to hide single versions is more convenient.
I've taken a bit to think about this, and I think I prefer the current implementation of 1) Specify desired versions in site.json 2) Override as specifically you want with the versions flag. Reasons being:
I think the main question here is "are there usecases where you might want to override the data in site.json"? and I think we have already thought of some, although they aren't the most common or typical
I'm not convinced implementing the reverse leads to that many convenience benefits
Thanks for the explanation, I think we are already on the same page. I'm also ok with site.json followed by --versions override.
Meant altering the defaults for the more common use case like how you're doing it for --versions currently:
// site.json
"versions": null // by default (or if the property doesn't exist) -- deploy all versions.
From what I see in SiteConfig.js it currently seems to be [] by default. (no versions copied)
Any case where a site is archived but not intended to be deployed. I don't think this will be that common for documentation, but I could see people doing so for their own records, teachers doing so for versions of courses they are no longer teaching but think they might want to reference one day, read-only copies of notes
- I don't think keeping this feature makes the overall implementation that much more complex.
- The default use would just be to specify the desired version in site.json, and not think about it after that
ok 👍, convinced of the use case.
Though, to add some perspective on the second argument for generality, even "small" features like this add a commitment, which can introduce other non-implementation complexity issues down the line (deprecation, breaking changes, integration with some other feature, contradiction with other yet-to-be-added features). Which makes supporting rare use cases not too worthwhile in the first place. From a user standpoint, adding features does increase product complexity as well, in some cases that make things less appealing. On the contrary, an appealing feature should also not be forgone in view of implementation complexity, if possible with the resources.
The best example here might be bootstrap themes, which, while initially simple in implementation, have many contradictions with https://github.com/MarkBind/markbind/issues/903, and a whole bunch of unforeseen implementation issues that are too tedious to maintain / resolve.
Really sorry for the delay! @jonahtanjz @ang-zeyu pinging as you were previously involved in these conversations.
I have implemented tests for the new functionality (just unit tests) and I think this PR may be sufficient to stand alone. Other changes since the last review
- Documentation changes as requested
- Version flag for building and serving now leads to no sites being deployed.
- Fixed bugs involving file paths/non posix style code for windows
I also updated the list of future-PRs for this feature for reference, these can be made into separate issues for other contributors to pick up. Thanks!
@MarkBind/active-devs @damithc Hi all! Sorry for no progress on this for so long. Can I check if this feature is still relevant to figure out if this should be revised or just closed?
I think it is still relevant, @kaixin-hc
Okay! Just to update that I've revisited the code and these comments to some extent and I think the original principles were sound, but the refactoring done since when I first worked on this and now is making the merge quite painful, which is why there's been some delay; but I'll make the fixes either here or in a fresh PR.