kirby icon indicating copy to clipboard operation
kirby copied to clipboard

Duplicating pages does regenerate file UUIDs, but does not update content file with new UUIDs

Open FynnZW opened this issue 1 year ago • 17 comments

Description

Related to #4831. Duplicating pages with files now regenerates file UUIDs, but the files linked in the content.txt still point towards the original page (the duplication source).

Expected behavior
Duplicated files should be linked correctly in the content.txt.

To reproduce

  1. Create a page with images, with UUIDs active
  2. Duplicate that page, with files
  3. File UUIDs get regenerated, but linked are still the parent files, not the newly generated ones.

Your setup

Kirby Version
3.8.2

Additional context
See also dicussion here #4844

FynnZW avatar Nov 26 '22 23:11 FynnZW

The same applies to subpages. Their UUIDs are currently not regenerated (#4864), but when they are, their references should also be replaced like with duplicated files.

Thoughts on a possible implementation: https://github.com/getkirby/kirby/issues/4844#issuecomment-1327892762

lukasbestle avatar Nov 27 '22 12:11 lukasbestle

I don't think we can fix this tbh. There are too many edge cases where such UUIDs could be stored (picker fields, kirbytags...) and going through the raw content files with a regex seems more like a hack than a stable solution. Prior to UUIDs we exactly didn't do such things to update IDs when changing slug etc. as it would've been too unreliable.

distantnative avatar Dec 03 '22 15:12 distantnative

I have an idea. Kirby can read every pages own files and the file metadata. So it would know that a pages local file example.jpg has the UUID of say abcdefg123456789 (which is stored in example.jpg.txt).

Then Kirby could do a very specific string replace for exactly that file ID (instead of a vague regex). Search: file://abcdefg123456789 Replace: file://newuuid123456789

This can then be done for every file separately. And this way, only folder specific file links get replaced.

That should be a pretty solid operation, right? I don’t know much about Kirby works internally, but this seems like it could work.

FynnZW avatar Dec 04 '22 01:12 FynnZW

I agree it could work that way. It's still a pretty massive change as we would first need to load a list of the old UUIDs into memory, then regenerate all of them and keep a mapping between old and new UUID, then update the UUIDs in the content files and finally replace references to the old UUIDs with the new UUIDs. For large page trees with many files this could become very memory-intensive. It's still doable, but I think we'll have to optimize the logic so that it only needs to read and write to each file once.

lukasbestle avatar Dec 04 '22 10:12 lukasbestle

Since v4 is on the horizon now, is there a chance this will still be fixed for Kirby v3?

In my opinion this is a pretty serious bug, risking content loss, if files UUIDs are not disabled and pages are being duplicated. Example: -> a page incl. images gets duplicated (without files) -> image is still visible in duplicated page, gets deleted by panel user -> image is instead deleted from original page, gone everywhere

Might be a good idea to mention this in the docs, so people are aware of this behaviour, and can disable file UUIDs.

FynnZW avatar Mar 20 '23 13:03 FynnZW

I agree that the current behavior is a bug and a current limitation of the UUID system.

Implementation-wise, the situation has not changed much since November/December. It is a hard problem to solve in a way that is both robust and high-performing even with large page trees. I feel like the solution you suggested might work, but as explained it is also not trivial.

We are currently in the process of refactoring the content implementation of the Kirby core. Maybe we'll have an easier time implementing the UUID update behavior after that. But for now we cannot say when we'll be able to fix this bug.

Regarding adding this to the docs: I just thought about it, but I wonder what the suggestion would be. Disabling file UUIDs will just work if the field references files from the current page, not from subpages. So even before the UUID feature we already had the same bug, just not as pronounced. So we can say that there is a general bug that references from content files are not updated on duplication.

lukasbestle avatar Mar 23 '23 21:03 lukasbestle

So even before the UUID feature we already had the same bug, just not as pronounced. So we can say that there is a general bug that references from content files are not updated on duplication.

It’s true that before UUIDs the file reference was also wrongfully kept, but with UUIDs there’s now a major difference in the panel:

  • the pictures of the original page are visible (UUIDs allow the panel to find them now)
  • those pictures can now be accidentally deleted (no one would expect they are deleted from original page)

So now content loss is a risk when it was not before. I’m also not sure how to document it, but it seems that’s something people should be aware of.

FynnZW avatar Mar 23 '23 21:03 FynnZW

Ok I'm opening up to the idea again. I can see that some might choose to store UUIDs even for files of the same page. And then duplicating the page can lead to unexpected results.

@lukasbestle do you really think we need to wait for the content changes refactoring? Because this would need to happen during the duplication and for the new page, we know that there won't be any unsaved changes yet. So it could be safe to already do that server-side.

distantnative avatar Mar 24 '23 10:03 distantnative

@distantnative Not so much content changes, more the translation refactoring. If we get that done, I think we could get this fix done easier as well because there will be less code branches to take care of.

lukasbestle avatar Mar 24 '23 11:03 lukasbestle

I can see that some might choose to store UUIDs even for files of the same page. And then duplicating the page can lead to unexpected results.

I would just like to add why I think this is not an edge case:

  • UUIDs are now default. Disabling them would be a choice, using them is not
  • Files relative to the page are the most common setup (e.g. this is also how the Kirby starterkit is setup).

So I would say most Kirby installations would be affected by the bug, if they duplicate pages via the panel. I’m not sure how common duplicating pages is, but our users/clients do it all the time.

And this bug introduces irreversible wrong file connections, which can be invisible and hard to spot at first. I understand you have a lot of other considerations, I just wanted to spell out again why I think this would be important to fix soon.

FynnZW avatar Mar 25 '23 13:03 FynnZW

Trying to operationalise this a bit more:

  • any of the newly generated UUIDs for page, subpages and files from https://github.com/getkirby/kirby/blob/main/src/Cms/PageActions.php#L34 potentially has an outdated counterpart still referencing the page/file from the origin (not the new duplicated ones)
  • if we collect these as old UUID => new UUID map
  • we can not only go through the page content file, but all newly created content files (e.g. also content files of duplicated files), and use the UUID map to replace UUIDs pointing to old stuff that was now duplicated
  • and then in the end all UUIDs should point to the new stuff

distantnative avatar Mar 25 '23 14:03 distantnative

Yes, exactly. 👍

lukasbestle avatar Mar 25 '23 16:03 lukasbestle

Not tested and not functional yet, but a first dry-write approach: https://github.com/getkirby/kirby/commit/6e4bd4506d6be3657bb680f12d1b75e507ff5d81

distantnative avatar Mar 26 '23 17:03 distantnative

@distantnative Awesome, I think that already goes in the right direction. Three things I noticed:

  • We also need to adapt the children of the copy.
  • I would turn around the nesting order of the foreach loops at the bottom of adaptCopy or even throw out the outer loop completely. Right now, you are reading and writing each content file for each UUID. By looping over all content files instead, we can read each file just once, replace everything with str_replace(array_keys($uuids), array_values($uuids), $content) and write the file once.
  • The adaptCopy() method takes a $uuids argument that is immediately overwritten with an empty array. I guess that was just a leftover.

lukasbestle avatar Mar 30 '23 20:03 lukasbestle

Any update on this? I know you're probably all focused on v4 now, but this fix would also be great for v4.

FynnZW avatar Aug 07 '23 08:08 FynnZW

Thank you all for looking into this and coming up with a promising solution.

I just stumbled upon this bug when duplicating a page with lots of blocks and images. I first didn't activate "copy files" and was surprised that the images showed up anyway in the duplicated page's blocks. I'm very used to the Kirby behavior that files belong to the page. Because of that @FynnZW's arguments resonate with me - both in this issue and the sibling issue (#4844).

mrflix avatar Feb 20 '24 10:02 mrflix

I fully agree with what @FynnZW is saying.

As things stand currently, we're stuck between a rock and hard place:

• duplicate in the panel => uuids not updated in content, risk of content loss and unintentional changes to other pages -- requires knowledge of the bug and additonal work to "remap" the references to images and files

• duplicate folder manually => manually remove uuids from files, and replace uuid references with file names and let kirby regenerate uuids again afterwards - client needs to hire a hacker

(Related to this: when copying blocks/layouts that contains images and then pasting on a different page, the files are just referenced to the page where they were copied from. Here it would be great to also have an option to "copy files" to actually bring the files over too.

I really appreciate that UUIDS were introduced, but if it affects the daily tasks for editors working in the panel it kind of starts to defeat it's purpose of bringing reliability. It can quickly become a mess on a site with many pages if you don't know where the files in use originate from. Can UUIDs even be disabled and removed retroactively for existing content? Clients need to be able to duplicate pages confidently and worry free. This kind of manual labour involved and where things easily can be screwed up, is quite hard to explain to a client that also have a handful of different editors in the panel daily.

I didn't mean for this to sound like a rant. I appreciate that you are working tirelessly with Kirby, just a bit surprised that this issue have not gotten more attention.

timkinali avatar Apr 18 '24 20:04 timkinali

Recently a customer duplicated a page with files and then deleted some images. She then was wondering why an image disappeared from and broke the original page. I had to recover the file and explain the bug to her 😔

mrflix avatar Jul 22 '24 09:07 mrflix

The issue @mrflix mentioned is exactly the reason why i’m setting duplicate: false for every page blueprint. It’s not clear that the files of the duplicated page are stored in the page content folder of the original page. And can lead to the unexpected side effects mentioned.

ovenum avatar Jul 22 '24 13:07 ovenum

I'm not a fan of pushing issues without further information but you have to understand that this issue has a big impact on many projects.

Disabling UUIDs is not a viable solution. It's a significant step backwards that introduces even more problems than we had before their implementation.

The current situation puts many of us in a difficult position: we must choose between allowing page duplication (risking client frustration) or disabling a core feature that many clients find essential.

medienbaecker avatar Jul 23 '24 08:07 medienbaecker

We know you all are pushing for it and trust us, your posts aren't going by unnoticed.

We have looked into it various times, another time again and I made some progress: https://github.com/getkirby/kirby/pull/6567

However, this is a lot less trivial as your comments make it seem like. Your scenarios are rather straight-forward with e.g. a single page and some files. Our implementation though has to cover all scenarios including files, children and deeper nesting where it becomes less obvious what the ideal solution would be (are we cross-replacing UUIDs from different nested branches of pages?) and even harder to implement in a performant and reliable way.

We're on it and yet I can only ask for your patience, fully aware that this has remained unsolved for quite a while.

distantnative avatar Jul 23 '24 18:07 distantnative