[Data Liberation] Tracking issue

Open adamziel opened this issue 4 months ago • 1 comments

Let's use this issue to track Data Liberation: Let's Build WordPress-first Data Migration Tools

Technical plumbing

  • [x] https://github.com/WordPress/wordpress-playground/pull/1893
  • [x] https://github.com/WordPress/wordpress-playground/pull/1952
  • [x] https://github.com/WordPress/wordpress-playground/pull/1967
  • [x] https://github.com/WordPress/blueprints-library/pull/116
  • [ ] https://github.com/WordPress/wordpress-playground/pull/1968
  • [ ] Extension points for plugin-provided URL treatment, e.g. base64_decode specific block attributes before rewriting the URLs
  • [ ] Identifying each post's dependency graph to frontload the dependent data first
  • [ ] Frontloading media files (fetching them before inserting the wp_post where they're used)
  • [ ] Dependency management – should we ship all the PHP classes in this repo? Or publish independent plugins for others to start adapting in their work – but with no BC guarantees?
  • [ ] Streaming WXR import
  • [ ] Streaming SQL import and export
  • [ ] Streaming ZIP import and export
  • [ ] Per-row version control (like @dmsnell's vector clock idea from https://core.trac.wordpress.org/ticket/60375)
  • [ ] A conflict resolution mechanism with filters for plugin authors. Perhaps we won't need one, though.
  • [ ] ... More TBD ...
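
One way to picture the "dependency graph" and "frontloading" items above is a topological sort over the entities being imported. This is only a minimal sketch: the entity IDs and the `dependencies` mapping are hypothetical, and nothing here reflects how Playground models posts, attachments, or users.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical data: each entity maps to the set of entities it depends on.
dependencies = {
    "post:42": {"attachment:7", "user:1"},
    "attachment:7": {"user:1"},
    "user:1": set(),
}


def frontload_order(deps):
    """Return entities ordered so every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as error:
        # Circular references (e.g. two posts linking to each other) need a
        # separate strategy, such as inserting first and rewriting links later.
        raise ValueError(f"Circular dependency: {error.args[1]}") from error


order = frontload_order(dependencies)
print(order)
```

With an ordering like this, media files and authors could be fetched and inserted before the post that references them. Cycles are reported rather than resolved, since the right resolution is still an open question.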

Preliminary roadmap by use-case

  • [ ] WXR preprocessor
    1. Port XML streaming logic from https://github.com/adamziel/wxr-normalize/
    2. Evaluate URL detection via https://github.com/WordPress/wordpress-develop/pull/7450
    3. Preprocess all WXR files before importing them to Playground to...
      • Rewrite the content URLs
      • Pre-fetch media files
    4. Run this before importing WXR files into Playground to start collecting feedback
  • [ ] Static block markup editor
    1. Build a simple plugin to import and export .html files representing specific WordPress pages from GitHub.
    2. Ship a Blueprint that loads Playground Docs into Playground
    3. We need to have a real use-case for interacting with data liberation on a daily basis and this is one. It's a super low-friction way of maintaining the Playground documentation and WordPress-on-GitHub-pages in general. (cc @bph @akirk)
  • [ ] Reliable Playground ZIP export / import
    1. Fork the Sandbox Site plugin
    2. Improve the SQL export to make it streamable and ensure there are absolutely no issues with escaping
    3. Rewrite the exported and imported site URLs
    4. Include extension points to enable custom treatment of any block attribute, database row etc. See one of the GitHub discussions referenced in #1888
    5. Consider shipping .sql files with the export to potentially enable importing the resulting .zip in a regular MySQL-based server environment
    6. ...anything else actually?
  • [ ] "Duplicate Playground" feature
    1. Iteration 1: Pipe the ZIP export to ZIP import
    2. Iteration 2: Mount /wordpress-new in the duplicated Playground instance, run the PHP export/import code to migrate the site from /wordpress there
    3. Iteration 3: Keep track of progress, make it resumable regardless of when the process is interrupted. This would enable exporting really big sites
  • [ ] Direct WordPress <-> WordPress transfer
    1. Conceptually, this is like running Duplicate Playground over the internet
    2. Important to keep track of progress and resource versions using a vector clock
    3. Export / Import UI with scope (users? posts? etc.), error info (image.jpg couldn't be fetched after 3 retries), and an error resolution mechanism (specify a different URL? upload that image? retry a 4th time?)
  • [ ] Live WordPress <-> WordPress data sync
    1. Run the WordPress <-> WordPress transfer in a continuous way.
    2. This is not about collaborative editing in the block editor, although there is likely an overlap around data synchronization.
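
For the vector clock mentioned under the WordPress <-> WordPress transfer, here is a minimal sketch of the general technique. It assumes each site has a stable identifier and each resource carries a `{site_id: counter}` mapping; the function names and site IDs are illustrative and are not taken from the Trac ticket or any existing code.

```python
def increment(clock, site_id):
    """Return a copy of the clock with this site's counter advanced by one."""
    updated = dict(clock)
    updated[site_id] = updated.get(site_id, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the per-site maximum."""
    return {site: max(a.get(site, 0), b.get(site, 0)) for site in a.keys() | b.keys()}


def compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' for clock a relative to b."""
    sites = a.keys() | b.keys()
    a_behind = any(a.get(s, 0) < b.get(s, 0) for s in sites)
    b_behind = any(b.get(s, 0) < a.get(s, 0) for s in sites)
    if a_behind and b_behind:
        return "concurrent"  # both sides changed independently: a conflict
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


# A post is created on site-a, then edited independently on both sites.
base = increment({}, "site-a")
edited_on_a = increment(base, "site-a")
edited_on_b = increment(base, "site-b")
print(compare(base, edited_on_a))         # before
print(compare(edited_on_a, edited_on_b))  # concurrent
```

A "before" or "after" result means one side can simply overwrite the other. A "concurrent" result is where a conflict resolution mechanism, if we end up needing one, would come in.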

Here are a few more use-cases we'll likely tackle along the way, but they're not key milestones on their own:

  • [ ] WXR importer
    1. Fork https://github.com/humanmade/WordPress-Importer
    2. Give attribution to the original team, ping them and start a conversation
    3. Port it to WP_XML_Tag_Processor
    4. Start using that fork for importing WXR files in Playground
    5. Rewrite the imported site URLs
    6. Use AsyncHTTP\Client for fetching assets
    7. Make it resumable if it fails halfway through
    8. Publish it as a standalone plugin to start gathering feedback and bug reports
    9. Include extension points to enable custom treatment of any block attribute, database row etc. See one of the GitHub discussions referenced in #1893
  • [ ] Markdown exporter / importer for editing existing documentation sites from GitHub
    1. Adapt the exhaustive MySQL parser explored by @janjakes to parse Markdown in PHP. It should only require swapping the grammar.
    2. Migrate @dmsnell's Markdown <-> Block markup TypeScript converter from https://github.com/dmsnell/blocky-formats to PHP
    3. Discuss using it for editing Playground docs, Gutenberg docs, and potentially all WordPress docs
    4. Discuss using it as a drop-in static site generator replacement (e.g. Jekyll)
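
The "make it resumable if it fails halfway through" item in the WXR importer list could work roughly as sketched below: persist a cursor after every successfully imported record and start from it on the next run. Everything here is an assumption made for illustration, including the JSON checkpoint file, the `next_index` key, and the `import_one` callback. A streaming importer would more likely store a byte offset into the WXR file than a list index.

```python
import json
import os


def resumable_import(records, checkpoint_path, import_one):
    """Import records in order, saving a cursor after each success."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(records)):
        # If this raises, the saved cursor still points at this record,
        # so the next run retries it instead of skipping it.
        import_one(records[index])

        # Write to a temporary file and rename it, so an interruption
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
```

This gives at-least-once behavior: a crash between `import_one` and the checkpoint write replays that one record, so the import step would need to be idempotent.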

adamziel, Oct 14 '24 17:10