`replace_with` fails with `Cannot replace one element with another when the element to be replaced is not part of a tree` error
Sorry about the title; I am not sure how to name this issue, as it is very specific to a single post. I am still investigating where it comes from, but thought I would post it in case some of you have a better idea of the cause.
The importer fails on this specific item:
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your site. -->
<!-- It contains information about your site's posts, pages, comments, categories, and other content. -->
<!-- You may use this file to transfer that content from one site to another. -->
<!-- This file is not intended to serve as a complete backup of your site. -->
<!-- To import this information into a WordPress site follow these steps: -->
<!-- 1. Log in to that site as an administrator. -->
<!-- 2. Go to Tools: Import in the WordPress admin panel. -->
<!-- 3. Install the "WordPress" importer from the list. -->
<!-- 4. Activate & Run Importer. -->
<!-- 5. Upload this file using the form provided on that page. -->
<!-- 6. You will first be asked to map the authors in this export file to users -->
<!-- on the site. For each author, you may choose to map to an -->
<!-- existing user on the site or to create a new user. -->
<!-- 7. WordPress will then import each of the posts, pages, comments, categories, etc. -->
<!-- contained in this file into your site. -->
<!-- generator="WordPress/4.9.3" created="2021-12-24 12:02" -->
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
<description>Débit de beaux sons</description>
<pubDate>Fri, 24 Dec 2021 12:02:30 +0000</pubDate>
<title>Lucy Dacus — No Burden (debut LP)</title>
<pubDate>Fri, 01 Apr 2016 13:45:31 +0000</pubDate>
<guid isPermaLink="false">http://limonadier.net?p=38158</guid>
<description />
<iframe style="border: 0; width: 100%; height: 42px;" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=278575587/transparent=true/" width="300" height="150" seamless=""><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe>
<iframe style="border: 0; width: 100%; height: 42px;" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300" height="150" seamless=""><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe>
<excerpt:encoded />
<wp:post_date><![CDATA[2016-04-01 15:45:31]]></wp:post_date>
<wp:post_date_gmt><![CDATA[2016-04-01 13:45:31]]></wp:post_date_gmt>
<wp:post_password />
Here is the traceback:
Traceback (most recent call last):
File "manage.py", line 10, in <module>
File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
File "/usr/local/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
output = self.handle(*args, **options)
File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/management/commands/import_xml.py", line 70, in handle
File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 113, in run
File "/usr/local/lib/python3.8/functools.py", line 967, in __get__
val = self.func(instance)
File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 518, in cleaned_data
"body": self.body_stream_field(self.prefilter_content(self.raw_body)),
File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/importers/wordpress.py", line 436, in body_stream_field
File "/usr/local/src/wagtail-wordpress-import/wagtail_wordpress_import/block_builder.py", line 58, in promote_child_tags
File "/usr/local/lib/python3.8/site-packages/bs4/element.py", line 266, in replace_with
raise ValueError(
ValueError: Cannot replace one element with another when the element to be replaced is not part of a tree.
And here is the output of some logging I added to the `promote_child_tags` method:
Promotee <iframe height="150" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300"><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe>
Parent <p> <br/>
<iframe height="150" src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/transparent=true/" width="300"><a href="http://lucydacus.bandcamp.com/album/no-burden">No Burden by Lucy Dacus</a></iframe></p>
Parent name p
Removee tags ['p', 'div', 'span']
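One way this `ValueError` can arise, consistent with the log output above but not confirmed as the cause in this issue, is when two elements share the same parent: replacing the parent with the first child detaches the parent from the tree, so replacing it again with the second child fails. The sketch below assumes `beautifulsoup4` is installed and uses made-up markup (`src="a"` and `src="b"` are placeholders); it does not call the importer's `promote_child_tags`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two iframes wrapped in a single <p>, loosely
# modelled on the "Parent" shown in the log output.
html = '<p><br/><iframe src="a"></iframe><iframe src="b"></iframe></p>'
soup = BeautifulSoup(html, "html.parser")

paragraph = soup.p
first, second = paragraph.find_all("iframe")

# Promote the first iframe: the <p> is swapped out for it, which
# detaches the <p> (and everything still inside it) from the tree.
paragraph.replace_with(first)
assert paragraph.parent is None

# The second iframe still lives inside the now-detached <p>.
assert second.parent is paragraph

# Promoting the second iframe the same way raises the ValueError,
# because its parent is no longer part of a tree.
error_message = None
try:
    second.parent.replace_with(second)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens here, checking whether `parent.parent is None` before calling `replace_with` would be one way to detect the detached case, though whether that is the right fix for the importer is a separate question.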
Wagtail v2.15.2
I installed `wagtail-wordpress-import` from the main branch yesterday, so I am using the latest version of this codebase.
Something odd I noticed is that `<p> <br />` is the "parent", whereas these tags are not even in the original XML 🤔
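A quick way to check which tags the raw item body actually contains is to run it through the standard library's `html.parser`. This is only a sketch: `TagCollector` and `raw_body` are names made up for illustration, and `raw_body` is a trimmed copy of the two iframes from the item above (the `style`, `width`, `height` and `seamless` attributes are dropped).

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the name of every start tag seen in a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


# Trimmed copy of the item's body (hypothetical variable name).
raw_body = (
    '<iframe src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/'
    'size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=278575587/'
    'transparent=true/"><a href="http://lucydacus.bandcamp.com/album/'
    'no-burden">No Burden by Lucy Dacus</a></iframe>\n'
    '<iframe src="https://bandcamp.com/EmbeddedPlayer/album=2384669499/'
    'size=small/bgcol=ffffff/linkcol=0687f5/artwork=none/track=2967876146/'
    'transparent=true/"><a href="http://lucydacus.bandcamp.com/album/'
    'no-burden">No Burden by Lucy Dacus</a></iframe>'
)

collector = TagCollector()
collector.feed(raw_body)
collector.close()

# Neither <p> nor <br> appears in the raw body, so they must be
# introduced by a later processing step.
print(sorted(collector.tags))
print("p" in collector.tags, "br" in collector.tags)
```

The exact set of tags reported can vary between Python versions, since newer releases treat the contents of `<iframe>` as raw text, but `p` and `br` are absent either way.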
Thanks for the report.
I tried importing your XML snippet and it works OK, without console errors or warnings. I get a single imported page as expected, with 2 `raw_html` blocks, each containing an iframe.
> Something odd I noticed is the fact that `<p> <br />` is the "parent" whereas these tags are not even in the original xml 🤔
The `<p> <br />` tags are added in the bleach process.