sanitize-html icon indicating copy to clipboard operation
sanitize-html copied to clipboard

Expected a closing tag for <p> (2:1-2:4) before the end of paragraph

Open FormalSnake opened this issue 1 year ago • 4 comments

PLEASE NOTE: make sure the bug exists in the latest patch level of the project. For instance, if you are running a 2.x version of Apostrophe, you should use the latest in that major version to confirm the bug.

To Reproduce

Use this code for mdx files:

export const correctHtml = (html: string) => {
  return sanitizeHtml(html, {
    allowVulnerableTags: true,
    allowedTags: false, // allow all tags but close unclosed tags
    allowedAttributes: false, // allow all attributes
    allowedSchemes: ['data', 'http', 'https'],
  })
}

Get some code out of it:

<p>This is intended as a quick reference and showcase. For more complete info, see [John Gruber's original spec](http://daringfireball.net/projects/markdown/) and the [Github-flavored Markdown info page](http://github.github.com/github-flavored-markdown/).

</p><p>Note that there is also a <a href="./Markdown-Here-Cheatsheet">Cheatsheet specific to Markdown Here</a> if that's what you're looking for. You can also check out <a href="./Other-Markdown-Tools">more Markdown tools</a>.</p>
<h5>Table of Contents</h5>
<p><a href="#headers">Headers</a><br /><a href="#emphasis">Emphasis</a><br /><a href="#lists">Lists</a><br /><a href="#links">Links</a><br /><a href="#images">Images</a><br /><a href="#code">Code and Syntax Highlighting</a><br /><a href="#footnotes">Footnotes</a><br /><a href="#tables">Tables</a><br /><a href="#blockquotes">Blockquotes</a><br /><a href="#html">Inline HTML</a><br /><a href="#hr">Horizontal Rule</a><br /><a href="#lines">Line Breaks</a><br /><a href="#videos">YouTube Videos</a>  </p>
<p>&lt;a name="headers"/&gt;</p>
<h2>Headers</h2>

Use it for blogs in Astro and error: Expected a closing tag for <p> (2:1-2:4) before the end of paragraph This is due to this:

<p>This is intended as a quick reference and showcase. For more complete info, see [John Gruber's original spec](http://daringfireball.net/projects/markdown/) and the [Github-flavored Markdown info page](http://github.github.com/github-flavored-markdown/).
**This space is fatal for some reason**
</p>

FormalSnake avatar Aug 28 '24 12:08 FormalSnake

Thanks for opening this, but can you provide a more simplified example to reproduce the error? I'm not seeing this same error when I create a (simplified) test to be run by mocha.

BoDonkey avatar Aug 28 '24 15:08 BoDonkey

I can't really create a minimal example but I will explain what happens.

I use a custom (private) CMS system that automatically creates HTML and markdown code. This then gets downloaded and put into an MDX file.

export function getBlogs() {
  const blogs = data.blogs
  fs.emptyDirSync(`./src/content/blog/`)

  blogs.forEach(async (blog) => {
    const slug = makeSlug(blog.title)
    const date = parseFormattedDate(blog.publish_date)
    const parsedContent = await marked.parse(blog.content)

    console.log(`Writing ${slug}.mdx`)
    fs.writeFile(
      `./src/content/blog/${slug}.mdx`,
      `---
title: '${blog.title}'
description: '${blog.description}'
pubDate: '${date}'
heroImage: 'data:image/${blog.hero_image?.content_type};base64,${blog.hero_image?.base64}'
---

${removeSpecificTags(correctHtml(parsedContent), 'br')}
`,
    )
  })
}
// function that corrects HTML tags
export const correctHtml = (html: string) => {
  return sanitizeHtml(html, {
    allowVulnerableTags: true,
    allowedTags: false, // allow all tags but close unclosed tags
    allowedAttributes: false, // allow all attributes
    allowedSchemes: ['data', 'http', 'https'],
  })
}
export const removeSpecificTags = (html: string, tag: string) => {
  const regex = new RegExp(`<${tag}\\s*\\/?>`, 'gi')
  return html.replace(regex, '')
}

Which puts the mdx in a folder for Astro to see as a content thingy:

import { defineCollection, z } from 'astro:content'

const blog = defineCollection({
  type: 'content',
  // Type-check frontmatter using a schema
  schema: z.object({
    title: z.string(),
    description: z.string(),
    // Transform string to Date object
    pubDate: z.coerce.date(),
    updatedDate: z.coerce.date().optional(),
    heroImage: z.string().optional(),
  }),
})

export const collections = { blog }

This is a blogpost page:

---
import { type CollectionEntry, getCollection } from 'astro:content'
import BlogPost from '../../layouts/BlogPost.astro'

export async function getStaticPaths() {
  const posts = await getCollection('blog')
  return posts.map((post) => ({
    params: { slug: post.slug },
    props: post,
  }))
}
type Props = CollectionEntry<'blog'>

const post = Astro.props
const { Content } = await post.render()
---

<BlogPost {...post.data}>
  <Content />
</BlogPost>

---
import type { CollectionEntry } from 'astro:content'
import BaseHead from '../components/BaseHead.astro'
import Header from '../components/Header.astro'
import Footer from '../components/Footer.astro'
import FormattedDate from '../components/FormattedDate.astro'
import ProgressiveBlur from '@/components/ProgressiveBlur.astro'
import { Badge } from '@/components/ui/badge'
import BackgroundEffect from '@/components/background-effect.astro'
import Layout from '@/components/Layout.astro'

type Props = CollectionEntry<'blog'>['data']

const { title, description, pubDate, updatedDate, heroImage } = Astro.props
---

<Layout SITE_TITLE={title} SITE_DESCRIPTION={description}>
  <section class="items-center gap-2 row-span-4 col-span-4 image-container">
    <Badge variant="secondary"
      ><FormattedDate date={pubDate} />
      {
        updatedDate && (
          <div class="last-updated-on">
            Last updated on <FormattedDate date={updatedDate} />
          </div>
        )
      }</Badge
    >
    <h1
      class="text-3xl font-bold tracking-tighter sm:text-4xl md:text-5xl lg:text-6xl/none"
      transition:name={`title-${title.replace(/[^a-z0-9]/gi, '')}`}
    >
      {title}
    </h1>
    <!-- <span
          class="max-w-[750px] text-center text-lg font-light text-foreground"
          data-br=":rme:"
          data-brr="1"
          style="display: inline-block; vertical-align: top; text-decoration: inherit; max-width: 502px;"
          transition:name={`description-${title.replace(/[^a-z0-9]/gi, '')}`}
          >{description}</span
        > -->
    <img
      width={1020}
      height={510}
      src={heroImage}
      alt=`Hero image for ${title}`
      class="aspect-video overflow-hidden rounded-xl w-full h-fit object-cover object-center mt-8"
      transition:name={`hero-${title.replace(/[^a-z0-9]/gi, '')}`}
    />

    <div
      class="w-full mt-8 prose md:prose-lg lg:prose-xl dark:prose-invert max-w-none prose-img:rounded-xl"
    >
      <slot />
    </div>
  </section>
</Layout>

This is how I get the error.

FormalSnake avatar Aug 28 '24 16:08 FormalSnake

Hi @FormalSnake, we can't really speak to MDX, just to this module. So what I recommend is that you send a PR that adds a new test to test.js that fails.

boutell avatar Aug 28 '24 19:08 boutell

I will try, thanks! What I have noticed is that this happens when I copy-paste content or use codeblocks in the CMS, if I write it from scratch in the CMS itself it doesn't seem to happen so it's probably an issue on my part but edge cases like this should be fixed here i believe.

FormalSnake avatar Aug 28 '24 20:08 FormalSnake