html-to-markdown icon indicating copy to clipboard operation
html-to-markdown copied to clipboard

🐛 Bug: spurious whitespace insertion

Open ashishb opened this issue 1 year ago • 4 comments

Describe the bug A clear and concise description of what the bug is.

HTML Input

<p><a href="https://example.com">Hello world</a>. "<a href="https://example.com">Example 1</a>".</p>

Generated Markdown

[Hello world](https://example.com). " [Example 1](https://example.com)".

Expected Markdown

[Hello world](https://example.com). "[Example 1](https://example.com)".

Additional context Spurious whitespace is inserted before links

Ref: https://github.com/ashishb/wp2hugo/issues/11

ashishb avatar Apr 07 '24 23:04 ashishb

Hi @ashishb

yeah you are totally right, that looks like a bug. It is probably related to the AddSpaceIfNessesary function 🤔

However I am currently not actively working on V1 of the library.

My focus is on the next version of the library. With V2 this will be fixed (see demo). But it is not ready yet, for example the outside API is not finished & most plugins are still missing.

Will keep this issue open for now. If you want to work on this in the meantime, I happily accept PRs!

Greetings, Johannes

JohannesKaufmann avatar Apr 08 '24 07:04 JohannesKaufmann

@JohannesKaufmann happy to send a PR to fix things, can you give me some more guidance on what might be happening here? Alternatively, should I switch to v2 instead?

ashishb avatar Apr 08 '24 20:04 ashishb

happy to send a PR to fix things

That would be amazing!

more guidance on what might be happening here?

The <a> node uses the function AddSpaceIfNessesary to look at the previous & next nodes to decide if an extra space before & after the brackets is needed.

Right now it just checks for unicode.IsSpace but maybe we could also check for unicode.IsPunct.

// AddSpaceIfNessesary adds spaces to the text based on the neighbors.
// That makes sure that there is always a space to the side, to recognize the delimiter.
func AddSpaceIfNessesary(selec *goquery.Selection, markdown string) string {

You can slightly change the logic (for example: what happens if AddSpaceIfNessesary is not used for <a> nodes) and then run go test -update to see if the golden files changed (more info).

It is important to add a few more testcases & also make sure that there aren't any regressions...

Alternatively, should I switch to v2 instead?

In the v2 I found a better way to solve this problem. So the AddSpaceIfNessesary function is not used anymore.

But it won't be released in the next few months...

JohannesKaufmann avatar Apr 09 '24 06:04 JohannesKaufmann

@JohannesKaufmann I would have gladly sent a PR but I dislike the idea of fixing V1 if V2 of your awesome library is around the corner.

ashishb avatar Jun 30 '24 08:06 ashishb

@JohannesKaufmann when is the new library being planned to be released on?

ashishb avatar Aug 18 '24 03:08 ashishb

It's in the final stretch. But I can't say exactly when it will be published

JohannesKaufmann avatar Aug 18 '24 11:08 JohannesKaufmann

On the "v2" branch are a lot of improvements — including a fix to this bug.

It is still experimental but feel free to give it a try. Happy to hear about your experience 😊

I am going to close this issue. If you find anything with the new version, please open a new issue!

JohannesKaufmann avatar Aug 31 '24 15:08 JohannesKaufmann