
Unicode symbols in Markdown

Open dsevastianov opened this issue 11 years ago • 10 comments

Unicode HTML entities like &alpha; and &#945; don't seem to work in .fsx or Markdown comments in code. The workaround with LaTeX syntax is pretty simple, though.

dsevastianov avatar Jan 05 '14 23:01 dsevastianov

I cannot quite replicate this problem.

I've tried this:

(**
> You can use quotes α α
*)

as well as:

# Home Page

α α This file will load as the root of your docs

and both display fine

image

image

Can someone else reproduce the problem?

DavidSSL avatar Apr 07 '23 13:04 DavidSSL

Maybe the fix was https://github.com/fsprojects/FSharp.Formatting/pull/464?

Does that look like it’d fix it, @DavidSSL?

nhirschey avatar Apr 07 '23 13:04 nhirschey

@nhirschey, it is difficult to tell for certain whether that fixed it, because it's not clear what the original problem in this ticket was.

However, I believe the fix in #464 would have had an impact on the bug in this ticket, and the fact that I can't reproduce it with the examples I've provided suggests that it did.

My guess is that we could close this ticket, unless you can reproduce the original bug, and re-open it later if needed.

DavidSSL avatar Apr 07 '23 14:04 DavidSSL

Thanks for checking @DavidSSL. Before closing we should add a few more tests to make sure.

For future reference, for me and others: the relevant CommonMark test cases are around the case below, from https://spec.commonmark.org/0.30/spec.json

  {
    "markdown": "&nbsp; &amp; &copy; &AElig; &Dcaron;\n&frac34; &HilbertSpace; &DifferentialD;\n&ClockwiseContourIntegral; &ngE;\n",
    "html": "<p>  &amp; © Æ Ď\n¾ ℋ ⅆ\n∲ ≧̸</p>\n",
    "example": 25,
    "start_line": 650,
    "end_line": 658,
    "section": "Entity and numeric character references"
  }
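
As a rough illustration of what this spec case is checking, the sketch below decodes a few entity references with .NET's `System.Net.WebUtility.HtmlDecode`. This is only a stand-in to show the expected mapping from references to characters, not how FSharp.Formatting parses Markdown. As far as I know, `HtmlDecode` only recognises the older HTML 4 entity names, so longer HTML5 names such as `&HilbertSpace;` are deliberately left out here.

```fsharp
open System.Net

// Named, decimal, and hexadecimal references to the same character (Greek alpha).
let named = WebUtility.HtmlDecode "&alpha;"
let dec = WebUtility.HtmlDecode "&#945;"
let hex = WebUtility.HtmlDecode "&#x3B1;"

// A few of the simpler named references from the spec case above.
let decoded = WebUtility.HtmlDecode "&copy; &AElig; &frac34;"

printfn "%s %s %s" named dec hex
printfn "%s" decoded
```

All three forms of the alpha reference should come out as the same single character, which is the behaviour the original report says was missing.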

nhirschey avatar Apr 07 '23 16:04 nhirschey

You're welcome @nhirschey. I was inspired by your Amplifying FSharp talk to contribute back even though I don't use FSharp.Formatting :).

Out of curiosity, since I am not so familiar with this domain, for the common mark tests above, the result you'd be expecting is this?

image

DavidSSL avatar Apr 07 '23 16:04 DavidSSL

@DavidSSL that's very kind of you to say! It'd be great to have your help. I believe your example is rendering correctly.

One thing you could try: there are tests for the CommonMark spec in this library, but the library does not pass all of them, so some are disabled. It looks like the "Entity and numeric character references" tests are currently disabled. If you want, perhaps try enabling them by adding them here? https://github.com/fsprojects/FSharp.Formatting/blob/3fdd5b9a186e35798d87ceee4bee692374304bed/tests/FSharp.Markdown.Tests/CommonMarkSpecTest.fs#L24

If dotnet test runs without any failures after you add "Entity and numeric character references" to the enabled sections, then I'd say make a pull request with your change and we'd be good to close this issue.
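
To make the idea of "enabled sections" concrete, here is a minimal, self-contained sketch of filtering spec cases by their `section` field. It is not the code in CommonMarkSpecTest.fs: the names `enabledSections`, `specJson`, and `enabledCases` are hypothetical, and the two inline cases (including their example numbers) are made up in the same shape as spec.json entries.

```fsharp
open System.Text.Json

// Hypothetical list of sections whose cases should run.
let enabledSections =
    set [ "Entity and numeric character references" ]

// Two made-up cases shaped like spec.json entries (not copied from the spec).
let specJson = """
[
  { "markdown": "&copy;\n", "html": "<p>©</p>\n", "example": 1, "section": "Entity and numeric character references" },
  { "markdown": "\tfoo\n", "html": "<pre><code>foo\n</code></pre>\n", "example": 2, "section": "Tabs" }
]
"""

// Keep only the cases whose section is enabled.
let enabledCases =
    use doc = JsonDocument.Parse specJson
    [ for entry in doc.RootElement.EnumerateArray() do
        let section = entry.GetProperty("section").GetString()
        if enabledSections.Contains section then
            yield entry.GetProperty("example").GetInt32(), section ]

for (example, section) in enabledCases do
    printfn "example %d (%s) would run" example section
```

With this shape, enabling another section is a one-line change to the set, and any failures that show up afterwards point at the cases the parser does not yet handle.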

nhirschey avatar Apr 12 '23 16:04 nhirschey

@nhirschey, you will have to give me some time to look at this because I'm not so familiar with the domain and the code. Thanks for pointing me in the right direction though. I should be able to figure things out.

Having said that, the following:

https://github.com/fsprojects/FSharp.Formatting/blob/09491d141cb3d1bacb1c4b307a924014c90e8428/tests/commonmark_spec.json#L2307-L2312

does not look correct at all. I would assume that the HTML should be like what I have in the post above. Correct?

Moreover, it would appear that #464 is actually incorrectly implemented.

https://github.com/fsprojects/FSharp.Formatting/blob/09491d141cb3d1bacb1c4b307a924014c90e8428/tests/FSharp.Markdown.Tests/Markdown.fs#L21-L25

because when I run this via dingus, I get:

image

As you can see, I do need further guidance in terms of expected behaviour.

DavidSSL avatar Apr 14 '23 14:04 DavidSSL

@nhirschey I think that I understand the problem space better now. However, this is a bigger piece of work than envisaged. Basically, if you use https://spec.commonmark.org/0.30/spec.json, tests belonging to the Fenced code blocks and Tabs sections also start breaking.

I will certainly give it a go but it could be quite a slog.

DavidSSL avatar Apr 18 '23 18:04 DavidSSL

Thanks for digging into this @DavidSSL.

For sure it's too much to get this library fully complying with the commonmark spec in one go; that's a huge task. But it would be awesome if you happen to find a bite-size chunk that you can fix to help push us towards that goal.

No pressure, no rush. Even simply through your investigation here I've learned some things, thank you.

Regarding,

 [<Test>] 
 let ``Don't double encode HTML entities outside of code`` () = 
     "a &gt; & &copy; b" 
     |> Markdown.ToHtml 
     |> should contain "<p>a &gt; &amp; &copy; b</p>" 

I agree that it should contain "<p>a &gt; &amp; © b</p>".

Some existing tests account for "improper" markdown parsing, and they will break when the parsing is better. When I wrote the below test for emphasis based off the commonmark spec, I knew the actual correct value should be <p>a*&quot;foo&quot;*</p> but I was focused on fixing emphasis, not quotes.

https://github.com/fsprojects/FSharp.Formatting/blob/09491d141cb3d1bacb1c4b307a924014c90e8428/tests/FSharp.Markdown.Tests/Markdown.fs#L892-L895

nhirschey avatar Apr 19 '23 11:04 nhirschey

@nhirschey things are clear now. I'll create tickets and link back to this issue to try and move the needle towards compliance. I might not succeed but I'll sure try and give it a go.

DavidSSL avatar Apr 25 '23 18:04 DavidSSL