
HTML to Markdown is wrong for lists of lists

Open martinratinaud opened this issue 2 years ago • 2 comments

As found in https://github.com/OpenTermsArchive/p2b-compliance-declarations/pull/113

Nested lists are converted incorrectly and give

6.  **chapter 6**.
    
    1.  **subchapter i**. looks ok ...
        
7.  **Chapter 7**. with no subchapter seems ok too
        
8.  **Chapter 8**.
    
    1.  **Entire Agreement**. still ok

9.  **Chapter 9**.
    
    1.  **Entire Agreement**. still ok

10.  **Chapter 10**.
    
    1.  **Entire Agreement**. not ok anymore
 
11.  **Chapter 11**.
    
    1.  **Entire Agreement**. not ok anymore

instead of

6.  **chapter 6**.
    
    1.  **subchapter i**. looks ok ...
        
7.  **Chapter 7**. with no subchapter seems ok too
        
8.  **Chapter 8**.
    
    1.  **Entire Agreement**. still ok

9.  **Chapter 9**.
    
    1.  **Entire Agreement**. still ok

10. **Chapter 10**.
    
    1.  **Entire Agreement**. not ok anymore
 
11. **Chapter 11**.
    
    1.  **Entire Agreement**. not ok anymore

(Note the additional space after 10. and 11. in the actual output.)

You can retrieve the problematic HTML with

wget https://www.mturk.com/participation-agreement -O participation-agreement.html

martinratinaud avatar Aug 18 '22 14:08 martinratinaud

I just tested updating turndown, and also this PR: https://github.com/mixmark-io/turndown/pull/358

Neither fixes the problem, so we will have to either

  • switch library (which might generate new versions on many documents if some other changes exist)
  • fix this abandoned library

Let's discuss this at the next planning meeting.

martinratinaud avatar Aug 18 '22 15:08 martinratinaud

Reopening, as this won't be fixed until the fork is actually included in the core.

martinratinaud avatar Sep 14 '22 14:09 martinratinaud

According to https://github.com/mixmark-io/turndown/pull/419#issuecomment-1361030545, it is very unlikely https://github.com/OpenTermsArchive/turndown/pull/2 will ever be merged into upstream.

The alternative suggested by the library author is to use a custom rule. This would indeed be a more sustainable approach than maintaining our own fork.
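
For illustration only, here is a minimal sketch of the padding logic such a custom rule might contain. It assumes the goal is to keep the marker plus its trailing spaces four characters wide where possible, as in the expected output above. The helper names `orderedListPrefix` and `renderListItem` are hypothetical, and this is not the code from the fork or from any merged fix.

```typescript
// Hypothetical helper, not part of turndown: builds the marker for an ordered
// list item so that marker + padding is `width` characters wide when possible.
function orderedListPrefix(index: number, width: number = 4): string {
  const marker = `${index}.`;
  // Always keep at least one space between the marker and the content.
  const padding = Math.max(1, width - marker.length);
  return marker + " ".repeat(padding);
}

// Hypothetical helper: renders one list item and indents every following
// line by the prefix width, so nested content lines up under the first line.
function renderListItem(index: number, content: string): string {
  const prefix = orderedListPrefix(index);
  const indent = " ".repeat(prefix.length);
  const body = content
    .replace(/^\n+/, "") // drop leading newlines
    .replace(/\n+$/, "") // drop trailing newlines
    .replace(/\n/g, "\n" + indent); // indent continuation lines
  return prefix + body;
}

console.log(renderListItem(10, "**Chapter 10**.\n\n1.  **Entire Agreement**."));
```

In turndown itself, logic of this kind would presumably go in the `replacement` function of a rule registered through `addRule` with a filter on `li` elements, with the index derived from the item's position in its parent `ol`.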

MattiSG avatar Apr 24 '23 09:04 MattiSG

The issue described here has been fixed in #943. The technical improvement needed to make this fix sustainable is now described in https://github.com/OpenTermsArchive/engine/issues/1019.

MattiSG avatar Sep 03 '23 17:09 MattiSG