Cannot "parse" wikimedia

Open tmcfarlane opened this issue 8 years ago • 2 comments

Summary

If you load the default input (by refreshing the page), switch the output style to wikimedia, and click "parse" on the output, you get an error prompt. After you click "OK", your input goes blank but your output remains. Switching the output style afterwards affects neither the input nor the output.

Investigation

Current default input (tab-separated in the tool; tabs shown here as 4 spaces):

Col1    Col2    Col3    Numeric Column
Value 1    Value 2    123    10.0
Separate    cols    with a tab or 4 spaces    -2,027.1
This is a row with only one cell

Current wikitable output for the input above:

{| class="wikitable"

! Col1                             
! Col2    
! Col3                   
! Numeric Column 
|-

| Value 1                          
| Value 2 
| 123                    
| 10.0           
|-

| Separate                         
| cols    
| with a tab or 4 spaces 
| -2,027.1       
|-

| This is a row with only one cell 
|         
|                        
|                
|}

Prompt

[Screenshot: ascii-tables error prompt]

tmcfarlane, Aug 31 '16 20:08

It looks like tables can only be parsed in specific circumstances at the moment: the first line must be present and must contain a distinct character that marks where the columns are, and all remaining column separators must line up with the separators in that header line.

A more general solution might be to compare the lines and look for character positions where the same (non-alphanumeric?) character appears in every line, from the bottom to the top of the table being parsed, or, failing that, the positions that are tied for the highest count of a matching character.
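A minimal sketch of that heuristic follows, in TypeScript since the parser runs in JavaScript. The function name `findSeparatorColumns` is hypothetical and this is not how ascii-tables currently parses tables. It treats any character other than letters, digits and whitespace as a possible separator, scores each character position by how many lines share the same such character there, and returns the positions with the highest score: full top-to-bottom agreement when it exists, otherwise the positions tied for the most.

```typescript
// Hypothetical sketch (not the ascii-tables implementation): guess which
// character positions are column separators by looking for the same
// non-alphanumeric character stacked vertically through the table.
function findSeparatorColumns(lines: string[]): number[] {
  const rows = lines.filter((line) => line.trim().length > 0);
  if (rows.length === 0) return [];
  const width = Math.max(...rows.map((row) => row.length));

  // For each position, count how many rows share its most common
  // separator-like character (anything except letters, digits, whitespace).
  const scores: number[] = [];
  for (let col = 0; col < width; col++) {
    const counts = new Map<string, number>();
    for (const row of rows) {
      const ch = row[col];
      if (ch === undefined || /[A-Za-z0-9\s]/.test(ch)) continue;
      counts.set(ch, (counts.get(ch) ?? 0) + 1);
    }
    let top = 0;
    counts.forEach((n) => {
      if (n > top) top = n;
    });
    scores.push(top);
  }

  // Keep the positions with the highest score: columns that agree from top
  // to bottom if any exist, otherwise the positions tied for the most.
  const best = Math.max(...scores);
  const result: number[] = [];
  if (best === 0) return result;
  scores.forEach((score, col) => {
    if (score === best) result.push(col);
  });
  return result;
}

// Usage: a small bordered table.
console.log(
  findSeparatorColumns([
    "| Col1    | Col2 |",
    "|---------|------|",
    "| Value 1 | 123  |",
  ]),
); // [ 0, 10, 17 ]
```

This is aimed at bordered tables; whitespace-separated input like the default one has no such characters and would still need the existing tab/4-space split.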

Dealing with HTML and wikimedia syntax would take a bit more work. There is a JavaScript implementation of an HTML-to-CSV parser here: https://gist.github.com/adilapapaya/9787842. I didn't see a wikimedia table parser written in JavaScript, but I suspect I'm just not using the right search terms.
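If the goal were only to read back the tool's own wikitable output (one cell per line, as in the output above), a much smaller parser could do. The sketch below, with the hypothetical name `parseSimpleWikitable`, is an illustration under that assumption, not a general wikimedia parser: it deliberately ignores other MediaWiki table syntax such as inline `||` / `!!` cells, cell attributes and `|+` captions.

```typescript
// Hypothetical sketch: parse only the simple wikitable layout shown in this
// issue ("{|" ... "|}", one "!" header cell or "|" data cell per line, rows
// separated by "|-"). Inline "||"/"!!" cells, attributes and "|+" captions
// are NOT handled.
function parseSimpleWikitable(text: string): string[][] {
  const rows: string[][] = [];
  let current: string[] = [];
  let inTable = false;

  // Close the row being built, if it has any cells.
  const flushRow = () => {
    if (current.length > 0) rows.push(current);
    current = [];
  };

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // blank lines carry no cells
    if (line.startsWith("{|")) {
      inTable = true; // table start; its attributes are ignored
      continue;
    }
    if (!inTable) continue;
    if (line.startsWith("|}")) {
      flushRow(); // table end
      break;
    }
    if (line.startsWith("|-")) {
      flushRow(); // row separator
      continue;
    }
    if (line.startsWith("!") || line.startsWith("|")) {
      current.push(line.slice(1).trim()); // drop the marker and padding
    }
  }
  flushRow();
  return rows;
}
```

Joining each returned row with tabs, e.g. `rows.map((r) => r.join("\t")).join("\n")`, would give text in the tab-separated form the input box accepts.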

dwesely, Sep 01 '16 01:09

Regarding the original issue, I don't think parsing wikimedia is much of a priority here. Wikimedia supports so many different table formats, and I don't really understand why; it's almost as if they had an old way, switched to a new one, and never removed the first. Since you would be parsing in JavaScript, it can definitely get tricky, as you said. Because the table definitions aren't consistent, I'd avoid that idea altogether.

I'd honestly disable the parse button when wikimedia is selected for now and just show a message below it letting people know it's not supported. This will keep the site looking good, though I don't know how many people besides me have tried or would try this.
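A minimal sketch of that suggestion. The names `updateParseControls` and `UNSUPPORTED_PARSE_STYLES`, the style key `"wikimedia"` and the message wording are assumptions rather than the site's real code; the two small interfaces are shaped so that a real `HTMLButtonElement` and any DOM element can be passed in.

```typescript
// Hypothetical sketch: names, style keys and message text are assumptions.
interface ParseButtonLike {
  disabled: boolean;
}
interface MessageLike {
  textContent: string | null;
}

// Output styles whose output cannot be parsed back into a table.
const UNSUPPORTED_PARSE_STYLES = new Set<string>(["wikimedia"]);

function canParse(outputStyle: string): boolean {
  return !UNSUPPORTED_PARSE_STYLES.has(outputStyle);
}

// Call this whenever the output style changes: disable "parse" and show a
// note for unsupported styles, re-enable and clear the note otherwise.
function updateParseControls(
  outputStyle: string,
  button: ParseButtonLike,
  message: MessageLike,
): void {
  const supported = canParse(outputStyle);
  button.disabled = !supported;
  message.textContent = supported
    ? ""
    : `Parsing ${outputStyle} output is not supported yet.`;
}
```

Keeping the unsupported styles in one set means other formats could be added later without touching the handler.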

tmcfarlane, Sep 17 '16 04:09