Improve/fix HTML rendering of tables

Open TomSteinberg opened this issue 10 years ago • 44 comments

e.g.

http://www.whatdotheyknow.com/request/it_support_services_1295#incoming-258044
http://www.whatdotheyknow.com/request/it_support_services_347#incoming-258014
http://www.whatdotheyknow.com/request/it_support_services_1236#incoming-257000

taken from #18 by @hsenag

TomSteinberg avatar May 22 '14 10:05 TomSteinberg

At the moment the conversion doesn't actually render the content into an HTML table, so it's a conversion issue rather than simply a matter of adding some styling.
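
To illustrate the difference between a conversion fix and a styling fix, here is a minimal Python sketch using only the standard library's `html.parser`. It is not Alaveteli's conversion code (which is not shown in this thread); the class name `TableRowExtractor`, the helper `extract_rows` and the sample markup are all hypothetical. It keeps the row and cell structure of a table instead of flattening it into running text, and it does not attempt to handle nested tables.

```python
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table cells in `html` as a list of rows (lists of strings)."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, not taken from any real response.
SAMPLE = """
<table>
  <tr><th>Year</th><th>Cases</th></tr>
  <tr><td>2014/15</td><td>10</td></tr>
  <tr><td>2015/16</td><td>&lt;5</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
```

Once the cells are available as rows, they could be rendered as an HTML table or as aligned plain text; a flattened string offers neither option.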

garethrees avatar Nov 05 '14 10:11 garethrees

Another example: https://www.whatdotheyknow.com/request/mental_health_services_4#incoming-601920

The original email has a nice looking HTML table.

hsenag avatar Jan 06 '15 06:01 hsenag

I would really welcome this fix/improvement. I submit a fair few FOI requests where I end up with incomprehensible tables. For example:

https://www.whatdotheyknow.com/request/homeless_due_to_end_of_private_t_19#incoming-713565

As a result I usually write a reply asking that they email me the contents directly, which means others can't access the data if they happen across the request on the web site.

tomchance avatar Oct 12 '15 15:10 tomchance

Another example:

https://www.whatdotheyknow.com/request/number_children_home_educated_wi

What WhatDoTheyKnow shows:

Summary of Elective Home Education Cases

Academic Year

2014/15 2015/16 2016/17

SEN All From From All From From All From From status Cases mainstream Special Cases mainstream Special Cases mainstream Special Statement 10 <10 < 5 11 6 <5 9 5 <5 (1) EHCP (2) 6 <5 < 5 6 6 0 8 7 0 SEN Support 67 63 0 81 77 0 102 99 0 (3) All SEN 83 72 < 5 98 89 <5 119 111 <5 (4, 5)

Table in HTML:

[Screenshot of the table rendered as HTML: screen shot 2017-07-24 at 13 56 24]

In this case we offered the user an image of the table and may upload the HTML version of the response and link it from an annotation:

https://twitter.com/WhatDoTheyKnow/status/889469208383418368

RichardTaylor avatar Jul 24 '17 13:07 RichardTaylor

Adding another example where a table provided in HTML format wasn't legible:

https://www.whatdotheyknow.com/request/housing_register_8?unfold=1#incoming-1135657

In that case we put the tables in a Google spreadsheet and linked to it.

Also noting that #4003 is a related, broader ticket for preserving all HTML formatting in response emails.

RichardTaylor avatar Apr 05 '18 14:04 RichardTaylor

Another example: https://www.whatdotheyknow.com/request/requests_for_reception_start_at

RichardTaylor avatar Jul 04 '18 18:07 RichardTaylor

https://blog.socialcops.com/technology/engineering/camelot-python-library-pdf-data/

garethrees avatar Oct 12 '18 20:10 garethrees

https://blog.socialcops.com/technology/engineering/camelot-python-library-pdf-data/

Had a quick play with that last night (used an older laptop so made the path to install harder for myself than it needed to be) and it looks interesting. I picked a random PDF attachment with a table in it (the first one I stumbled across) and it made a reasonable job of it, reducing the entire message to just the tabular content (detail here).

(But, as far as I can tell, it's only useful for tables stuck inside PDFs)

lizconlan avatar Oct 17 '18 10:10 lizconlan

https://github.com/adworse/iguvium – Ruby gem for extracting tables from PDFs as structured data

garethrees avatar Nov 24 '18 13:11 garethrees

I've come across this recently with tables of data pasted into response emails and then rendered unintelligible by the conversion.

It looks as if most of the examples above are HTML tables that get mangled, so that seems like a good place to focus. Parsing tables out of PDFs is a whole separate challenge in itself (I've been doing this a lot recently ...) and there are plenty of products/libraries out there that (attempt to) do this - Camelot, PDFTables.com, Tabula etc.

Is there any reason you couldn't give people access to the raw response? That would give people the ability to copy-paste the table direct into a spreadsheet or just to eyeball it.

mikejamesthompson avatar Jun 25 '19 08:06 mikejamesthompson

+1 see issue at:

https://www.whatdotheyknow.com/request/july_2019_discretionary_deferral_17#comment-89964

RichardTaylor avatar Oct 25 '19 12:10 RichardTaylor

+1 another example at

https://www.whatdotheyknow.com/request/tenancies_ended_by_death_of_the#incoming-1455601

RichardTaylor avatar Nov 01 '19 13:11 RichardTaylor

A WhatDoTheyKnow user writes:

A problem which I frequently encounter is that figures are supplied by the respondent as tables.

These are scrambled when they appear on WDTK (titles are shown detached from the figures), which is usually decipherable, but which occasionally necessitates a further clarification request.

Could you find a way to show the tables on WDTK as received (presumably received in the body of an email)?

RichardTaylor avatar Nov 01 '19 13:11 RichardTaylor

https://nanonets.com/blog/table-extraction-deep-learning/

garethrees avatar Jan 22 '20 12:01 garethrees

+1 from me following the poor formatting of the table at https://www.whatdotheyknow.com/request/potholes_rights_of_way_maintenan

The WDTK admin team have provided the user with a copy of the raw email.

MattK1234 avatar Apr 05 '20 12:04 MattK1234

+1 https://www.whatdotheyknow.com/request/minimum_maximum_and_median_fpas#incoming-1640619

The data has been copied to a Google sheet linked from an annotation.

RichardTaylor avatar Oct 18 '20 23:10 RichardTaylor

+1

https://www.whatdotheyknow.com/request/details_of_current_housing_stock#incoming-1743561

RichardTaylor avatar Mar 15 '21 14:03 RichardTaylor

+1 from a pro user support query.

garethrees avatar Mar 16 '21 09:03 garethrees

+1 from a WDTK user and a tired administrator for this one - the message itself has a rather poor plain-text copy, which doesn't help, but the HTML is 'fine' if you view it in a capable client.

I'll also add a +1 for #3547 here - it would have helped when extracting the data, since we could have formatted it in an intelligible way.

https://www.whatdotheyknow.com/request/list_of_litter_bins_in_the_borou#incoming-1806040

mdeuk avatar Jun 06 '21 17:06 mdeuk

+1 https://www.whatdotheyknow.com/request/barriers_on_hobmoor#incoming-1673541

RichardTaylor avatar Jul 21 '21 10:07 RichardTaylor

+1 https://www.whatdotheyknow.com/request/healthcare_worker_accommodation_6#incoming-1848176

RichardTaylor avatar Aug 07 '21 18:08 RichardTaylor

Sometimes we manually convert these for pro users on request. The process is:

Download raw email > open in Apple Mail > copy and paste into TextEdit (to preserve rich text formatting) > make any redactions > print as PDF > upload file > link in annotation.

garethrees avatar Nov 02 '21 09:11 garethrees

+1

https://www.whatdotheyknow.com/request/vser_payments_teaching_staff#incoming-1938493

Data supplied to user by email, and made available via Google Sheets

RichardTaylor avatar Dec 18 '21 12:12 RichardTaylor

+1 The response here is almost illegible https://www.whatdotheyknow.com/request/property_and_assets_and_building_291#incoming-1221782

FOIMonkey avatar Mar 23 '22 21:03 FOIMonkey

Wondering whether we can convert the main body HTML part to PDF and add it as an "attachment" or similar, so that:

  • We still render plain text by default
  • We don't have to sanitise and render random HTML
  • Users have better access to the underlying email presentation

Would have to consider possible extra work when it comes to hiding and censor rules though.
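
As a rough illustration of the first step this idea would need, here is a minimal Python sketch that locates the HTML body part of a raw email using the standard library's `email` package. It is an assumption-laden sketch, not Alaveteli code: the function name `html_body` and the sample message `RAW` are hypothetical, and the PDF conversion itself, along with any hiding or censor rules, is deliberately left out.

```python
from email import message_from_string, policy


def html_body(raw_email):
    """Return the decoded HTML body of a raw email, or None if it has none."""
    # policy.default yields EmailMessage objects, which provide get_body().
    msg = message_from_string(raw_email, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None


# Hypothetical multipart/alternative message, for illustration only.
RAW = """\
From: authority@example.org
To: requester@example.org
Subject: FOI response
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="sep"

--sep
Content-Type: text/plain; charset="utf-8"

Year Cases 2014/15 10
--sep
Content-Type: text/html; charset="utf-8"

<table><tr><th>Year</th><th>Cases</th></tr><tr><td>2014/15</td><td>10</td></tr></table>
--sep--
"""

html = html_body(RAW)
```

Returning `None` when there is no HTML part would let the site keep rendering plain text by default and only offer the extra "attachment" when an HTML alternative exists.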

garethrees avatar Apr 06 '22 11:04 garethrees

Further example

https://www.whatdotheyknow.com/request/deaths_and_hospital_admissions_f_117#incoming-1943749

RichardTaylor avatar Apr 27 '22 14:04 RichardTaylor

This issue (alongside, perhaps, #4578) has actually been cited in an FOI response:

I have attached a PDF document with my full response. This includes some tables and hyperlinks that may not otherwise be supported by the WhatDoTheyKnow website.

https://www.whatdotheyknow.com/request/docs_21#incoming-1517708

WilliamWDTK avatar Jun 01 '22 20:06 WilliamWDTK

Another example on this request.

The table in the raw email actually looks like this: Table (reproduced below)

  Officers Staff
  Total Economic Crime
2013 2083.38 13.8
2014 1955.04 13.8
2015 1927.24 15.8
2016 2068.73 10.8
2017 2056.54 10.8
2018 1974.72 11.8
2019 1944.71 10.8
2020 2145.17 10.68
2021 2240.64 9.88
2022 2328.23 11.73
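
For comparison, here is a minimal Python sketch of how rows like these could be shown as aligned plain text once the cell structure has been preserved. The function name `render_text_table` is hypothetical, and the empty leading header cells are an assumption about how the original HTML table was laid out. It expects at least one row.

```python
def render_text_table(rows):
    """Return rows as plain text with each column padded to a common width."""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = []
    for row in padded:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        # Strip trailing padding so lines carry no invisible whitespace.
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# First rows of the table above; the empty first header cells are assumed.
ROWS = [
    ["", "Officers", "Staff"],
    ["", "Total", "Economic Crime"],
    ["2013", "2083.38", "13.8"],
    ["2014", "1955.04", "13.8"],
]

print(render_text_table(ROWS))
```

Aligned text of this kind only reads correctly in a monospaced font, so it would be a fallback rather than a substitute for rendering a real HTML table.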

WilliamWDTK avatar Jun 06 '22 11:06 WilliamWDTK

Further example at https://www.whatdotheyknow.com/request/cumulative_amounts_owed_to_the_c#incoming-1776516

RichardTaylor avatar Jun 14 '22 13:06 RichardTaylor

Just to add to the above, this seems like one of the few areas where WhatDoTheyKnow is less useful than the user just sending an email.

Is making the original email available as a download just to the requester an option?

ajparsons avatar Jun 28 '22 16:06 ajparsons