Science Magazine
Science also publishes fabulous cutting-edge scientific research data of different types, bundled all together in a PDF called "supplementary materials",
e.g. http://www.sciencemag.org/content/suppl/2013/10/30/342.6158.592.DC1/1243283.McLellan.SM.pdf
In this PDF (!) they have bundled/bungled together:
- words
- image data
- tables
One of the tables (S1) is split over THREE pages, with page breaks in between. If you try to copy and paste the whole table out in one go, it will be contaminated by the page numbers at each page break AND by the footnotes on each section of the (same) table.
This file is fairly typical of the many supplementary materials files they publish each and every week.
SCREENSHOT of a page break in the middle of the table
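
To give a feel for the clean-up this forces on anyone who wants to reuse the table, here is a minimal Python sketch. It rests on two assumptions that are mine, not facts checked against the PDF: that a pasted page number shows up as a line containing only digits, and that footnote lines start with a marker such as `*`, `†` or `‡`. The function name `clean_pasted_table` and the sample rows are made up for illustration; they are not the real contents of Table S1.

```python
import re

# Assumption: a page number appears as a line containing only digits.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")
# Assumption: footnote lines begin with a marker such as *, † or ‡.
FOOTNOTE = re.compile(r"^\s*[*†‡]")


def clean_pasted_table(text):
    """Return the pasted lines with blank, page-number and footnote lines dropped."""
    kept = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left by the page break
        if PAGE_NUMBER.match(line) or FOOTNOTE.match(line):
            continue  # skip page numbers and footnotes
        kept.append(line.strip())
    return kept


# Made-up sample, not the real contents of Table S1.
sample = """sample_a 1.0 2.0
* hypothetical footnote text
12
sample_b 3.0 4.0
"""
print(clean_pasted_table(sample))
```

Even this is fragile: a genuine table row that consists of a single number would be thrown away along with the page numbers, which is exactly why data shipped this way is hard to trust after extraction.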

@rossmounce perfect - let's get this up. Do you want to have a go at the fork-and-add route described in http://okfnlabs.org/bad-data/add/? Also, do you have an example where you actually went to the trouble of extracting the data?