petermr comments

Results: 310 comments of petermr

PDF processing

ami-pdf will read the PDFs in bulk and split them into characters and images. After that we need to know the application. Try http://discuss.contentmine.org/t/cm-ucl-ii-semantic-content-enhancement-of-table-data/396/2 for an overview of extracting tables. You...

PDF processing

Much of this is available through the Java tests on petermr/normami, now moved to petermr/ami3. ami3 has the tests but not the data. It's image-based, so probably of limited value. Back...

PDF processing

How many documents do you have? The first step is to turn them into a CProject: put them in a directory, e.g. simon20190919, then ami-makeproject gives the help, then ami-makeproject...
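A minimal sketch of the "put them in a directory" step, assuming the PDFs are scattered under some source tree. The function name `collect_pdfs` and its two arguments are hypothetical, not part of `ami`. It deliberately stops before `ami-makeproject`, whose options should be taken from its own help output.

```shell
#!/bin/sh
# Hypothetical helper (not part of ami): copy every PDF found under a
# source tree into one flat project directory, e.g. simon20190919.
# Assumes the project directory lies outside the source tree and that
# file names are unique; a later file with the same name overwrites
# an earlier one.
collect_pdfs() {
    src_dir=$1
    project_dir=$2
    mkdir -p "$project_dir" || return 1
    # -iname matches both .pdf and .PDF (supported by GNU and BSD find)
    find "$src_dir" -type f -iname '*.pdf' -exec cp {} "$project_dir"/ \;
}
```

For example, `collect_pdfs ~/papers simon20190919` would leave a flat directory of PDFs on which `ami-makeproject` can then be run.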

PDF processing

Here's a stack of `ami` commands
```
#! /bin/sh
# your path should include the /bin directory of the appassembler distrib, e.g.
# ami-forestplot => /Users/pm286/workspace/cmdev/normami/target/appassembler/bin/ami-forestplot
# edit this to...
```

PDF processing

Don't send it; add it in a new folder here unless there are copyright issues.

PDF processing

from the 25K try to select ca 20 which are:
* newish (old docs are problematic, but maybe that is the point)
* born digital if possible
* OPEN (we...

PDF processing

if it's publicly visible I'm happy. We did that with phylotrees. We are allowed to extract data if we can legally read it somewhere. Doesn't have to be CC BY....

PDF processing

happy to talk on phone/skype if helps

PDF processing

if you have 100-year-old records as bitmaps I am happy to try those, but they must be homogeneous in type

PDF processing

see table extraction at http://discuss.contentmine.org/t/ami-eppi-cm-ucl-table-extraction-project/322/14