What are the ML models used for? I must have lost track in the Slack convo...
What was the URL that you were trying to scrape?
I think None makes more sense. If there had to be a date, a fallback could be the latest date mentioned in the article (or the latest Report).
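To make that concrete, here is a rough sketch of the "None, or else the latest date mentioned" idea. The helper name `latest_date_or_none` and the ISO-only date pattern are assumptions for illustration, not something the scraper has today; real articles would need a more forgiving date parser.

```python
import re
from datetime import date
from typing import Optional

# Assumption: only ISO-style dates (YYYY-MM-DD) are recognised in this sketch.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def latest_date_or_none(text: str) -> Optional[date]:
    """Return the latest date mentioned in text, or None if there is none."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            # Skip strings that look like dates but are not valid calendar dates.
            continue
    return max(found) if found else None
```

Returning None by default keeps "no date found" distinguishable from a guessed date, and the fallback only kicks in when the text actually mentions one.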
Hey I just saw this issue about extracting details from a PDF. Since we have the updated schema, I'm just wondering if we should pick this up again. I came...
Looks like the scraper is already using textract, which uses pdfminer, which is similar to PyPDF2. But they both seem to have difficulty extracting things like titles. Content (text) wise, there's...
I haven't found a tool that can handle this reasonably well...