aleksandar-devedzic
Is there a way to get a list of websites that can be crawled properly with the newspaper lib? For example newspaper.sources or something like that?
These are the names of tags found in SCRIPT or META tags that represent dates; maybe you will find this helpful: publishdate, publish-date, prism.publicationDate, coverageEndTime, uploadDate, date, published_date...
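A minimal sketch of how such a list could be used, assuming the page is already available as an HTML string. It first checks META tags whose name, property or itemprop equals one of the keys, then searches JSON-LD SCRIPT blocks (`<script type="application/ld+json">`). The names `DateTagParser` and `find_date` are hypothetical, `DATE_KEYS` contains only the names shown above (the original list is cut off), and only the standard library is used.

```python
import json
from html.parser import HTMLParser

# Date tag names from the post; the original list is truncated, so this is a subset.
DATE_KEYS = ("publishdate", "publish-date", "prism.publicationDate",
             "coverageEndTime", "uploadDate", "date", "published_date")


class DateTagParser(HTMLParser):
    """Collects <meta> attributes and JSON-LD <script> bodies (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = []        # one attribute dict per <meta> tag
        self.json_ld = []     # raw text of each <script type="application/ld+json">
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            self.meta.append(attrs)
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld[-1] += data


def _search_json(obj, keys):
    """Depth-first search of parsed JSON for the first string value under a key."""
    if isinstance(obj, dict):
        for key in keys:
            if isinstance(obj.get(key), str):
                return obj[key]
        for value in obj.values():
            found = _search_json(value, keys)
            if found:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = _search_json(item, keys)
            if found:
                return found
    return None


def find_date(html, keys=DATE_KEYS):
    """Return the first date-like value found in META tags, then JSON-LD, else None."""
    parser = DateTagParser()
    parser.feed(html)
    # 1) META tags: the key may appear in name, property or itemprop.
    for key in keys:
        for attrs in parser.meta:
            if key in (attrs.get("name"), attrs.get("property"), attrs.get("itemprop")):
                if attrs.get("content"):
                    return attrs["content"]
    # 2) JSON-LD SCRIPT blocks; malformed JSON is skipped.
    for raw in parser.json_ld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        found = _search_json(data, keys)
        if found:
            return found
    return None
```

META tags are checked before JSON-LD here only as a design choice; the key order in `DATE_KEYS` decides which tag wins when a page has several.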
I have extracted some meta tags; you can try to identify the title, text, description and date by substituting the provided tags into: meta[property='{}'] meta[name='{}'] meta[itemprop='{}'] Meta tags for publication and...
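The three templates are CSS attribute selectors, so with a CSS-capable parser such as BeautifulSoup each one could be filled in with `template.format(tag)` and passed to `soup.select_one`. To keep the sketch dependency-free, the version below applies the same attribute matching with the standard library's `html.parser`. The `CANDIDATES` mapping is an illustrative assumption (common Open Graph and schema.org names), not the extracted list from the post, and `extract_fields` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Attributes targeted by meta[property='{}'], meta[name='{}'], meta[itemprop='{}'],
# tried in that order.
ATTRS = ("property", "name", "itemprop")

# Illustrative candidate tag names per field (assumption, not the post's list).
CANDIDATES = {
    "title": ["og:title", "twitter:title"],
    "description": ["og:description", "description"],
    "date": ["article:published_time", "datePublished"],
}


class MetaCollector(HTMLParser):
    """Stores the attributes of every <meta> tag in document order."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def select_meta(metas, attr, value):
    """Content of the first <meta> with attr == value and non-empty content,
    roughly what the selector meta[attr='value'] would pick."""
    for m in metas:
        if m.get(attr) == value and m.get("content"):
            return m["content"]
    return None


def first_match(metas, names):
    """Try every candidate name against every template attribute."""
    for name in names:
        for attr in ATTRS:
            value = select_meta(metas, attr, name)
            if value:
                return value
    return None


def extract_fields(html, candidates=CANDIDATES):
    """Map each field to the first matching meta content, or None."""
    parser = MetaCollector()
    parser.feed(html)
    return {field: first_match(parser.metas, names)
            for field, names in candidates.items()}
```

With BeautifulSoup installed, `select_meta` could be replaced by `soup.select_one(f"meta[{attr}='{value}']")` followed by `.get("content")` on the result, which uses the templates directly.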
Your code is very clear, THANKS FOR THAT! But how do I add more features to this model? You only took words, but what if I also want to add POS_TAGS?
I am using Python and Selenium on AWS Lambdas for crawling. I have updated Python to 3.11 and Selenium to 4.18.0, but then my crawlers stopped working. This is the...
I have noticed an issue with PDFMiner: it returns different results each time for my PDF document. This is my code: ``` import requests from io import BytesIO from...