aleksandar-devedzic issues

Results: 7 issues of aleksandar-devedzic

How to get the list of all websites that are available for scraping?

Is there a way to get a list of websites that can be crawled properly with the newspaper lib? For example, newspaper.sources or something like that?

I just want to help with date extraction

These are the names of tags that can be found in SCRIPT or META tags that represent dates; maybe you will find this helpful: publishdate, publish-date, prism.publicationDate, coverageEndTime, uploadDate, date, published_date...
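
As a rough illustration of how such a list of names might be used, here is a minimal sketch that scans META tags with Python's standard `html.parser` module. It is not the newspaper library's own extraction code. The names `DATE_KEYS`, `MetaDateParser` and `find_meta_dates` are hypothetical, the key set only covers the names visible in the snippet above, and SCRIPT tags are not handled at all.

```python
from html.parser import HTMLParser

# Candidate keys copied from the snippet above (lowercased for matching).
# Assumption: this set is illustrative, not the full list from the issue.
DATE_KEYS = {
    "publishdate", "publish-date", "prism.publicationdate",
    "coverageendtime", "uploaddate", "date", "published_date",
}


class MetaDateParser(HTMLParser):
    """Collects (key, content) pairs from META tags whose key is in DATE_KEYS."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content")
        if not content:
            return
        # A date key may sit in any of these three attributes.
        for key_attr in ("property", "name", "itemprop"):
            key = attr.get(key_attr)
            if key and key.lower() in DATE_KEYS:
                self.found.append((key, content))
                break


def find_meta_dates(html):
    """Return all date-like META entries in document order."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.found
```

Calling `find_meta_dates(page_html)` returns the raw `content` strings; parsing them into real dates is left out here because the formats vary from site to site.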

TIPS FOR IMPROVEMENT

I have extracted some meta tags. You can try to identify the title, text, description and date by substituting the provided tags into: meta[property='{}'], meta[name='{}'], meta[itemprop='{}']. Meta tags for publication and...
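
The substitution described here can be sketched with plain `str.format`. This is only an assumed reading of the suggestion: `SELECTOR_TEMPLATES` and `build_selectors` are hypothetical names, and `og:title` and `description` are made-up example tag names rather than entries from the issue's list.

```python
# Selector templates quoted in the issue text.
SELECTOR_TEMPLATES = (
    "meta[property='{}']",
    "meta[name='{}']",
    "meta[itemprop='{}']",
)


def build_selectors(tag_names):
    """Expand each tag name into all three selector forms, keeping input order."""
    return [tpl.format(name) for name in tag_names for tpl in SELECTOR_TEMPLATES]


# Hypothetical example names, used only to show the expansion.
selectors = build_selectors(["og:title", "description"])
for selector in selectors:
    print(selector)
```

The resulting strings are ordinary CSS attribute selectors, so they could then be passed to whichever HTML library does the actual lookup.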

TIPS FOR FAST IMPROVEMENT

How to add POS_TAG feature?

Your code is very clear, THANKS FOR THAT! But how can I add more features to this model? You only used words, but what if I want to add POS_TAGS as well?

Installing ChromeDriver and Headless Chrome Driver with latest version of Selenium

I am using Python and Selenium on AWS Lambdas for crawling. I have updated Python to 3.11 and Selenium to 4.18.0, but then my crawlers stopped working. This is the...

PDF Miner returns different results every time

I have noticed an issue with PDF Miner. It returns different results each time for my PDF doc. This is my code: ``` import requests from io import BytesIO from...