[Question]: How to fix circular import bug
Question
I've been trying to add a new country to the list, as well as a new publisher for it.
For some reason, I am getting this circular import bug whenever I try to crawl for an article:
Traceback (most recent call last):
File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\test.py", line 1, in
My ae/__init__.py is as follows:

    from fundus.publishers.base_objects import PublisherEnum, PublisherSpec
    from fundus.scraping.url import NewsMap, Sitemap

    from .al_arabiya import AlArabiyaParser


    class AE(PublisherEnum):
        Alarabiya = PublisherSpec(
            name="AlArabiya",
            domain="www.alarabiya.net",
            sources=[
                Sitemap("https://www.alarabiya.net/sitemap.xml"),
                NewsMap("https://www.alarabiya.net/ar-news-sitemap.xml"),
            ],
            parser=AlArabiyaParser,
        )
and my al_arabiya.py is as follows:

    import datetime
    from typing import List, Optional

    from lxml.cssselect import CSSSelector

    from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute
    from fundus.parser.utility import (
        extract_article_body_with_selector,
        generic_author_parsing,
        generic_date_parsing,
        generic_topic_parsing,
    )


    class AlArabiyaParser(ParserProxy):
        class V1(BaseParser):
            pass

I am running Python 3.9 on Conda and have also executed "pip install -e .[dev]". I created a new folder for the country ("AE") and its __init__.py, in which I added the PublisherEnum class with source attributes, and I also implemented an empty parser class for said publisher.
@tamemo99 Moving test.py from C:\Users\Tamim\PycharmProjects\fundus\src\fundus\test.py -> C:\Users\Tamim\PycharmProjects\fundus\test.py should do the trick ;)
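For anyone wondering why the move helps: the thread does not spell out the cause, so take this as an assumption. When you run a script directly, Python puts the script's own directory at the front of sys.path, so any module or package sitting next to test.py inside src/fundus can shadow a same-named standard-library or third-party module and produce a confusing circular import. Below is a minimal sketch of a check for that; the function name `shadowing_candidates` and the watch list are made up for illustration and are not part of fundus.

```python
import sys
from pathlib import Path


def shadowing_candidates(script_dir, module_names):
    """Return importable names in script_dir that collide with module_names."""
    hits = set()
    for entry in Path(script_dir).iterdir():
        # A plain module is a .py file; a package is a directory with __init__.py.
        is_module = entry.is_file() and entry.suffix == ".py"
        is_package = entry.is_dir() and (entry / "__init__.py").is_file()
        name = entry.stem if is_module else entry.name
        if (is_module or is_package) and name in module_names:
            hits.add(name)
    return sorted(hits)


if __name__ == "__main__":
    # When a script is run directly, sys.path[0] is the script's directory.
    here = Path(sys.path[0] or ".")
    # Hypothetical watch list; extend it with whatever your project imports.
    print(shadowing_candidates(here, {"logging", "typing", "datetime", "json"}))
```

If this prints a non-empty list for the directory your script lives in, running the script from the project root instead (as suggested above) avoids the collision.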
Hey! Thanks, after moving test.py the test worked and I was able to generate some output. However, I'm now trying to run a unit test and I'm getting the following:
8,670 - basic_logger - WARNING - Warning! Couldn't reach sitemap 'https://www.alarabiya.net/ar-news-sitemap.xml' because of 403 Client Error: Forbidden for url: https://www.alarabiya.net/ar-news-sitemap.xml
2024-04-24 19:08:09,132 - basic_logger - WARNING - Warning! Couldn't reach sitemap 'https://www.alarabiya.net/sitemap.xml' because of 403 Client Error: Forbidden for url: https://www.alarabiya.net/sitemap.xml
2024-04-24 19:08:09,134 - basic_logger - ERROR - Couldn't get article for AlArabiya. Skipping
AlArabiya: 0%|
@tamemo99 It seems that the publisher you're trying to scrape is protected by Cloudflare. Unfortunately, there is not much you can do about that. I'm sorry for the time you have already spent on this!
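If you want to check this for other publishers before writing a parser, here is a rough heuristic, not anything fundus ships. It assumes that a 403 response which also carries Cloudflare's usual response headers ("Server: cloudflare" or a "CF-RAY" header) means the sitemap is blocked for scrapers. The function name `looks_cloudflare_blocked` is made up for this sketch, and a 403 can of course have other causes.

```python
def looks_cloudflare_blocked(status_code, headers):
    """Heuristic: True if a 403 response appears to be served by Cloudflare."""
    # Header names are case-insensitive, so normalise them before looking them up.
    normalised = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = (
        normalised.get("server") == "cloudflare" or "cf-ray" in normalised
    )
    return status_code == 403 and served_by_cloudflare


if __name__ == "__main__":
    # Example values only; in practice pass the status code and headers
    # of a real HTTP response for the sitemap URL.
    print(looks_cloudflare_blocked(403, {"Server": "cloudflare"}))
```

With the requests library you could pass `response.status_code` and `response.headers` straight into the function.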