[Question]: How to fix circular import bug

Open tamemo99 opened this issue 1 year ago • 3 comments

Question

I've been trying to add a new country to the list, along with a new publisher from that country.

For some reason, I am getting this circular import bug whenever I try to crawl for an article:

    Traceback (most recent call last):
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\test.py", line 1, in <module>
        from fundus import PublisherCollection, Crawler
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\__init__.py", line 3, in <module>
        from fundus.publishers import PublisherCollection
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\publishers\__init__.py", line 1, in <module>
        from fundus.publishers.at import AT
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\publishers\at\__init__.py", line 1, in <module>
        from fundus.publishers.base_objects import PublisherEnum, PublisherSpec
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\publishers\base_objects.py", line 9, in <module>
        from fundus.scraping.url import NewsMap, RSSFeed, Sitemap, URLSource
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\scraping\url.py", line 13, in <module>
        from requests import ConnectionError, HTTPError
      File "C:\Users\Tamim\anaconda3\envs\fundacontribute\lib\site-packages\requests\__init__.py", line 43, in <module>
        import urllib3
      File "C:\Users\Tamim\anaconda3\envs\fundacontribute\lib\site-packages\urllib3\__init__.py", line 8, in <module>
        import logging
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\logging\__init__.py", line 1, in <module>
        from .logger import basic_logger
      File "C:\Users\Tamim\PycharmProjects\fundus\src\fundus\logging\logger.py", line 3, in <module>
        _stream_handler = logging.StreamHandler()
    AttributeError: partially initialized module 'logging' has no attribute 'StreamHandler' (most likely due to a circular import)

My ae/__init__.py is as follows:

    # PublisherSpec is imported alongside PublisherEnum, as in at/__init__.py (see traceback)
    from fundus.publishers.base_objects import PublisherEnum, PublisherSpec
    from fundus.scraping.url import NewsMap, Sitemap

    from .al_arabiya import AlArabiyaParser


    class AE(PublisherEnum):
        Alarabiya = PublisherSpec(
            name="AlArabiya",
            domain="www.alarabiya.net",
            sources=[
                Sitemap("https://www.alarabiya.net/sitemap.xml"),
                NewsMap("https://www.alarabiya.net/ar-news-sitemap.xml"),
            ],
            parser=AlArabiyaParser,
        )

and my al_arabiya.py is as follows:

    import datetime
    from typing import List, Optional

    from lxml.cssselect import CSSSelector

    from fundus.parser import ArticleBody, BaseParser, ParserProxy, attribute
    from fundus.parser.utility import (
        extract_article_body_with_selector,
        generic_author_parsing,
        generic_date_parsing,
        generic_topic_parsing,
    )


    class AlArabiyaParser(ParserProxy):
        class V1(BaseParser):
            pass

I am running Python 3.9 on Conda and I also executed "pip install -e .[dev]". I created a new folder for the country ("AE") and created its __init__.py, in which I added the PublisherEnum class with source attributes. I also implemented an empty parser class for said publisher.

tamemo99, Apr 23 '24 20:04

@tamemo99 Moving test.py from C:\Users\Tamim\PycharmProjects\fundus\src\fundus\test.py -> C:\Users\Tamim\PycharmProjects\fundus\test.py should do the trick ;)
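Why the move likely helps: when a script is run directly, Python puts the script's directory at the front of `sys.path`. With test.py inside src\fundus, the `import logging` in urllib3 resolves to the local fundus\logging package instead of the standard library, which matches the last frames of the traceback. The sketch below reproduces that lookup in a temporary directory without importing anything; the temporary directory is only a stand-in for src\fundus and `logging_origin` is a hypothetical helper, not part of fundus.

```python
import sys
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def logging_origin(search_path):
    """Return the file that `import logging` would load for a given search path."""
    spec = PathFinder.find_spec("logging", search_path)
    return spec.origin if spec else None


with tempfile.TemporaryDirectory() as script_dir:
    # Stand-in for src/fundus/, which contains a local `logging` package.
    local_pkg = Path(script_dir) / "logging"
    local_pkg.mkdir()
    (local_pkg / "__init__.py").write_text("# local package, not the stdlib\n")

    # Running a script inside that directory puts the directory first on sys.path.
    shadowed = logging_origin([script_dir] + sys.path)
    # Running the script from elsewhere leaves that directory off the path.
    normal = logging_origin(sys.path)

print("with script dir first:", shadowed)
print("without it:           ", normal)
```

The first path points into the temporary directory, the second into the standard library. Moving test.py to the repository root removes src\fundus from the front of the search path, so the standard `logging` module is found again.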

MaxDall, Apr 23 '24 21:04

Hey! Thanks, after moving test.py the test worked and I was able to generate some output. However, I'm now trying to run a unit test and I'm getting the following:

    8,670 - basic_logger - WARNING - Warning! Couldn't reach sitemap 'https://www.alarabiya.net/ar-news-sitemap.xml' because of 403 Client Error: Forbidden for url: https://www.alarabiya.net/ar-news-sitemap.xml
    2024-04-24 19:08:09,132 - basic_logger - WARNING - Warning! Couldn't reach sitemap 'https://www.alarabiya.net/sitemap.xml' because of 403 Client Error: Forbidden for url: https://www.alarabiya.net/sitemap.xml
    2024-04-24 19:08:09,134 - basic_logger - ERROR - Couldn't get article for AlArabiya. Skipping
    AlArabiya: 0%|

tamemo99, Apr 24 '24 17:04

@tamemo99 It seems that the publisher you're trying to scrape is protected by Cloudflare. Unfortunately, there is not much you can do about that. I'm sorry for the time you already spent on this!
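One rough way to check whether a 403 comes from Cloudflare rather than from the publisher's own server is to look at the response headers: Cloudflare responses normally carry `Server: cloudflare` and a `CF-RAY` header. The function below is a heuristic sketch only; `looks_like_cloudflare_block` is a hypothetical helper, not part of fundus, and the sample headers are made-up values for illustration, not captured from alarabiya.net.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: a 403/503 whose headers identify Cloudflare as the responder."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    served_by_cloudflare = (
        normalized.get("server", "").lower() == "cloudflare"
        or "cf-ray" in normalized
    )
    return status_code in (403, 503) and served_by_cloudflare


# Hypothetical header values for illustration only.
sample_headers = {"Server": "cloudflare", "CF-RAY": "0000000000000000-FRA"}
print(looks_like_cloudflare_block(403, sample_headers))       # True
print(looks_like_cloudflare_block(403, {"Server": "nginx"}))  # False
```

With the requests library, the inputs would be `response.status_code` and `response.headers` of the sitemap request. A positive result only suggests that Cloudflare answered the request; it does not offer a way around the block.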

MaxDall, Apr 24 '24 18:04