BeautifulSoup warning when running the Wikipedia Python API.

Open Jehan opened this issue 9 years ago • 15 comments

I am running a script on many pages of Wikipedia, and I regularly get the following warning:

/home/jehan/.local/lib/python3.4/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

markup_type=markup_type))

Well, at least I assume the Wikipedia script is what outputs it, since I do no XML/HTML processing of my own and I see that wikipedia.py has a BeautifulSoup import:

from bs4 import BeautifulSoup
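
For reference, here is a minimal sketch of what the warning is asking for, assuming bs4 is installed. The helper name list_items and the sample markup are made up for illustration; the only part that matters is the second argument to BeautifulSoup, which names the parser instead of letting bs4 guess.

```python
from bs4 import BeautifulSoup


def list_items(html, parser="html.parser"):
    """Return the text of every <li> in `html`, using an explicitly named parser."""
    # Naming the parser is what keeps bs4 from emitting the
    # "No parser was explicitly specified" warning.
    soup = BeautifulSoup(html, parser)
    return [li.get_text() for li in soup.find_all("li")]


# Illustrative markup only; not taken from the wikipedia package.
print(list_items("<ul><li>Mercury</li><li>Venus</li></ul>"))
```

html.parser ships with Python, so this version needs no extra packages. Passing "lxml" or "html5lib" works the same way if those are installed.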

Jehan avatar Nov 27 '15 02:11 Jehan

You're not alone. I'm seeing this too.

bdalton12 avatar Nov 28 '15 05:11 bdalton12

Also having this issue. Editing wikipedia.py line 389 as described in the warning does fix it:

 lis = BeautifulSoup(html, "html.parser").find_all('li')

(You can use lxml or html5lib in place of html.parser, but html.parser is built in and doesn't require any extra packages, and there doesn't seem to be any performance or functional difference for this call.)

Made a pull request for this change in #112
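
If you would rather not edit the installed wikipedia.py, one possible workaround is to filter out just this warning with the standard warnings module before the library triggers it. This is only a sketch: silence_parser_warning is a hypothetical helper name, and the message prefix is copied from the warning text quoted in the first post. It hides the warning rather than fixing the parser choice, so bs4 will still pick whichever parser it finds on the system.

```python
import warnings


def silence_parser_warning():
    """Ignore only bs4's 'No parser was explicitly specified' UserWarning."""
    # `message` is a regular expression matched against the start of the
    # warning text, so other UserWarnings are left alone.
    warnings.filterwarnings(
        "ignore",
        message="No parser was explicitly specified",
        category=UserWarning,
    )


# Call this once, before the code that triggers the warning runs.
silence_parser_warning()
```

Because the filter matches on the message text, unrelated warnings from bs4 or from your own code still show up.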

wjoe avatar Dec 09 '15 16:12 wjoe

Also having this issue. Has any workaround been implemented?

Superraptor avatar Jul 05 '16 18:07 Superraptor

This repo doesn't appear to be maintained any more. I made a PR #112 which fixes it, but it hasn't been merged.

There are a bunch of forks of this project, and I don't know which of them, if any, is the best maintained, but https://github.com/barrust/Wikipedia seems to be actively maintained and includes a fix for this issue.

wjoe avatar Jul 05 '16 19:07 wjoe

Fixed in 50bc236836dc20546af61ea7ca6198c3f039a816

marsjaninzmarsa avatar Jun 09 '17 04:06 marsjaninzmarsa

Having this problem today with: beautifulsoup4-4.6.0 certifi-2018.4.16 chardet-3.0.4 idna-2.6 requests-2.18.4 urllib3-1.22 wikipedia-1.4.0

kaleidawave avatar May 16 '18 13:05 kaleidawave

I'm also having this issue, and all libraries are fully up to date.

ThatVoidUpdate avatar Sep 03 '18 22:09 ThatVoidUpdate

Still an issue in 2020.

LSaldyt avatar Apr 21 '20 17:04 LSaldyt

This issue has already been fixed, as shown here:

https://github.com/goldsmith/Wikipedia/blob/2065c568502b19b8634241b47fd96930d1bf948d/wikipedia/wikipedia.py#L389

However, @goldsmith still needs to push the updated package up to PyPI so people can use the updated version on there.

As a temporary fix for this, you can install the module directly from this repo using the following command:

python -m pip install --upgrade git+git://github.com/goldsmith/Wikipedia.git

This command replaces your current installation of the wikipedia module with the most recent version from here on GitHub.
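
To check whether a particular copy of wikipedia.py already contains the fix, something like the following could help. It is a rough sketch using only the standard ast module: parserless_soup_calls is a hypothetical helper, and it only looks for a second positional argument or a features= keyword, so it will not recognise every way of choosing a parser.

```python
import ast


def parserless_soup_calls(source):
    """Return line numbers of BeautifulSoup(...) calls that name no parser."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both BeautifulSoup(...) and bs4.BeautifulSoup(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "BeautifulSoup":
            continue
        has_features_kw = any(kw.arg == "features" for kw in node.keywords)
        if len(node.args) < 2 and not has_features_kw:
            lines.append(node.lineno)
    return sorted(lines)


# Illustrative one-line sources, modelled on the line quoted earlier in this thread.
print(parserless_soup_calls("lis = BeautifulSoup(html).find_all('li')\n"))
print(parserless_soup_calls("lis = BeautifulSoup(html, 'html.parser').find_all('li')\n"))
```

Pass it the text of your installed wikipedia.py. An empty list suggests that copy already names a parser everywhere it builds a soup.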

bsoyka avatar Jun 17 '20 15:06 bsoyka

I tried to run the command, but I'm getting this error:

Collecting git+git://github.com/goldsmith/Wikipedia.git
  Cloning git://github.com/goldsmith/Wikipedia.git to c:\users\user\appdata\local\temp\pip-req-build-_ty5p3et
  Running command git clone -q git://github.com/goldsmith/Wikipedia.git 'C:\Users\User\AppData\Local\Temp\pip-req-build-_ty5p3et'
    ERROR: Command errored out with exit status 1:
     command: 'C:\Users\User\AppData\Local\Programs\Python\Python38\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\User\\AppData\\Local\\Temp\\pip-req-build-_ty5p3et\\setup.py'"'"'; __file__='"'"'C:\\Users\\User\\AppData\\Local\\Temp\\pip-req-build-_ty5p3et\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\User\AppData\Local\Temp\pip-pip-egg-info-dg8l8o6n'
         cwd: C:\Users\User\AppData\Local\Temp\pip-req-build-_ty5p3et\
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\User\AppData\Local\Temp\pip-req-build-_ty5p3et\setup.py", line 19, in <module>
        version = re.search(
    AttributeError: 'NoneType' object has no attribute 'groups'
    ----------------------------------------
WARNING: Discarding git+git://github.com/goldsmith/Wikipedia.git. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
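
The traceback only tells us that the re.search(...) call in setup.py found no match and returned None, so the following .groups() call failed. The sketch below reproduces that failure mode and shows a guarded variant. The pattern, the sample text and the read_version name are assumptions for illustration, not the project's actual setup.py.

```python
import re

# Hypothetical pattern: assumes the version is stored as a tuple such as
# __version__ = (1, 4, 0). The real setup.py may use a different one.
VERSION_RE = re.compile(r"__version__\s*=\s*\((\d+),\s*(\d+),\s*(\d+)\)")


def read_version(text):
    """Return 'X.Y.Z' from `text`, or raise a readable error if nothing matches."""
    match = VERSION_RE.search(text)
    if match is None:
        # Without this check, match.groups() raises
        # AttributeError: 'NoneType' object has no attribute 'groups'.
        raise ValueError("no __version__ tuple found in the given text")
    return ".".join(match.groups())


print(read_version("__version__ = (1, 4, 0)\n"))
```

A guard like this would not make the install succeed, but it would turn the AttributeError into a message that says what the script failed to find.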

bhagwanZaki avatar Apr 28 '21 17:04 bhagwanZaki

Still an issue in 2021. When will this fix be merged?

FurkanToprak avatar May 13 '21 13:05 FurkanToprak

Still having this issue with a project I'm working on. I tried the command and am still getting an error. Is there something I can edit to fix this?

thatITfox avatar May 16 '21 07:05 thatITfox

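I also had this error, but then I switched to the wikipedia-api package (imported as wikipediaapi) for my task. The code looks like this:

import wikipediaapi

def get_summary(search, words):
    # get_summary is an illustrative wrapper name; search and words come from the caller.
    # Newer wikipedia-api releases may also expect a user agent argument here.
    wikipedia = wikipediaapi.Wikipedia('en')
    page_py = wikipedia.page(search)

    if page_py.exists():
        return f"{page_py.title}\n{page_py.summary[:words]}"
    return "Couldn't find about the given data."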

AnantLuthra avatar Jul 04 '22 15:07 AnantLuthra

The issue is still not fixed.

Madis-code avatar Jul 20 '22 05:07 Madis-code

Still encountering this.

JiveyGuy avatar May 18 '24 03:05 JiveyGuy