typeshed
[bs4] Incorrect type hints for `.__getitem__()` and `.get()`?
`Tag.attrs` is type-hinted as `dict[str, str]`, as expected. However, `Tag.__getitem__()` is type-hinted as returning `str | list[str]`, and `Tag.get()` is type-hinted as returning `str | list[str] | None`. Why is that the case?
Cc @JelleZijlstra who contributes those stubs.
I think `.attrs` is wrong instead; attrs can in fact be lists. Example:
In [9]: html = '<html><div class="a b">x</div></html>'
In [11]: from bs4 import BeautifulSoup
...: soup = BeautifulSoup(html, 'html.parser')
In [18]: div=soup.find_all("div")[0]
In [19]: div["class"]
Out[19]: ['a', 'b']
In [20]: div.attrs
Out[20]: {'class': ['a', 'b']}
I originally typed `attrs` as `Mapping[str, Any]`, but #5907 changed it to `Mapping[str, str]`, and later #7253 changed it to a dict instead of a Mapping.
Ah yes, forgot about `class`. Are there any other cases where this can happen? I didn’t manage to get it to return a list in my tests before (you can’t simply have two of an attribute). It’s technically correct but quite annoying to cast the result to `str` every time.
We could use `str | Any`:
- If you expect it to be a string, it will work without casting.
- If you expect it to be a list, this will likely work because `str` behaves much like a list of strings. If you really need a list, you can `assert isinstance(foo, list)` or `assert not isinstance(foo, str)` or similar. The `| Any` really helps here as it prevents mypy from thinking that the code after an assert like that is unreachable.
- If you expect it to be something else, let's say an integer, you will likely get an error.
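To make the three cases concrete, here is a minimal sketch. `get_attr` is a hypothetical stand-in for an attribute lookup annotated with `str | Any`; it is not bs4's actual API, and the comments about type-checker behaviour only restate the points above.

```python
from typing import Any


# Hypothetical helper, annotated the way the proposal suggests.
# The annotation is quoted so it is never evaluated at runtime.
def get_attr(attrs: dict, key: str) -> "str | Any":
    return attrs[key]


attrs = {"id": "main", "class": ["a", "b"]}

# Case 1: you expect a string, so str methods can be used without a cast.
element_id = get_attr(attrs, "id").upper()

# Case 2: you really need a list, so narrow with an assert.
# Thanks to `| Any`, the code after the assert is not treated as unreachable.
classes = get_attr(attrs, "class")
assert isinstance(classes, list)
first_class = classes[0]

print(element_id, first_class)
```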
We do a similar trick with `re` methods that return a string or `None` depending on the regular expression passed in, and we don't want users to have to check for `None` every time.
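For reference, a small sketch of the runtime behaviour behind the `re` case: whether `Match.group()` gives back a string or `None` depends on whether the group took part in the match. The pattern and input are made up for illustration.

```python
import re

# The second group is optional, so it may not take part in the match.
match = re.match(r"(\d+)(px)?", "42")
assert match is not None

number = match.group(1)  # the digits that were matched
unit = match.group(2)    # None, because the optional group did not participate

print(repr(number), repr(unit))
```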
Hello everyone, is this issue still open to work on? If yes, please help me understand the issue. I want to work on it.
@AmberAnsari89 posting on every single issue isn't going to get you anywhere (and is somewhat annoying). If you don't understand an issue, your best way forward is to clarify what you understand, what you don't understand and to ask questions that prove you've at least spent time trying to figure out what's going on. Right now your comments put all the onus on maintainers to help you contribute. It isn't our job to help you do so — this isn't a classroom.
I am extremely sorry if my approach is annoying. My only intention was to pick an issue and start analysing it. Earlier, on another issue, I started my initial analysis without informing anyone, and it was later assigned to someone else.
Sorry once again.
I did a first analysis of the file below: typeshed/stubs/BeautifulSoup/bs4/elements.pyi
I analysed the Tag class and its attributes. I was looking for where the actual implementation is.
@hauntsaninja, @AlexWaygood,
I did some further analysis and have a few questions to clarify. I went through the typeshed documentation linked from the README.
- I did not understand how the `attrs` of Tag is related to `get()` and `__getitem__()`.
- Both methods may have different roles, so why is it an issue if their return type hints differ?
- Why does @JelleZijlstra suggest a list rather than a dict?
- I tried to understand the details below, mentioned by @JelleZijlstra:
In [9]: html = '<html><div class="a b">x</div></html>'
In [11]: from bs4 import BeautifulSoup
...: soup = BeautifulSoup(html, 'html.parser')
In [18]: div=soup.find_all("div")[0]
In [19]: div["class"]
Out[19]: ['a', 'b']
In [20]: div.attrs
Out[20]: {'class': ['a', 'b']}
Using soup, `find_all()` is called with the string "div" and returns `ResultSet[Any]`. It looks like a list; however, I don't see a list in the stubs, only the `ResultSet` class.
Please help me go further on this issue. It looks interesting to me, and I would like to dig deeper to better understand typeshed and also mypy.
Thanks
1: The relationship between `Tag.attrs` and `Tag.get` and `Tag.__getitem__` is defined in the original source code for beautifulsoup. That's how we can determine the types of anything, e.g. take a look at https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/element.py#L1489
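As a rough illustration of point 1, here is a simplified stand-in class. `FakeTag` is hypothetical and is not the library's code; it assumes both lookup methods simply delegate to the `attrs` dictionary, which is my reading of the linked source. Under that assumption, whatever value types `attrs` can hold are also what `__getitem__` and `get` can return.

```python
from typing import Any


class FakeTag:
    """Hypothetical, simplified stand-in for bs4's Tag (attribute access only)."""

    def __init__(self, attrs: dict) -> None:
        self.attrs = attrs

    def __getitem__(self, key: str) -> Any:
        # tag["class"] is just a lookup in tag.attrs (assumed delegation).
        return self.attrs[key]

    def get(self, key: str, default: Any = None) -> Any:
        # tag.get("class") is the same lookup, with a fallback value.
        return self.attrs.get(key, default)


div = FakeTag({"class": ["a", "b"], "id": "main"})
print(div["class"], div.get("id"), div.get("missing"))
```

So if `attrs` can hold a list for `class`, a `dict[str, str]` hint on `attrs` cannot be consistent with `str | list[str]` on the two methods.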
4: What is the type of `div` and what is the type of `div.attrs` in the example Jelle gave? What is the type the stub says it would be? Figuring that out will help you understand the issue :-)