
Qri console not sure if the object is NoneType or SoupNode

Open TheWorldEndsWithUs opened this issue 6 years ago • 3 comments

Platform How were you accessing the qri frontend? [ ] app.qri.io [ ] electron app [x] dev webapp

Version What version of the Qri frontend are you using? 0.8.2

Describe the bug I'm not too sure if it's a bug, or I'm just confused. Based on the output in the console, the same object is reported as both a NoneType and a SoupNode.

To Reproduce I posted a gif below to help shed some light on the issue. If I change the method called on a soup node from body() to contents(), the output says one is a NoneType and the other is a SoupNode.

Expected behavior Both items should have the same type.

Screenshots [gif: Jul 4 2019 4_03 PM - Edited]


TheWorldEndsWithUs, Jul 04 '19 20:07

Hard to say exactly what is going on without knowing the url being scraped and the contents it is returning. Given the class names "_e296pg", "_qgfkoz", "_dwmetq", it's possible that the names are newly generated for each response, which means sometimes "_dwmetq" will exist in the page and sometimes it won't. In the former case the result will have the type SoupNode, whereas in the latter case it will be None.

https://godoc.org/github.com/qri-io/starlib/bsoup The correct method is "contents", not "body", and it returns the list of children.
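To illustrate the None-versus-node behavior described above, here is a minimal Python sketch. It uses only the standard library's `html.parser` as a stand-in and is not the starlib bsoup API; `ClassFinder`, `find_by_class`, and the two sample pages are hypothetical and exist only for this example. The point is that a lookup for a class that is missing from a given response yields None, so the script should check before calling methods on the result.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records the tag name of the first element carrying a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = None  # stays None when the class never appears

    def handle_starttag(self, tag, attrs):
        if self.found is not None:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = tag


def find_by_class(html, class_name):
    """Return the tag name of the first match, or None if there is none."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Two made-up responses, as if returned by the same URL at different times.
normal_page = '<div class="_e296pg"><span class="_dwmetq">Science</span></div>'
other_page = '<div class="_e296pg"><p>Please try again later</p></div>'

for page in (normal_page, other_page):
    node = find_by_class(page, "_dwmetq")
    if node is None:
        # Guard: the class is absent from this response.
        print("class not present in this response, skipping")
    else:
        print("found element:", node)
```

The same guard applies in a Starlark script: test the result of the lookup against None before asking it for its contents.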

dustmop, Jul 08 '19 14:07

Just saw that additional information had been posted in our Discord; adding it here so that the issue has the full context. The script being used is at https://pastebin.com/VrqQ7Pwt, and the url in question is https://www.khanacademy.org/science. From some testing, the server is returning different responses, but they all seem to be using the "_dwmetq" class in the HTML. It's possible that, after seeing many requests during your script development, the server began sending back rate-limited responses with different bodies.

We need better auditing tools in our http library, such as some way to capture and reuse http responses, or at least some way to tell when a server response has changed significantly.
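As a rough idea of what such a tool could look like, here is a minimal Python sketch, not a proposal for the actual Qri http library. `ResponseRecorder`, its methods, and the injected `fetch` callable are all hypothetical names. It saves the first body seen for a URL, replays it on later calls, and uses `difflib.SequenceMatcher` to score how similar a fresh response is to the saved one.

```python
import difflib
import hashlib
import os
import tempfile


class ResponseRecorder:
    """Captures response bodies on disk and replays them (hypothetical helper)."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = cache_dir
        self.fetch = fetch  # assumed callable: url -> body text
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".txt")

    def load(self, url):
        """Return the saved body for url, or None if nothing was captured."""
        path = self._path(url)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as f:
            return f.read()

    def get(self, url):
        """Replay the saved body if present; otherwise fetch and save it."""
        saved = self.load(url)
        if saved is not None:
            return saved
        body = self.fetch(url)
        with open(self._path(url), "w", encoding="utf-8") as f:
            f.write(body)
        return body

    def check(self, url):
        """Fetch a fresh body and return its similarity to the saved one.

        The result is a ratio between 0.0 and 1.0, or None if nothing is saved.
        The saved baseline is left unchanged.
        """
        saved = self.load(url)
        if saved is None:
            return None
        fresh = self.fetch(url)
        return difflib.SequenceMatcher(None, saved, fresh).ratio()


# Demo with a fake fetch function, so no network access is needed.
fake_bodies = iter(['<span class="_dwmetq">Science</span>', "rate limited"])
recorder = ResponseRecorder(tempfile.mkdtemp(), lambda url: next(fake_bodies))

url = "https://example.com/page"  # placeholder URL
first = recorder.get(url)    # fetched and saved
second = recorder.get(url)   # replayed from disk, no new request
print("replayed identical body:", first == second)
print("similarity of fresh response:", recorder.check(url))
```

Replaying a saved body during script development would also avoid hitting the server repeatedly. `SequenceMatcher` is slow on large pages, so a real tool would probably compare hashes or a structural summary instead.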

dustmop, Jul 08 '19 14:07

So I figured out it was my error. The page was inconsistent all the way through, as you stated above.

TheWorldEndsWithUs, Jul 08 '19 16:07