AttributeError: 'HTMLParser' object has no attribute 'unescape'
🚨Please review the Troubleshooting section before reporting any issue. Don't forget to check also the current issues to avoid duplicates.
AttributeError
Receiving the following error:
AttributeError: 'HTMLParser' object has no attribute 'unescape'
Your environment
- Operating System (name/version): Windows V.2004
- Python version: 3.9
- coursera-dl version: 0.11.5
Steps to reproduce
Method:
coursera-dl regression-models
- Is the problem happening with the latest version of the script?
- Do you have all the recommended versions of the modules? See them in the file requirements.txt.
- What is the course that you are trying to access?
- What is the precise command line that you are using (don't forget to obfuscate your username and password, but leave all other information untouched).
- What are the precise messages that you get? Please, use the --debug option before posting the messages as a bug report. Please, copy and paste them. Don't reword/paraphrase the messages.
Expected behaviour
Tell us what should happen.
Actual behaviour
C:\Users\ryan1\Documents>coursera-dl regression-models
coursera_dl version 0.11.5
Downloading class: regression-models (1 / 1)
Parsing syllabus of on-demand course . This may take some time, please be patient ...
Processing module week-1-least-squares-and-linear-regression
Processing section introduction
Processing lecture welcome-to-regression-models (supplement)
Traceback (most recent call last):
File "c:\users\ryan1\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\ryan1\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\ryan1\AppData\Local\Programs\Python\Python39\Scripts\coursera-dl.exe\__main__.py", line 7, in <module>
Hi! I was receiving the exact same error. After some research, I found that HTMLParser.unescape was removed in Python 3.9.0, so I switched back to Python 3.8 and it is working perfectly fine. Switching to an older version that is later than 3.4 should work. Hope it helps!
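For anyone who wants to confirm this on their own interpreter, here is a minimal sketch (not taken from coursera-dl) that checks whether the old method still exists and shows the module-level html.unescape() function, which the standard library has provided since Python 3.4. The sample string is made up for illustration.

```python
import html
import sys
from html.parser import HTMLParser

# HTMLParser.unescape() was deprecated in Python 3.4 and removed in 3.9,
# so this is True on 3.8 and earlier and False on 3.9 and later.
has_old_method = hasattr(HTMLParser(), "unescape")
print("Python %d.%d, HTMLParser.unescape present: %s"
      % (sys.version_info[0], sys.version_info[1], has_old_method))

# The replacement is a plain function in the html module.
print(html.unescape("Regression &amp; Models"))
```

If the first line reports that the method is missing, any code that still calls HTMLParser().unescape() will raise the AttributeError from this issue.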
Thanks, that worked!
Here is a patch that does not require downgrading to Python 3.8: https://github.com/coursera-dl/edx-dl/pull/651/commits/5490a99a98b56f544661c131229ef640ace2b064. It works on Linux.
I tried copying the file into the Coursera folder, but it did not help; I got this error: ImportError: cannot import name 'random_string' from 'coursera.utils'
@rm00-git How do I go back to a previous version of Python? Please tell me. Much needed.
@rm00-git Thank you so much, man. I really appreciate it.
First, thanks @zenny.
1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
2. Open the utils.py file and replace all the code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d
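Since the folder differs between a system install, a venv and other setups, here is a small sketch for printing where the active interpreter keeps its site-packages. It assumes coursera-dl was installed into that interpreter's default site-packages (a per-user install may live elsewhere), and the coursera/utils.py sub-path is simply the one named in the steps above.

```python
import os
import sysconfig

# Directory where pure-Python packages are installed for this interpreter
# (inside a venv this points into the venv).
site_packages = sysconfig.get_paths()["purelib"]

# Path from the steps above; the file only exists if coursera-dl is installed here.
utils_path = os.path.join(site_packages, "coursera", "utils.py")
print(utils_path)
print("exists:", os.path.exists(utils_path))
```

Run it with the same Python that runs coursera-dl, otherwise it may point at a different installation.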
Thanks, it worked!
To fix the issue in coursera-dl, refer to ziko442's reply; the patch above is for edx-dl.
If you modify coursera\utils.py to import html and then replace h = html_parser.HTMLParser() with h = html, it works on Python 3.9. I suggest the dev team implement this change.
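A rough sketch of what that edit amounts to is below. The function name unescape_title is hypothetical and only stands in for whatever code in utils.py calls h.unescape(); the real file has more in it.

```python
import html

# Before (fails on Python 3.9+):
#   from six.moves import html_parser
#   h = html_parser.HTMLParser()
# After: the html module itself exposes unescape(), so existing
# h.unescape(s) call sites keep working without further changes.
h = html


def unescape_title(s):
    """Hypothetical stand-in for the code in utils.py that calls h.unescape()."""
    return h.unescape(s)


print(unescape_title("Least Squares &amp; Linear Regression"))
```

The point of binding h to the module is that nothing else has to be renamed; only the assignment changes.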
It worked for me on Debian buster with bullseye testing repositories.
Hi, I did this and am still getting "AttributeError: 'HTMLParser' object has no attribute 'unescape'". Any idea what else I can do to solve it? Thanks!
@adirb1 Try also commenting out this line:
from six.moves import html_parser
And you need to replace two occurrences of h = html_parser.HTMLParser() with h = html.
https://github.com/coursera-dl/coursera-dl/pull/789#issuecomment-800142031
I think there may be a problem with the change in that gist. html.unescape() is only available from Python 3.4 onward; below that, it doesn't exist. So I think your code may not work on Python 3.0 through 3.3.
You might want to change the line:
if sys.version_info[0] >= 3:
to
if sys.version_info[0] >= 3 and sys.version_info[1] >= 4:
I'm not using Python 3.3 (using 3.9), but only found your reply after having read about Python 3.4 as minimum, so thought you might want to correct the file. Thanks though. I'll go for that instead of hard-coding the new way as I was doing.
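A slightly more compact way to write that check is to compare sys.version_info against a tuple, which also avoids the two-part test being False for a hypothetical 4.0 (its minor version 0 is less than 4). This is only a sketch: I have not checked how the gist structures its fallback branch, so the else part here is an assumption.

```python
import sys

if sys.version_info >= (3, 4):
    # html.unescape() exists from Python 3.4 onward.
    from html import unescape
else:
    # Older interpreters: fall back to the HTMLParser method (assumed fallback).
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape

print(unescape("&lt;b&gt;week 1&lt;/b&gt;"))
```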
The software works for me now using the CAUTH flag and ziko's instructions. Thank you so much.
OS Name: Microsoft Windows 10 Enterprise Version: 10.0.19043 Build 19043 System Type: x64-based PC
Pull request #789 should fix this issue...
Worked in OSX Catalina as well.
It works for me:
apt install python3.9-dev
MAKE SURE YOU INSTALL IT FOR THE RIGHT PYTHON VERSION! If you use Python 3.x, install python3.x-dev, and so on.
To go back to a previous version of Python, run sudo update-alternatives --config python3. You should get a table with the different versions of Python; select the option that has your older version.
Thanks
This issue was fixed by pull request #789 (as mentioned above).
As such, there is no reason to risk security hazards by resorting to replacing large amounts of code...
If anybody is using the Coursera package and can't fix the utils.py file, just follow ziko442's steps above and it will be fixed.
This is great. This works for Python 3.10.2 as well.
This worked on my Mac OS X 11.6.3 with Python 3.9 as of today, except that the path of utils.py is:
/Users/<your_username>/opt/anaconda3/lib/python3.9/site-packages/coursera/utils.py
I modified the file as per the link above and could finish downloading the few courses that were throwing the HTML parser error. I also had to get the CAUTH value using Safari's development tools/web inspector. Not very cool to have to crawl the internet to get it working, but well worth the time saved to download all courses. The only things that are not downloaded at all are the readings contained in an iframe.
I replaced utils.py with https://github.com/coursera-dl/coursera-dl/issues/778 but it is still not working. There is an error saying: "XXXX/python3.9/site-packages/coursera/utils.py", line 118, in clean_filename s = h.unescape(s) AttributeError: 'HTMLParser' object has no attribute 'unescape'
Not sure what went wrong. I already set h = html.
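One possible cause (a guess, not something confirmed in this thread) is that the edited utils.py is not the copy that Python actually imports, for example when several Python installations or a virtual environment are involved. A minimal sketch for checking which file a module name resolves to is below; the helper module_path is hypothetical, and it returns None if the package is not installed for the interpreter running it.

```python
import importlib.util


def module_path(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when the parent package itself cannot be imported.
        return None
    return spec.origin if spec else None


# Compare this path with the file you edited.
print(module_path("coursera.utils"))
```

If the printed path differs from the file that was modified, the change has to be made in the printed location instead.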
Fixed. Thanks a lot!
I edited utils.py:
- add import html
- comment out from six.moves import html_parser
- replace h = html_parser.HTMLParser() with h = html (in 2 places)
Then download with the command coursera-dl -ca <some_cookies_value_get_from_browser> <course_name>
Thanks this has worked for me
This helped me, thanks. I am using Ubuntu 23.