AttributeError: 'HTMLParser' object has no attribute 'unescape'

Open rm00-git opened this issue 5 years ago • 29 comments

🚨Please review the Troubleshooting section before reporting any issue. Don't forget to check also the current issues to avoid duplicates.

AttributeError

Receiving the following error:

AttributeError: 'HTMLParser' object has no attribute 'unescape'

Your environment

  • Operating System (name/version): Windows V.2004
  • Python version: 3.9
  • coursera-dl version: 0.11.5

Steps to reproduce

Method:

coursera-dl regression-models

  • Is the problem happening with the latest version of the script?
  • Do you have all the recommended versions of the modules? See them in the file requirements.txt.
  • What is the course that you are trying to access?
  • What is the precise command line that you are using (don't forget to obfuscate your username and password, but leave all other information untouched).
  • What are the precise messages that you get? Please, use the --debug option before posting the messages as a bug report. Please, copy and paste them. Don't reword/paraphrase the messages.

Expected behaviour

Tell us what should happen.

Actual behaviour

C:\Users\ryan1\Documents>coursera-dl regression-models
coursera_dl version 0.11.5
Downloading class: regression-models (1 / 1)
Parsing syllabus of on-demand course . This may take some time, please be patient ...
Processing module week-1-least-squares-and-linear-regression
Processing section introduction
Processing lecture welcome-to-regression-models (supplement)
Traceback (most recent call last):
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\ryan1\AppData\Local\Programs\Python\Python39\Scripts\coursera-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\coursera_dl.py", line 247, in main
    error_occurred, completed = download_class(
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\coursera_dl.py", line 214, in download_class
    return download_on_demand_class(session, args, class_name)
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\coursera_dl.py", line 134, in download_on_demand_class
    error_occurred, modules = extractor.get_modules(
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\extractors.py", line 53, in get_modules
    error_occurred, modules = self._parse_on_demand_syllabus(
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\extractors.py", line 161, in _parse_on_demand_syllabus
    links = course.extract_links_from_supplement(
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\api.py", line 1268, in extract_links_from_supplement
    supplement_content, self._extract_links_from_text(value))
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\api.py", line 1518, in _extract_links_from_text
    supplement_links = self._extract_links_from_a_tags_in_text(text)
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\api.py", line 1597, in _extract_links_from_a_tags_in_text
    extension = clean_filename(
  File "c:\users\ryan1\appdata\local\programs\python\python39\lib\site-packages\coursera\utils.py", line 118, in clean_filename
    s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'

rm00-git avatar Oct 08 '20 20:10 rm00-git

Hi! I was getting exactly the same error. After some research I found that HTMLParser.unescape was removed in Python 3.9.0, so I switched back to Python 3.8 and it works perfectly. Switching to an older version that is greater than 3.4 should work. Hope it helps!
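
To illustrate the removal described above, here is a minimal sketch that uses only the standard library. It checks whether the old HTMLParser.unescape method still exists on the running interpreter and decodes a sample string with html.unescape, the documented replacement. The sample string is made up for illustration and is not taken from coursera-dl.

```python
import html
from html.parser import HTMLParser

# HTMLParser.unescape was deprecated in Python 3.4 and removed in 3.9,
# so this is expected to be False on Python 3.9 and later.
has_old_method = hasattr(HTMLParser, "unescape")

# html.unescape() has been available since Python 3.4.
decoded = html.unescape("Tom &amp; Jerry")

print("HTMLParser.unescape present:", has_old_method)
print(decoded)
```

Running this on both interpreters is a quick way to confirm which Python a given coursera-dl install is actually using.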

gustavoconter avatar Oct 09 '20 15:10 gustavoconter

Hi! I was getting exactly the same error. After some research I found that HTMLParser.unescape was removed in Python 3.9.0, so I switched back to Python 3.8 and it works perfectly. Switching to an older version that is greater than 3.4 should work. Hope it helps!

Thanks, that worked!

rm00-git avatar Oct 09 '20 16:10 rm00-git

Here is a patch that avoids downgrading to Python 3.8: https://github.com/coursera-dl/edx-dl/pull/651/commits/5490a99a98b56f544661c131229ef640ace2b064. It works on Linux.

zenny avatar Oct 30 '20 16:10 zenny

Here is a patch that avoids downgrading to Python 3.8: coursera-dl/edx-dl@5490a99. It works on Linux.

I tried copying the file into the coursera folder, but it didn't help. I got this error: ImportError: cannot import name 'random_string' from 'coursera.utils'

VigneshRamanathan101 avatar Nov 20 '20 18:11 VigneshRamanathan101

@rm00-git How do I go back to a previous version of Python? Please tell me. Much needed.

manzoorHusain avatar Dec 18 '20 08:12 manzoorHusain

@rm00-git Thank you so much, man. I really appreciate it.

manzoorHusain avatar Dec 18 '20 09:12 manzoorHusain

First, thanks @zenny.

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.

  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d
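
If the install location is unclear (a venv, Anaconda, or a different Python version), the path can be looked up rather than guessed. The following is a rough sketch, not part of coursera-dl: locate_module_file is a hypothetical helper name, and it assumes the package is importable under the name coursera, as the traceback in this issue suggests.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the file path Python would load for module_name, or None."""
    try:
        # For dotted names, find_spec imports the parent package first.
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when the parent package is not installed.
        return None
    if spec is None:
        return None
    return spec.origin


# Prints the path of the installed utils.py, or None if coursera is missing.
print(locate_module_file("coursera.utils"))
```

Run it with the same interpreter that runs coursera-dl, otherwise it may report a different site-packages directory.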

ziko442 avatar Dec 22 '20 14:12 ziko442

Hi! I was getting exactly the same error. After some research I found that HTMLParser.unescape was removed in Python 3.9.0, so I switched back to Python 3.8 and it works perfectly. Switching to an older version that is greater than 3.4 should work. Hope it helps!

Thanks, it worked!

rohitbalage avatar Jan 09 '21 06:01 rohitbalage

Here is a patch that avoids downgrading to Python 3.8: coursera-dl/edx-dl@5490a99. It works on Linux.

Refer to ziko442's reply to fix the issue with coursera-dl; the patch linked here is for edx-dl.

Nirbhay-Thacker avatar Jan 11 '21 09:01 Nirbhay-Thacker

If you modify coursera\utils.py to import html and then replace h = html_parser.HTMLParser() with h = html, Python 3.9 works. I suggest the dev team implement this change.
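
The reason this swap works can be sketched as follows: the html module itself exposes an unescape function, so existing calls of the form h.unescape(s) keep working once h points at the module. The function name unescape_text below is a hypothetical stand-in for the call inside clean_filename, and the input string is invented.

```python
import html

# Assumption: call sites look like h.unescape(s), as in the traceback above.
h = html  # replaces h = html_parser.HTMLParser()


def unescape_text(s):
    """Hypothetical stand-in for the h.unescape(s) call in clean_filename."""
    return h.unescape(s)


print(unescape_text("Week 1 &amp; 2 &gt; notes"))
```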

rwilcox3 avatar Jan 30 '21 07:01 rwilcox3

If you modify coursera\utils.py to import html and then replace h = html_parser.HTMLParser() with h = html, Python 3.9 works. I suggest the dev team implement this change.

It worked for me on debian buster with bullseye testing repositories.

eliottness avatar Feb 05 '21 18:02 eliottness

If you modify coursera\utils.py to import html and then replace h = html_parser.HTMLParser() with h = html, Python 3.9 works. I suggest the dev team implement this change.

Hi, I did that and I am still getting "AttributeError: 'HTMLParser' object has no attribute 'unescape'". Any idea what else I can do to solve it? Thanks!

michael12987 avatar Feb 12 '21 09:02 michael12987

@adirb1 Try also commenting out this line:

from six.moves import html_parser

And you need to replace two occurrences of h = html_parser.HTMLParser() with h = html.
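
Since a missed occurrence reproduces the same error, it can help to scan the edited file for leftovers. This is only a sketch: find_leftovers is a hypothetical helper, and the sample text below is invented for illustration rather than copied from the real utils.py.

```python
def find_leftovers(source, needle="html_parser.HTMLParser()"):
    """Return 1-based line numbers of uncommented lines that contain needle."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if needle in line and not line.lstrip().startswith("#"):
            hits.append(number)
    return hits


# Invented sample: one occurrence was replaced, one was missed.
sample = (
    "import html\n"
    "# from six.moves import html_parser\n"
    "h = html\n"
    "h = html_parser.HTMLParser()\n"
)
print(find_leftovers(sample))
```

To check a real install, pass the contents of the utils.py that Python actually loads, read with open(path).read(), instead of the sample string.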

lifepillar avatar Mar 12 '21 16:03 lifepillar

https://github.com/coursera-dl/coursera-dl/pull/789#issuecomment-800142031

ismail709 avatar Mar 16 '21 10:03 ismail709

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

I think there may be a problem with that change. html.unescape() is only available from Python 3.4 onward; below that, it doesn't exist. So I think your code may not work on Python 3.0 through 3.3.

You might want to change the line:

if sys.version_info[0] >= 3:

to

if sys.version_info[0] >= 3 and sys.version_info[1] >= 4:

I'm not using Python 3.3 (using 3.9), but only found your reply after having read about Python 3.4 as minimum, so thought you might want to correct the file. Thanks though. I'll go for that instead of hard-coding the new way as I was doing.
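
As a side note on the check suggested above, comparing the major and minor numbers separately would reject a hypothetical future version such as 4.0, because its minor number is below 4. A tuple comparison avoids that. The sketch below is an alternative offered for illustration, not what the gist uses; both function names are hypothetical.

```python
import sys


def has_html_unescape(version_info=sys.version_info):
    """True if html.unescape should exist (Python 3.4 or later)."""
    return tuple(version_info[:2]) >= (3, 4)


def two_part_check(version_info):
    """The major/minor check suggested above, kept for comparison."""
    return version_info[0] >= 3 and version_info[1] >= 4


for version in [(3, 3), (3, 4), (3, 9), (4, 0)]:
    print(version, has_html_unescape(version), two_part_check(version))
```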

Edw590 avatar Mar 20 '21 16:03 Edw590

The software works for me now using the CAUTH flag and ziko's instructions. Thank you so much.

OS Name: Microsoft Windows 10 Enterprise Version: 10.0.19043 Build 19043 System Type: x64-based PC

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

idrissathiam01 avatar Apr 01 '21 19:04 idrissathiam01

Pull request #789 should fix this issue...

heino avatar Apr 03 '21 08:04 heino

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

Worked in OSX Catalina as well.

ruslaniv avatar Apr 22 '21 09:04 ruslaniv

It works for me:

apt install python3.9-dev

MAKE SURE YOU INSTALL IT FOR RIGHT PYTHON VERSION! (python3.x-dev)

If you use python 3.x install python3.x-dev, and so on

v1a0 avatar May 08 '21 12:05 v1a0

@rm00-git How do I go back to a previous version of Python? Please tell me. Much needed.

sudo update-alternatives --config python3

You should get a table listing the different versions of Python. Select the option that corresponds to your older version of Python.

Vaishnavi-A27 avatar Dec 28 '21 05:12 Vaishnavi-A27

first thanks @zenny

1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera.
   Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.

2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

Thanks

TeymurovFuad avatar Dec 28 '21 23:12 TeymurovFuad

This issue was fixed by pull request #789 (as mentioned above).

As such, there is no reason to risk security hazards by resorting to replacing large amounts of code...

heino avatar Dec 28 '21 23:12 heino

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

If anybody is using the Coursera package and can't fix the utils.py file by hand, just do this and it will be fixed.

Ali619 avatar Jan 17 '22 08:01 Ali619

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

This is great. This works for Python 3.10.2 as well.

khatiwada1 avatar Jan 19 '22 16:01 khatiwada1

The software works for me now using the CAUTH flag and ziko's instructions. Thank you so much.

OS Name: Microsoft Windows 10 Enterprise Version: 10.0.19043 Build 19043 System Type: x64-based PC

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

This worked on my Mac OS X 11.6.3 with Python 3.9 as of today, except that the path of utils.py is:

/Users/<your_username>/opt/anaconda3/lib/python3.9/site-packages/coursera/utils.py

I modified the file as per the link above and could finish downloading the few courses that were throwing the HTML parser error. I also had to get the CAUTH value using Safari's development tools/web inspector. It's not very cool to have to crawl the internet to get it working, but it was well worth the time saved in downloading all the courses. The only things not downloaded at all are the readings contained in an iframe.

Tcoton avatar Feb 20 '22 19:02 Tcoton

I replaced utils.py following https://github.com/coursera-dl/coursera-dl/issues/778 but it is still not working. There is an error saying: "XXXX/python3.9/site-packages/coursera/utils.py", line 118, in clean_filename s = h.unescape(s) AttributeError: 'HTMLParser' object has no attribute 'unescape'

Not sure what went wrong. I already set h = html

shwhsx avatar May 14 '22 13:05 shwhsx

Fixed. Thanks a lot!

If you modify coursera\utils.py to import html and then replace h = html_parser.HTMLParser() with h = html, Python 3.9 works. I suggest the dev team implement this change.

I edited utils.py:

  • import html
  • comment out from six.moves import html_parser
  • replace h = html_parser.HTMLParser() with h = html (2 positions)

Then download with command coursera-dl -ca <some_cookies_value_get_from_browser> <course_name>

faea726 avatar Jun 25 '22 06:06 faea726

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

Thanks this has worked for me

magombe avatar Jul 13 '22 16:07 magombe

first thanks @zenny

  1. Go to C:\Users\{ur_usr_name}\AppData\Local\Programs\Python\Python39\Lib\site-packages\coursera. Note: if you installed coursera-dl in a venv, go to ur-v-env-name\Lib\site-packages\coursera instead.
  2. Open the utils.py file and replace all of its code with the code at this link: https://gist.github.com/ziko442/d57d91da980e72414c725eb60878bc2d

This helped me, thanks. I am using Ubuntu 23.

bethel-m avatar Jun 28 '23 14:06 bethel-m