edx-dl
Download SSL Error with Proxy in China
🚨Please review the Troubleshooting section before reporting any issue. Don't forget also to check the current issues to avoid duplicates.
Subject of the issue
edx-video.net is blocked by the GFW, so we can't download from it directly. If I switch my proxy software to 'Global Proxy' mode, the download fails with an SSLError (bad request), and requests.get() never downloads the complete video.
Your environment
- Operating System : Win10 64bit
- Python version: 3.6
- youtube-dl version: latest
- edx-dl version: latest
Steps to reproduce
From a network inside China, download any course the normal way with '-i'. The error then occurs.
Solution:
I modified `download_url(url, filename, headers, args)` in edx_dl.py like this:

```python
def download_url(url, filename, headers, args):
    """
    Downloads the given url in filename.
    """
    if is_youtube_url(url):
        download_youtube_url(url, filename, headers, args)
    else:
        import requests  # SOCKS proxies need the extra: pip install requests[socks]

        # FIXME: Ugly hack for coping with broken SSL sites:
        # https://www.cs.duke.edu/~angl/papers/imc10-cloudcmp.pdf
        #
        # We should really ask the user if they want to stop the downloads
        # or if they are OK proceeding without verification.
        #
        # Note that skipping verification by default could be a problem for
        # people's lives if they happen to live in dictatorial countries.
        #
        # Note: The mess with various exceptions being caught (and their
        # order) is due to different behaviors in different Python versions
        # (e.g., 2.7 vs. 3.4).
        try:
            # mitxpro fix for downloading compressed files
            if 'zip' in url and 'mitxpro' in url:
                urlretrieve(url, filename)
            else:
                # Use a SOCKS proxy; an HTTP proxy fails with "bad request".
                proxies = {
                    "http": "socks5://127.0.0.1:1080",
                    "https": "socks5://127.0.0.1:1080",
                }
                # A single requests.get() + write pass can leave the file
                # incomplete, so keep resuming until nothing is missing.
                while True:
                    offset = os.path.getsize(filename) if os.path.exists(filename) else 0
                    req_headers = dict(headers or {})  # don't mutate the caller's headers
                    if offset:
                        req_headers['Range'] = 'bytes=%d-' % offset
                    with requests.get(url, headers=req_headers,
                                      proxies=proxies, stream=True) as r:
                        if r.status_code == 416:
                            # Range starts at or past the end: file is complete.
                            break
                        r.raise_for_status()
                        # Append only when the server honoured Range (206);
                        # otherwise it resent the whole file, so start over.
                        mode = 'ab' if r.status_code == 206 else 'wb'
                        expected = int(r.headers.get('content-length', 0))
                        written = 0
                        with open(filename, mode) as fp:
                            for chunk in r.iter_content(chunk_size=52428800):  # 50 MiB
                                if chunk:  # filter out keep-alive new chunks
                                    fp.write(chunk)
                                    written += len(chunk)
                    logging.info('wrote %d of %d expected bytes, file size: %d',
                                 written, expected, os.path.getsize(filename))
                    if written >= expected:
                        break
        except Exception as e:
            logging.warning('Got SSL/Connection error: %s', e)
            if not args.ignore_errors:
                logging.warning('Hint: if you want to ignore this error, add '
                                '--ignore-errors option to the command line')
                raise
            else:
                logging.warning('SSL/Connection error ignored: %s', e)
```
This code can download videos from edx-video.net successfully.
However, my coding skills are limited, so I couldn't make the proxy configurable from the command line.
I hope somebody can add that.
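One possible way to make the proxy configurable is a command-line option that is turned into the `proxies` dict passed to `requests.get()`. This is only a sketch under assumptions: the `--proxy` flag and the `build_proxies` and `parse_args` helpers are hypothetical names used for illustration, not existing edx-dl options or functions.

```python
import argparse


def build_proxies(proxy_url):
    """Return a requests-style proxies dict, or None when no proxy is set."""
    if not proxy_url:
        return None
    # The same proxy is used for both plain and TLS traffic.
    return {"http": proxy_url, "https": proxy_url}


def parse_args(argv=None):
    """Parse a hypothetical --proxy option (not an existing edx-dl flag)."""
    parser = argparse.ArgumentParser(description="proxy option sketch")
    parser.add_argument(
        "--proxy",
        default=None,
        help="proxy URL, e.g. socks5://127.0.0.1:1080",
    )
    return parser.parse_args(argv)


# Example: the value given on the command line replaces the hard-coded dict.
args = parse_args(["--proxy", "socks5://127.0.0.1:1080"])
proxies = build_proxies(args.proxy)
print(proxies)
```

Passing `proxies=None` to `requests.get()` means no explicit proxy, so the default behaviour is unchanged when the option is omitted. SOCKS URLs require the `requests[socks]` extra; the `socks5h://` scheme additionally resolves hostnames through the proxy.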
ref: https://juejin.im/post/5c331483e51d455246489a25 https://stackoverflow.com/questions/23645212/requests-response-iter-content-gets-incomplete-file-1024mb-instead-of-1-5gb
I find that this code still sometimes fails to download the whole video with requests.
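Since a download can still end up truncated, one option is to compare the size on disk with the size the server reports before treating the file as done. The sketch below assumes the server sends either a `Content-Range` header (on a 206 partial response) or a `Content-Length` header; `total_size_from_headers` and `is_complete` are hypothetical helper names, not part of edx-dl.

```python
import os
import re


def total_size_from_headers(headers, offset=0):
    """Best-effort total file size from response headers, or None.

    `offset` is the byte position the request resumed from (0 for a
    full download). `headers` is any mapping of header names to values.
    """
    # A 206 response carries e.g. "Content-Range: bytes 100-199/200".
    match = re.match(r"bytes \d+-\d+/(\d+)$", headers.get("Content-Range", ""))
    if match:
        return int(match.group(1))
    # Otherwise Content-Length covers only the bytes sent in this response.
    length = headers.get("Content-Length")
    if length is not None:
        return offset + int(length)
    return None


def is_complete(filename, total_size):
    """True only when the total is known and the file on disk matches it."""
    return total_size is not None and os.path.getsize(filename) == total_size
```

With requests, `r.headers` can be passed directly because it is a case-insensitive mapping; with a plain dict the header names must match exactly. If `is_complete` returns False after a pass, the download can be resumed with a `Range` header starting at the current file size.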
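Use https://ping.chinaz.com/ to look up the real IP address of the domain (pick the one with reasonable latency). After you add it to your hosts file, the certificate error should no longer occur.
Thanks for your reply. I changed the code a bit, and together with using a proxy I managed to download the videos in full.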