
[πŸ› Bug]: Fail to download PDF or zip file from remote to client on Remote webdriver

Open • 15975518086 opened this issue 1 year ago • 6 comments

What happened?

Calling driver.download_file() fails with an EOFError raised from inside zipfile while the downloaded archive is being extracted (full traceback under Relevant log output).

How can we reproduce the issue?

The code below clicks the button, which downloads a .docx file (or a zip or PDF):

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

options = webdriver.ChromeOptions()
options.enable_downloads = True  # let the Grid node manage browser downloads
driver = webdriver.Remote(command_executor='http://192.168.3.35:4444/wd/hub', options=options)
try:
    driver.maximize_window()
    driver.implicitly_wait(5)
    driver.get("http://127.0.0.1:8000/login_page")
    # '导出' ("Export") is the button label in the application under test
    driver.find_element(By.XPATH, "//button[text()='导出']").click()
    time.sleep(5)  # give the browser on the node time to finish the download
    file_names = driver.get_downloadable_files()
    downloadable_file = file_names[0]
    target_directory = r'D:\dtmp'
    # On Selenium 4.20 this raises EOFError inside zipfile
    driver.download_file(downloadable_file, target_directory)
finally:
    driver.quit()
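The fixed time.sleep(5) before get_downloadable_files() can race a slow download. A minimal polling sketch, assuming a download counts as finished once `get_downloadable_files()` returns a non-empty list; `wait_for_downloads` is a hypothetical helper, not part of Selenium:

```python
import time
from typing import Callable


def wait_for_downloads(
    list_files: Callable[[], list[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> list[str]:
    """Poll list_files() until it returns a non-empty list (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        files = list_files()
        if files:
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no downloadable files after {timeout} s")
        time.sleep(interval)


# Stand-in for driver.get_downloadable_files: empty twice, then one file.
responses = iter([[], [], ["export.docx"]])
found = wait_for_downloads(lambda: next(responses), timeout=5.0, interval=0.01)

# A listing that never fills up ends in TimeoutError instead of hanging.
try:
    wait_for_downloads(lambda: [], timeout=0.05, interval=0.01)
    timed_out = False
except TimeoutError:
    timed_out = True

print(found, timed_out)
```

With a real session it would be called as `wait_for_downloads(driver.get_downloadable_files)`. Selenium's `WebDriverWait(driver, 10).until(lambda d: d.get_downloadable_files())` should do the same job; the standalone helper only keeps the sketch runnable without a Grid.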


Node setup:
java -jar selenium-server-4.20.0.jar node --hub http://192.168.3.35:4444   --host 192.168.3.35 --port 5557  --enable-managed-downloads true



I found that the download_file method in webdriver.py has an issue. If I hard-code the name as a zip, e.g. file_name = 'package.zip', the download succeeds; without that change it fails:


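        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]
        # file_name = 'package.zip'   # with this line enabled, the download succeeds
        target_file = os.path.join(target_directory, file_name)
        # The archive is written under the same name as the file inside it...
        with open(target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        # ...so extractall() overwrites the archive while still reading it
        with zipfile.ZipFile(target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)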
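The failure in the snippet above can be reproduced with the standard library alone. This is a hypothetical sketch: the helpers `write_archive`, `extract_same_name`, `extract_distinct_name` and the file name `report.pdf` are made up for illustration. It saves a one-file archive under the name of the file it contains and extracts it into the same directory, then repeats the process with the archive saved as `<name>.zip`. The member is stored uncompressed and made about 2 MB, on the assumption that a small archive could be served entirely from Python's read buffer and might not trigger the error.

```python
import os
import tempfile
import zipfile


def write_archive(archive_path: str, member_name: str, payload: bytes) -> None:
    """Write a zip containing a single file (hypothetical helper)."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(member_name, payload)


def extract_same_name(target_directory: str, file_name: str, payload: bytes) -> None:
    """Mimic the failing flow: the archive is saved under its member's name."""
    archive_path = os.path.join(target_directory, file_name)
    write_archive(archive_path, file_name, payload)
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        # extractall() opens target_directory/file_name with "wb", which
        # truncates the archive that zip_ref is still reading from.
        zip_ref.extractall(target_directory)


def extract_distinct_name(target_directory: str, file_name: str, payload: bytes) -> None:
    """Workaround: save the archive as <file_name>.zip, extract, then delete it."""
    archive_path = os.path.join(target_directory, f"{file_name}.zip")
    write_archive(archive_path, file_name, payload)
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(target_directory)
    os.remove(archive_path)


# About 2 MB, far larger than the default read buffer.
PAYLOAD = bytes(range(256)) * 8000

with tempfile.TemporaryDirectory() as tmp:
    try:
        extract_same_name(tmp, "report.pdf", PAYLOAD)
        same_name_result = "extracted"
    except EOFError:
        same_name_result = "EOFError"

with tempfile.TemporaryDirectory() as tmp:
    extract_distinct_name(tmp, "report.pdf", PAYLOAD)
    with open(os.path.join(tmp, "report.pdf"), "rb") as f:
        distinct_name_ok = f.read() == PAYLOAD
    leftover_files = sorted(os.listdir(tmp))

print(same_name_result, distinct_name_ok, leftover_files)
```

The same-name flow fails the same way as the traceback below, while the distinct-name flow leaves only the extracted file behind.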

Relevant log output

D:\Python\Python311\python.exe D:/OfflineaCare/ndb/program/test/test_oooooooo.py
Traceback (most recent call last):
  File "D:\OfflineaCare\ndb\program\test\test_oooooooo.py", line 51, in <module>
    driver.download_file(downloadable_file, target_directory)
  File "D:\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1155, in download_file
    zip_ref.extractall(target_directory)
  File "D:\Python\Python311\Lib\zipfile.py", line 1679, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "D:\Python\Python311\Lib\zipfile.py", line 1734, in _extract_member
    shutil.copyfileobj(source, target)
  File "D:\Python\Python311\Lib\shutil.py", line 197, in copyfileobj
    buf = fsrc_read(length)
          ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 953, in read
    data = self._read1(n)
           ^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1021, in _read1
    data += self._read2(n - len(data))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1056, in _read2
    raise EOFError
EOFError

Process finished with exit code 1

Operating System

Windows 10

Selenium version

selenium 4.20.0 python 3.11.3

What are the browser(s) and version(s) where you see this issue?

Chrome 124

What are the browser driver(s) and version(s) where you see this issue?

124.0.6367.61

Are you using Selenium Grid?

selenium-server-4.20.0.jar

15975518086, May 17 '24 10:05

@15975518086, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot], May 17 '24 10:05

Hi!

I encountered the same problem when trying to download a zip file.

While debugging I also caught another error message here, which may help: [screenshot]

Operating System: Manjaro Linux
Selenium version: 4.21
Python version: 3.12
Browsers: Chrome, Firefox, Edge (latest versions of selenium/standalone)

Traceback:

tests/modules/test_internal_export.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py:1155: in download_file
   zip_ref.extractall(target_directory)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1720: in extractall
   self._extract_member(zipinfo, path, pwd)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1778: in _extract_member
   shutil.copyfileobj(source, target)
../../../.pyenv/versions/3.12.0/lib/python3.12/shutil.py:203: in copyfileobj
   while buf := fsrc_read(length):
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:978: in read
   data = self._read1(n)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1046: in _read1
   data += self._read2(n - len(data))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <zipfile.ZipExtFile [closed]>, n = 3094

   def _read2(self, n):
       if self._compress_left <= 0:
           return b''
   
       n = max(n, self.MIN_READ_SIZE)
       n = min(n, self._compress_left)
   
       data = self._fileobj.read(n)
       self._compress_left -= len(data)
       if not data:
>           raise EOFError
E           EOFError

../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1081: EOFError

Docker-compose file

version: '3'

services:
 chrome:
   image: selenium/standalone-chrome
   shm_size: 2gb
   ports:
     - 4444:4444  # Selenium service
     - 5900:5900  # VNC server
     - 7900:7900  # VNC browser client
   environment:
     - SE_OPTS=--enable-managed-downloads true

M1troll, May 27 '24 05:05

We are also experiencing the same issue. The root cause is that download_file writes the zip archive to a file with the same name as the file it contains. When extraction starts, opening that file for writing truncates the archive that is still being read, so the remaining reads come back empty and zipfile raises EOFError.

For now we are working around it by calling self.execute directly, similar to what millin did in his PR:

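import base64
import os
import zipfile

from selenium.webdriver.remote.command import Command
from selenium.webdriver.remote.webdriver import WebDriver


class PatchedRemote(WebDriver):  # hypothetical subclass name, for illustration
    def __download_file(self, file_name: str, target_directory: str) -> None:
        os.makedirs(target_directory, exist_ok=True)

        # The Grid returns the file wrapped in a base64-encoded zip archive
        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]

        # Save the archive as "<file_name>.zip" so extracting <file_name>
        # does not overwrite the archive while it is being read
        zip_target_file = os.path.join(target_directory, f"{file_name}.zip")
        with open(zip_target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        with zipfile.ZipFile(zip_target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)
        os.remove(zip_target_file)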
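An alternative sketch, not necessarily what the fix merged in #14031 does: decode the archive in memory with `io.BytesIO`, so no file on disk ever shares a name with the extracted file and there is nothing to clean up. The function name `save_downloaded_file` is hypothetical; it takes the base64 `contents` string that `self.execute(Command.DOWNLOAD_FILE, ...)` returns in the workaround above.

```python
import base64
import io
import os
import tempfile
import zipfile


def save_downloaded_file(contents_b64: str, target_directory: str) -> list[str]:
    """Extract the base64-encoded zip returned by the Grid into target_directory.

    The archive is read from memory, so it can never be overwritten by the
    file being extracted. Returns the names of the extracted members.
    """
    os.makedirs(target_directory, exist_ok=True)
    archive = io.BytesIO(base64.b64decode(contents_b64))
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(target_directory)
        return zip_ref.namelist()


# Build a payload shaped like the Grid response: a base64 string of a zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("report.pdf", b"%PDF-1.4 example" * 1000)
contents = base64.b64encode(buffer.getvalue()).decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    names = save_downloaded_file(contents, tmp)
    with open(os.path.join(tmp, "report.pdf"), "rb") as f:
        restored = f.read()

print(names, len(restored))
```

The trade-off is that the whole archive is held in memory, which should be acceptable for typical test downloads but may matter for very large files.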

mormamn, Jun 02 '24 12:06

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

github-actions[bot], Jul 12 '24 08:07

@titusfortner Fixed in #14031

millin, Sep 19 '24 06:09

I believe this issue can be closed, since the PR fixing it has already been merged, as @millin said.

Delta456, Oct 08 '24 08:10