aiohttp icon indicating copy to clipboard operation
aiohttp copied to clipboard

Tarfile file-like objects cannot upload as `data` attributes.

Open ReallyReivax opened this issue 3 years ago • 4 comments

Describe the bug

post and other HTTP verbs with a data attribute can accept files and file-like objects. builtins.open(), zipfile.ZipFile.open(), io.BytesIO() all transmit correctly, with the data loaded; however the tarfile.TarFile.extractfile() method, which by the python specification returns a file-like object, is treated as a file on disk, and the aiohttplibrary attempts to access the filesize by way of os.fstat by way of the fileno attribute, which does not exist on the tarfile._FileInFile (nor does it exist in the zipfile.ZipExtFile, which does in fact work with aiohttp)

zipfile.open(): https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open tarfile.extractfile(): https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile

The current spec asserts that tarfile.TarFile.extractfile returns an io.BufferedReader, but there is an intermediate tarfile._FileInFile object; zip works the same way, with an intermediate zipfile.ZipExtFile object.

aiohttp specification showing expected behavior: https://docs.aiohttp.org/en/stable/client_quickstart.html#streaming-uploads

To Reproduce

Using a fully self-contained pytest, the following fails.

import aiohttp
import io
import pytest
import tarfile

@pytest.mark.asyncio
async def test_tar_filelike():
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        tf.addfile(
            tarinfo=tarfile.TarInfo(name="payload1.txt", ),
            fileobj=io.StringIO("This is the first text files"),
        )

    buf.seek(0)
    async with aiohttp.ClientSession() as session:
        tf = tarfile.open(fileobj=buf, mode='r')
        for tinfo in tf.getmembers():
            tar_filelike = tf.extractfile(tinfo)
            async with session.post(url, data=tar_filelike, ) as res:
                print(await res.text())

This will work if directly invoked any other way, but the pytest and pytest-asyncio modules facilitated these tests.

Expected behavior

The following works as expected, swapping only tarfile for zipfile, and replacing the method signatures as appropriate.

import aiohttp
import io
import pytest
import tarfile

@pytest.mark.asyncio
async def test_zip_filelike():
    buf = io.BytesIO()
    with zipfile.ZipFile(file=buf, mode="w") as zf:
        with zf.open("payload1.txt", mode="w") as zip_filelike_writing:
            zip_filelike_writing.write("This is the first text file.".encode("utf-8"))

    buf.seek(0)
    async with aiohttp.ClientSession() as session:
        zf = zipfile.ZipFile(file=buf, mode='r')
        for zinfo in zf.infolist():
            zip_filelike = zf.open(zinfo)
            async with session.post(url, data=zip_filelike, ) as res:
                print(await res.text())

Logs/tracebacks

================================== FAILURES ===================================
______________________________ test_tar_filelike ______________________________

    @pytest.mark.asyncio
    async def test_tar_filelike():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tf:
            tf.addfile(
                tarinfo=tarfile.TarInfo(name="payload1.txt", ),
                fileobj=io.StringIO("This is the first text files"),
            )

        buf.seek(0)
        async with aiohttp.ClientSession() as session:
            tf = tarfile.open(fileobj=buf, mode='r')
            for tinfo in tf.getmembers():
                tar_filelike = tf.extractfile(tinfo)
>               async with session.post(url, data=tar_filelike, ) as res:

test\int\test_aiohttp.py:41:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv310\lib\site-packages\aiohttp\client.py:1138: in __aenter__
    self._resp = await self._coro
venv310\lib\site-packages\aiohttp\client.py:507: in _request
    req = self._request_class(
venv310\lib\site-packages\aiohttp\client_reqrep.py:313: in __init__
    self.update_body_from_data(data)
venv310\lib\site-packages\aiohttp\client_reqrep.py:519: in update_body_from_data
    size = body.size
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <aiohttp.payload.BufferedReaderPayload object at 0x000001E4A3B6D390>

    @property
    def size(self) -> Optional[int]:
        try:
>           return os.fstat(self._value.fileno()).st_size - self._value.tell()
E           AttributeError: '_FileInFile' object has no attribute 'fileno'

venv310\lib\site-packages\aiohttp\payload.py:379: AttributeError
=========================== short test summary info ===========================
FAILED test/int/test_aiohttp.py::test_tar_filelike - AttributeError: '_FileIn...
========================= 1 failed, 1 passed in 0.87s =========================

Python Version

$./venv310/Scripts/python --version
Python 3.10.4

aiohttp Version

/venv310/Scripts/python -m pip show aiohttp
Name: aiohttp
Version: 3.8.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: .....\venv310\lib\site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl
Required-by:

multidict Version

$ ./venv310/Scripts/python -m pip show multidict
Name: multidict
Version: 6.0.2
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: .....\venv310\lib\site-packages
Requires:
Required-by: aiohttp, yarl

yarl Version

$ ./venv310/Scripts/python -m pip show yarl
Name: yarl
Version: 1.7.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: .....\venv310\lib\site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

Windows 7 64 bit, but also demonstrated on Centos 7 64 bit.

Related component

Client

Additional context

This has been demonstrated to work consistently on Python 3.6 and 3.8 on Centos 7; Python 3.6, 3.7, and 3.10 on Windows 10 64 bit. A fresh install of Python 3.10, with the latest available versions of aiohttp, pytest, and pytest-asyncio were used.

As a known-working workaround, that applies to both zipfile and tarfile, using a generator to manage the data does work.

    async with aiohttp.ClientSession() as session:
        async def sender(f):
            chunk = f.read(64 * 1024)
            while chunk:
                yield chunk
                chunk = f.read(64 * 1024)

        tf = tarfile.open("./test/files/archives/files.tar", mode='r:')
        for tinfo in tf.getmembers():
            async with session.post(url, data=sender(tf.extractfile(tinfo))) as res:
                print(await res.text())

Why the actual/expected sections make use of an io.Bytes() object to create the file and hold it virtually, the behavior is exactly the same, with the same errors and traceback, with actual files on disk.

Code of Conduct

  • [x] I agree to follow the aio-libs Code of Conduct

ReallyReivax avatar May 05 '22 01:05 ReallyReivax

If you can write a test (and ideally a suggested fix) in a PR, we could look at fixing this.

Dreamsorcerer avatar May 08 '22 11:05 Dreamsorcerer

Hello!

Generally speaking, judging by the documentation, _FileInFile either must implement fileno and throw OSError. Or the BufferedReader should have a base implementation and then it should throw OSError. At the expense of the second option, there is a PR(Issue) in cpython, for 2 years already

Because of what, an error occurs in this place https://github.com/aio-libs/aiohttp/blob/master/aiohttp/payload.py#L377

WindowGenerator avatar May 11 '22 10:05 WindowGenerator

See: https://github.com/aio-libs/aiohttp/pull/6747

This is my first contribution to a major project on github, so I may be missing compliance steps. Format is run and all the tests that passed before my changes pass after my changes, plus my two or three new tests.

ReallyReivax avatar May 12 '22 01:05 ReallyReivax

I'm still interested in having this issue fixed. All tests pass, formatting matches, just needs an approval or discussion.

ReallyReivax avatar Jul 04 '22 20:07 ReallyReivax