airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Update azure-storage-blob version

Open eladkal opened this issue 3 years ago • 16 comments

We restricted version in https://github.com/apache/airflow/pull/18443 due to error in static checks https://github.com/apache/airflow/pull/17068#issuecomment-925264410 Tests pass locally with the updated version


^ Add meaningful description above

Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

eladkal avatar Jul 31 '22 09:07 eladkal

Any reason versions between 12.9 and 12.13 need to be excluded?

uranusjr avatar Aug 01 '22 03:08 uranusjr

Any reason versions between 12.9 and 12.13 need to be excluded?

I'll check if I can relax the version

eladkal avatar Aug 02 '22 09:08 eladkal

mmm We have

airflow/providers/microsoft/azure/hooks/wasb.py:302: error: Argument 1 to
"write" of "BufferedWriter" has incompatible type "Union[bytes, str]"; expected
"Union[bytes, Union[bytearray, memoryview, array[Any], mmap, _CData]]" 

which seems to be related to https://github.com/Azure/azure-sdk-for-python/issues/24661 and will be fixed only in 12.14.0b1 so we can not relax the version

eladkal avatar Aug 03 '22 17:08 eladkal

As for the second static failure it's kinda strange. mypy fails with:

airflow/providers/microsoft/azure/hooks/wasb.py:232: error: Argument
"delimiter" to "walk_blobs" of "ContainerClient" has incompatible type
"Optional[str]"; expected "str"  [arg-type]
    ...me_starts_with=prefix, include=include, delimiter=delimiter, **kwargs)

Code line: https://github.com/apache/airflow/blob/f18c609d127f54fbbf4dae6b290c6cdcfc7f98d0/airflow/providers/microsoft/azure/hooks/wasb.py#L216

but the types did not change between the versions: https://github.com/Azure/azure-sdk-for-python/blob/azure-storage-blob_12.8.1/sdk/storage/azure-storage-blob/azure/storage/blob/_container_client.py#L779

https://github.com/Azure/azure-sdk-for-python/blob/azure-storage-blob_12.13.0/sdk/storage/azure-storage-blob/azure/storage/blob/_container_client.py#L780

In 12.8.1 (when mypy pass) it’s the same as in 12.13.0 which mypy fails!

eladkal avatar Aug 03 '22 17:08 eladkal

As for the second static failure it's kinda strange. mypy fails with:

airflow/providers/microsoft/azure/hooks/wasb.py:232: error: Argument
"delimiter" to "walk_blobs" of "ContainerClient" has incompatible type
"Optional[str]"; expected "str"  [arg-type]
    ...me_starts_with=prefix, include=include, delimiter=delimiter, **kwargs)

Code line:

https://github.com/apache/airflow/blob/f18c609d127f54fbbf4dae6b290c6cdcfc7f98d0/airflow/providers/microsoft/azure/hooks/wasb.py#L216

but the types did not change between the versions: https://github.com/Azure/azure-sdk-for-python/blob/azure-storage-blob_12.8.1/sdk/storage/azure-storage-blob/azure/storage/blob/_container_client.py#L779

https://github.com/Azure/azure-sdk-for-python/blob/azure-storage-blob_12.13.0/sdk/storage/azure-storage-blob/azure/storage/blob/_container_client.py#L780

In 12.8.1 (when mypy pass) it’s the same as in 12.13.0 which mypy fails!

The delimiter typing should really be delimiter: str = '/', though. I think we've been getting away with this in mypy for a while 🙂

josh-fell avatar Aug 03 '22 18:08 josh-fell

The delimiter typing should really be delimiter: str = '/', though. I think we've been getting away with this in mypy for a while 🙂

Yes this is what i did as it also required param in the upstream lib but I'm not sure I get why mypy behaves differently on the same function signature and typing? a mystery.

eladkal avatar Aug 03 '22 18:08 eladkal

OK delimiter is solved. So now :

  1. Test failures are not related to this PR. It will be fixed in https://github.com/apache/airflow/pull/25511 - I'll rebase after it's merged.
  2. write function typing issue has been fixed upstream we must wait to 12.14.0

after that we should get a green build

eladkal avatar Aug 03 '22 18:08 eladkal

Merged . Rebasing this one on top :)

potiuk avatar Aug 03 '22 19:08 potiuk

Just one static-check left!

potiuk avatar Aug 04 '22 07:08 potiuk

Just one static-check left!

Yes. This is the one that we must wait to 12.14.0 The issue already fixed upstream but they didn't release it yet. I wrote to them about it: https://github.com/Azure/azure-sdk-for-python/issues/24661#issuecomment-1204300000

Once 12.14.0 I'll modify the PR to use that version as minimum

eladkal avatar Aug 04 '22 07:08 eladkal

Still failing with

Run mypy for providers.................................................................Failed
- hook id: run-mypy
- exit code: 1

airflow/providers/microsoft/azure/hooks/wasb.py:305: error: Incompatible types
in assignment (expression has type "Union[bytes, str]", variable has type
"bytes")  [assignment]
                content: bytes = stream.readall()

Lets see if https://github.com/apache/airflow/pull/25426/commits/bb1c54579887613cbc73592f21fea6bdd1ea6788 resolves it

eladkal avatar Aug 05 '22 07:08 eladkal

I think Mypy is too smart and can tell that open(..., "wb") does not accept str, so that would also fail. This is probably the solution with the least overhead

content = stream.readall()
assert isinstance(content, bytes)
fileblob.write(content)

uranusjr avatar Aug 05 '22 09:08 uranusjr

So it solved the static issue but created problems for the unit tests. I think the best option here is just to wait for upstream to release 12.14.0 should be in a month+-

eladkal avatar Aug 06 '22 08:08 eladkal

So it solved the static issue but created problems for the unit tests.

I think the best option here is just to wait for upstream to release 12.14.0 should be in a month+-

Maybe if the assertion was under TYPE_CHECKING scope the unit test wouldn't fail? Just an idea since the assert statement is to hopefully make mypy happy.

josh-fell avatar Aug 06 '22 11:08 josh-fell

Maybe if the assertion was under TYPE_CHECKING scope the unit test wouldn't fail? Just an idea since the assert statement is to hopefully make mypy happy.

I think not really - the idea is to make it consistent, I think :). It's going to change when the fixed version of liubrary is out, so it will change then and someone might rely on it being bytes.

potiuk avatar Aug 06 '22 11:08 potiuk

Yeah let’s wait for upstream.

uranusjr avatar Aug 08 '22 04:08 uranusjr

Testing with 12.14.0b2

Run mypy for providers.................................................................Failed
- hook id: run-mypy
- exit code: 1

airflow/providers/microsoft/azure/hooks/wasb.py:364: error: Argument "offset"
to "download_blob" of "BlobClient" has incompatible type "Optional[int]";
expected "int"  [arg-type]
            return blob_client.download_blob(offset=offset, length=length,...
                                                    ^
airflow/providers/microsoft/azure/hooks/wasb.py:364: error: Argument "length"
to "download_blob" of "BlobClient" has incompatible type "Optional[int]";
expected "int"  [arg-type]
    ...turn blob_client.download_blob(offset=offset, length=length, **kwargs)
                                                            ^
Found 2 errors in 1 file (checked 951 source files)

Looks like https://github.com/Azure/azure-sdk-for-python/commit/8c243548334e2d58bce77caa35ede6a792b75a93 changed also type hints for another function.

eladkal avatar Aug 31 '22 16:08 eladkal

Testing with 12.14.0b2

Run mypy for providers.................................................................Failed
- hook id: run-mypy
- exit code: 1

airflow/providers/microsoft/azure/hooks/wasb.py:364: error: Argument "offset"
to "download_blob" of "BlobClient" has incompatible type "Optional[int]";
expected "int"  [arg-type]
            return blob_client.download_blob(offset=offset, length=length,...
                                                    ^
airflow/providers/microsoft/azure/hooks/wasb.py:364: error: Argument "length"
to "download_blob" of "BlobClient" has incompatible type "Optional[int]";
expected "int"  [arg-type]
    ...turn blob_client.download_blob(offset=offset, length=length, **kwargs)
                                                            ^
Found 2 errors in 1 file (checked 951 source files)

Looks like Azure/azure-sdk-for-python@8c24354 changed also type hints for another function.

def download_blob(
            self, offset: int = None,
            length: int = None,
...

^ I'm by no means a mypy expert, but the typing for offset and length are incorrect?

josh-fell avatar Aug 31 '22 16:08 josh-fell

New release of azure-storage-blob is expected next week: https://github.com/Azure/azure-sdk-for-python/issues/24661#issuecomment-1268952696 then this PR will be green :)

eladkal avatar Oct 06 '22 15:10 eladkal

azure-storage-blob 12.14.0 was released so this PR should be ready

eladkal avatar Oct 13 '22 10:10 eladkal

This change broke the wasb log task handler.

The download method now requires offset and length arguments, but these are not provided from the task handler code. Also, how would it provide anything else than "read the whole file".

malthe avatar Dec 08 '22 13:12 malthe

This change broke the wasb log task handler.

The download method now requires offset and length arguments, but these are not provided from the task handler code. Also, how would it provide anything else than "read the whole file".

I think you shoudl open an issue (or better PR :) for that.

potiuk avatar Dec 08 '22 13:12 potiuk