filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

make `fsspec.asyn._get_batch_size()` public

Open pmrowla opened this issue 1 year ago • 5 comments

Being able to get the configured or system default batch size from fsspec is useful when using filesystems that support an additional level of concurrency (beyond the existing fs methods that support file batching like get()/put()).

See: adlfs, which supports setting concurrency for chunked/multipart uploads and downloads (and in this case concurrency is handled by sending a batch size to the underlying Azure SDK, scheduling the individual chunk upload/download in adlfs with _run_coros_in_chunks() is not an option).

_get_batch_size() is currently marked as protected (with the leading _) and internally only gets called in asyn._run_coros_in_chunks(), but it would be better if fsspec exposed a public method for getting the configured batch size

pmrowla avatar Aug 04 '23 02:08 pmrowla

I suspect that there was simply no use case for this before to make it public. Probably can just contribute a PR.

efiop avatar Aug 04 '23 12:08 efiop

Agree with @efiop

I will comment that fsspec is an unusual library, because it has a definite "end user" API of public functions/methods (as in the docs) and a whole load of other stuff that is used by other libraries or implementations that build on fsspec. In addition, the async methods of an async implementation also start with "_" to show they are special, but not hidden/private. Perhaps a better convention could have been chosen.

martindurant avatar Aug 04 '23 12:08 martindurant

In addition, the async methods of an async implementation also start with "_" to show they are special, but not hidden/private. Perhaps a better convention could have been chosen.

I'm aware of this convention for the sync vs async AbstractFileSystem implementation methods (and it's documented that it works this way).

But fsspec.asyn also contains a mix of module level methods that look intentionally public (like fsspec.asyn.get_loop()) and ones that don't (like fsspec.asyn._get_batch_size()) which is why the concerns about using _get_batch_size() were raised in the adlfs PR

pmrowla avatar Aug 04 '23 16:08 pmrowla

If it's safe to assume _get_batch_size() is relatively stable and is safe for fs implementations to use, this issue can probably just be closed.

pmrowla avatar Aug 08 '23 03:08 pmrowla

The best way to make sure that it is safe is to write a test against it and add docstrings/comments to the code itself. I would say it is not planned for any changes, in its existence or the signature, however.

martindurant avatar Aug 09 '23 14:08 martindurant