tiny-glob

Iterator Support for large globs

Open rijnhard opened this issue 7 years ago • 4 comments

Hi

Consider a method where the return object could be an Iterator or AsyncIterator so that large file globs (as in huge numbers of files) are supported.

rijnhard avatar Oct 11 '18 12:10 rijnhard

Is this to avoid blocking the main thread? And what would such an implementation look like?

terkelg avatar Oct 20 '18 12:10 terkelg

There are a few things involved here, @terkelg.

There are better implementations than what I did; this just happened to be fine for my use case. In my local implementation, I used fast-glob and adapted its stream into an async iterator using the stream-to-async-iterator library.

Concerns:

  • async iteration isn't standardised by TC39 yet
  • libuv doesn't quite support it yet, and thus Node's readdir doesn't either.
  • do we need a sync iterator? Is it beneficial or even possible? (sync iterators are already standardised)

Notes:

  • From Node 10 onward, anywhere streams are supported we can support async iteration
  • streams can be adapted into async iterators

Usage:

import fglob from 'fast-glob';
import StreamIteratorAdapter from 'stream-to-async-iterator';

// Named processGlob rather than process to avoid shadowing Node's global `process`.
async function processGlob(dirglob, globOptions) {
    // fglob.stream() returns a readable stream of matched entries
    const stream = fglob.stream(dirglob, globOptions);
    const iterator = new StreamIteratorAdapter(stream);

    for await (const entry of iterator) {
        // Process entries one at a time, so massive glob lists can be handled without hitting resource limits
    }
}
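
On Node versions where readable streams are themselves async iterable (experimental from Node 10, stable in later releases), the adapter may not be needed at all. The following is only a sketch under that assumption: `forEachEntry` is a hypothetical helper name, and `Readable.from` (Node 12+) stands in for `fglob.stream(...)` so the example runs without any dependencies.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: pulls entries one at a time from any async-iterable
// source (such as a Readable stream) and awaits the handler for each one.
// The next entry is not requested until the handler has finished, so the
// consumer sets the pace.
async function forEachEntry(source, handler) {
  let count = 0;
  for await (const entry of source) {
    await handler(entry);
    count += 1;
  }
  return count;
}

// Stand-in for fglob.stream(pattern, options): an object-mode stream of paths.
const fakeGlobStream = Readable.from(['src/a.js', 'src/b.js', 'test/a.test.js']);

forEachEntry(fakeGlobStream, async (entry) => {
  console.log('matched', entry);
}).then((count) => {
  console.log('entries processed:', count);
});
```

Because the loop awaits the handler before asking for the next entry, a slow handler slows the stream down instead of letting results pile up in memory.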

rijnhard avatar Nov 06 '18 09:11 rijnhard

Thanks for elaborating. This seems a bit complex. Is it possible to add this as an extension/wrapper around tiny-glob?

terkelg avatar Nov 06 '18 09:11 terkelg

If you can provide a stream option, it will allow this use case, and with time it will get more elegant. We can't do any higher-level iteration if we don't have some async way of processing entries with backpressure.

In the implementations I've seen, this usually comes back to readdir. Some packages provide this (fast-glob, for example); digging through their code, it looks like they use readdir-enhanced, which manages to provide a stream (I didn't look into how).

But to make it easier for you, I'd wrap the stream and expose only an async iterator via generator functions; otherwise you have to do a bunch of stream handling, and that's error-prone and painful.
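
To illustrate the generator-function idea, here is a rough sketch that is not tiny-glob's API. It assumes Node 12.12 or later, where `fs.promises.opendir` returns a directory handle that is itself async iterable. The `walk` function and its `match` predicate are hypothetical names; real glob matching is replaced by a plain predicate to keep the example short.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper, not part of tiny-glob: lazily yields file paths
// under `root` for which `match(fullPath)` returns true.
async function* walk(root, match = () => true) {
  // fs.Dir reads directory entries incrementally instead of loading
  // the whole listing into memory at once.
  const dir = await fs.promises.opendir(root);
  for await (const dirent of dir) {
    const fullPath = path.join(root, dirent.name);
    if (dirent.isDirectory()) {
      // Delegate to a nested generator for subdirectories.
      yield* walk(fullPath, match);
    } else if (match(fullPath)) {
      yield fullPath;
    }
  }
}

// Usage sketch: entries are produced only as fast as the loop consumes them.
async function printJsFiles(root) {
  for await (const file of walk(root, (p) => p.endsWith('.js'))) {
    console.log(file);
  }
}
```

The consumer only sees a `for await` loop, and no stream events need to be handled by hand.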

rijnhard avatar Nov 06 '18 11:11 rijnhard