Using RayonExistingPool for parallelism in WalkDir

Open • kunjann opened this issue 3 years ago • 1 comment

As I understand it, using `RayonExistingPool` makes jwalk use the threads from that pool to perform the actual directory walk. When I run the following snippet, the thread IDs I see for `WalkDir` items differ from the ones created by my custom pool. Is this expected? I am using jwalk 0.6.0.

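```rust
use std::sync::Arc;

use jwalk::{Parallelism, WalkDir};
use rayon::prelude::*; // brings `par_bridge` (ParallelBridge) into scope
use rayon::ThreadPoolBuilder;

fn main() {
    // Custom 2-thread pool that logs each worker's ID as it starts.
    let pool = Arc::new(
        ThreadPoolBuilder::new()
            .start_handler(|_| {
                println!("Thread started: {:?}", std::thread::current().id());
            })
            .num_threads(2)
            .build()
            .unwrap(),
    );

    let count = WalkDir::new("/tmp/test")
        .parallelism(Parallelism::RayonExistingPool(pool))
        .into_iter()
        .par_bridge()
        .inspect(|result| println!("{:?} {:?}", std::thread::current().id(), result.as_ref().unwrap()))
        .count();
    println!("{count} entries");
}
```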

kunjann • Apr 05 '22 20:04

@kunjann `par_bridge` distributes iterator items across rayon's global thread pool. Even though jwalk executes on your custom pool, the resulting items are yielded back on the caller's thread, and `par_bridge` then hands them to rayon's default (global) pool. The thread IDs printed in `inspect` therefore belong to that pool, not yours. If you want to know which thread jwalk executes on, log it inside `process_read_dir`.

Boscop • Dec 26 '22 19:12