jwalk
Using RayonExistingPool for parallelism in WalkDir
As I understand it, using `RayonExistingPool` makes jwalk perform the actual directory walk on the threads of the given thread pool.
When I run the following snippet, however, the thread IDs printed for the `WalkDir` results differ from the ones created by my custom pool.
Is this expected? I am using jwalk 0.6.0.
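```rust
use std::sync::Arc;
use jwalk::{Parallelism, WalkDir};
use rayon::{prelude::*, ThreadPoolBuilder};

fn main() {
    // Custom two-thread pool that logs the ID of every thread it starts.
    let pool = Arc::new(
        ThreadPoolBuilder::new()
            .start_handler(|_| println!("Thread started: {:?}", std::thread::current().id()))
            .num_threads(2)
            .build()
            .unwrap(),
    );
    let count = WalkDir::new("/tmp/test")
        .parallelism(Parallelism::RayonExistingPool(pool))
        .into_iter()
        .par_bridge()
        .inspect(|result| println!("{:?} {:?}", std::thread::current().id(), result.as_ref().unwrap()))
        .count();
    println!("{} entries", count);
}
```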
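@kunjann `par_bridge` distributes the iterator's items across the current rayon thread pool, which here is rayon's global (default) pool, not your custom one. Even though jwalk executes on your custom pool, the entries it produces are yielded back to the thread that iterates the `WalkDir`, i.e. the caller's thread. By calling `par_bridge` you then send them on to rayon's default pool, so the thread IDs printed inside `inspect` belong to that pool rather than to yours.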
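The handoff described above can be illustrated without jwalk or rayon. The std-only sketch below is an analogy, not jwalk's implementation: an `mpsc` channel stands in for however jwalk actually passes entries back, two named threads stand in for the custom pool, and `spawn_producers` and `observe` are hypothetical names. Each item records the thread that produced it and the thread that consumed it; the consumer is always whichever thread iterates the receiver.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a parallel walk (NOT jwalk's code): `workers`
/// threads named "custom-pool-0", "custom-pool-1", ... each produce
/// `items_per_worker` items, tagged with the producing thread's name, and send
/// them over a channel. Whoever iterates the returned receiver consumes them.
fn spawn_producers(workers: usize, items_per_worker: usize) -> mpsc::Receiver<(usize, String)> {
    let (tx, rx) = mpsc::channel();
    for w in 0..workers {
        let tx = tx.clone();
        thread::Builder::new()
            .name(format!("custom-pool-{w}"))
            .spawn(move || {
                // The name was set just above, so unwrap cannot fail here.
                let me = thread::current().name().unwrap().to_string();
                for i in 0..items_per_worker {
                    // send only fails if the receiver was dropped; ignore that case.
                    let _ = tx.send((w * items_per_worker + i, me.clone()));
                }
            })
            .expect("failed to spawn worker");
    }
    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    rx
}

/// Consume the items on the calling thread, recording for each item the name
/// of the thread that produced it and the ID of the thread that observed it.
fn observe(workers: usize, items_per_worker: usize) -> Vec<(usize, String, thread::ThreadId)> {
    spawn_producers(workers, items_per_worker)
        .into_iter()
        .map(|(item, producer)| (item, producer, thread::current().id()))
        .collect()
}

fn main() {
    let caller = thread::current().id();
    for (item, producer, consumer) in observe(2, 3) {
        println!("item {item}: produced on {producer}, consumed on {consumer:?} (caller is {caller:?})");
    }
}
```

Adding something like `par_bridge` on top would move each item once more, onto another pool's threads; this sketch does not model that step.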
If you want to know which thread jwalk executes on, log the thread ID inside `process_read_dir`.
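Why logging inside the per-directory hook shows the pool's threads can be sketched with another std-only analogy. The `walk` function below is hypothetical and is NOT jwalk's API: its callback takes only the children vector, whereas jwalk's `process_read_dir` receives more arguments whose exact shape depends on the jwalk version. The point it illustrates is only that the hook runs on a worker thread before the entries are handed back to the caller.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical mini-walker, NOT jwalk's API. Each "directory" is just a list
/// of child names. Directories are dealt round-robin to `workers` threads named
/// "custom-pool-0", "custom-pool-1", ... On the worker thread, `process_dir`
/// runs on each directory's children before they are sent back to the caller,
/// loosely mirroring where jwalk's `process_read_dir` callback runs.
fn walk<F>(dirs: Vec<Vec<String>>, workers: usize, process_dir: F) -> Vec<String>
where
    F: Fn(&mut Vec<String>) + Send + Sync + 'static,
{
    assert!(workers > 0, "need at least one worker");
    let hook = Arc::new(process_dir);
    let (tx, rx) = mpsc::channel();

    // Assign directory i to worker i % workers.
    let mut buckets: Vec<Vec<Vec<String>>> = vec![Vec::new(); workers];
    for (i, dir) in dirs.into_iter().enumerate() {
        buckets[i % workers].push(dir);
    }

    let handles: Vec<_> = buckets
        .into_iter()
        .enumerate()
        .map(|(w, bucket)| {
            let tx = tx.clone();
            let hook = Arc::clone(&hook);
            thread::Builder::new()
                .name(format!("custom-pool-{w}"))
                .spawn(move || {
                    for mut children in bucket {
                        // The hook runs here, on the worker thread.
                        (*hook)(&mut children);
                        for child in children {
                            let _ = tx.send(child);
                        }
                    }
                })
                .expect("failed to spawn worker")
        })
        .collect();
    // Drop the original sender so collecting below ends when all workers finish.
    drop(tx);

    // The entries themselves arrive on the caller's thread.
    let out: Vec<String> = rx.into_iter().collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
    out
}

fn main() {
    let dirs = vec![vec!["b".to_string(), "a".to_string()], vec!["c".to_string()]];
    let entries = walk(dirs, 2, |children| {
        // Logging from inside the hook reports a worker thread.
        println!("hook on {:?}", thread::current().name());
        children.sort();
    });
    println!("entries received on {:?}: {:?}", thread::current().name(), entries);
}
```

The order in which entries from different workers arrive is not deterministic, so the sketch makes no claim about it.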