Very slow on a domain-joined Windows machine
It took ~30 minutes to run procs on my domain-joined machine with about 225 processes running.
I profiled for about 30 seconds, and the hotspot is this stack:
OTHER <<advapi32!LookupAccountSidW>>
+ procs!procs::process::windows::get_name
+ procs!procs::process::windows::get_groups
+ procs!procs::process::windows::collect_proc
+ procs!procs::view::View::new
+ procs!procs::run_default
+ procs!procs::run
+ procs!procs::main
+ procs!std::rt::lang_start::
+ procs!std::rt::lang_start_internal
+ procs!std::rt::lang_start<()>
+ procs!main
+ procs!__scrt_common_main_seh
+ OTHER <<ntdll!RtlUserThreadStart>>
I found too many LookupAccountSidW calls and tried adding a cache for SIDs on the latest master.
Could you try it?
(If you don't have a Rust compiler, I can provide the latest binary.)
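The caching idea amounts to memoizing the SID-to-name resolution so each distinct SID hits the API only once. A minimal sketch, with `SidCache` and `lookup_name` as illustrative stand-ins (not the actual procs code; the real version would key on the raw SID and call LookupAccountSidW):

```rust
use std::collections::HashMap;

// Stand-in for a LookupAccountSidW wrapper; the real function would
// call the Windows API and return the resolved account name.
fn lookup_name(sid: &[u8]) -> String {
    format!("account-{}", sid.len())
}

// Memoize lookups so each distinct SID is resolved only once.
struct SidCache {
    map: HashMap<Vec<u8>, String>,
    lookups: usize, // number of uncached (real) lookups, for illustration
}

impl SidCache {
    fn new() -> Self {
        SidCache { map: HashMap::new(), lookups: 0 }
    }

    fn name_for(&mut self, sid: &[u8]) -> String {
        if let Some(name) = self.map.get(sid) {
            return name.clone(); // cache hit: no API call
        }
        self.lookups += 1;
        let name = lookup_name(sid);
        self.map.insert(sid.to_vec(), name.clone());
        name
    }
}

fn main() {
    let mut cache = SidCache::new();
    let sid = [1u8, 2, 3];
    cache.name_for(&sid);
    cache.name_for(&sid); // second call is served from the cache
    println!("real lookups: {}", cache.lookups);
}
```

With hundreds of processes sharing mostly the same group SIDs, this collapses thousands of API calls into a handful.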
That's a huge improvement, but it's still too slow to be usable: about 40 seconds for 300 processes, and the hot stack is basically the same.
Changing the cache to use the result of ConvertSidToStringSidW as the key helps some more, getting closer to usable: about 9 seconds.
Many of the processes have 380 groups, so reducing the number of LookupAccountSidW calls would help. Avoiding calling LookupAccountSidW twice per SID (the first call only retrieves the required buffer sizes) is an obvious optimization opportunity.
Not calling LookupAccountSidW at all when it isn't needed (Gid and Groups are not default columns) also seems like a potential optimization.
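The skip-when-not-needed idea could look roughly like this; `collect_groups` and the column names are hypothetical, not procs' actual API:

```rust
// Resolve group names only if a column that needs them was requested.
// When neither Gid nor Groups is shown, LookupAccountSidW is never called.
fn collect_groups(columns: &[&str], sids: &[Vec<u8>]) -> Option<Vec<String>> {
    if !columns.iter().any(|c| *c == "Gid" || *c == "Groups") {
        return None; // skip the expensive per-SID lookups entirely
    }
    // Stand-in for the real per-SID resolution.
    Some(sids.iter().map(|s| format!("group-{}", s.len())).collect())
}

fn main() {
    let sids = vec![vec![1u8, 2], vec![3u8, 4, 5]];
    // Default columns: no lookups happen at all.
    assert!(collect_groups(&["Pid", "Command"], &sids).is_none());
    // Groups requested: lookups run as before.
    assert!(collect_groups(&["Pid", "Groups"], &sids).is_some());
}
```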
I wonder if a different API might help. I noticed that LsaLookupSids2 accepts an array of SIDs; batching the lookups might mean fewer round-trips to the domain controller.
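Independent of the exact API, the batching idea is: deduplicate the SIDs collected from all processes, then resolve them in one call instead of one call per SID per process. A sketch, where `batch_lookup` is a made-up stand-in for something like LsaLookupSids2:

```rust
use std::collections::HashMap;

// Hypothetical batch resolver: one call resolves many SIDs at once,
// standing in for an API like LsaLookupSids2.
fn batch_lookup(sids: &[Vec<u8>]) -> Vec<String> {
    sids.iter().map(|s| format!("account-{}", s.len())).collect()
}

// Deduplicate first, then resolve everything in a single batch call.
fn resolve_all(sids: &[Vec<u8>]) -> HashMap<Vec<u8>, String> {
    let mut unique = sids.to_vec();
    unique.sort();
    unique.dedup();
    let names = batch_lookup(&unique);
    unique.into_iter().zip(names).collect()
}

fn main() {
    // Three SIDs, two unique: only two entries need resolving.
    let sids = vec![vec![1u8], vec![1u8], vec![2u8, 3]];
    let map = resolve_all(&sids);
    println!("{} unique SIDs resolved", map.len());
}
```

Whether LsaLookupSids2 actually does fewer round-trips per SID is something I haven't verified; the sketch only shows the call-count reduction on the client side.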
> I found too many LookupAccountSidW calls and tried adding a cache for SIDs on the latest master. Could you try it? (If you don't have a Rust compiler, I can provide the latest binary.)
@dalance
I have found one more thing that can add overhead: LookupAccountSidW is called twice (here), the first time to get the required buffer sizes and the second time to actually perform the lookup.
The best option is to pre-allocate a sufficiently large static buffer on the stack and pass it to LookupAccountSidW. If the buffer turns out to be too small (an extremely unlikely case), fall back to dynamic allocation. This technique eliminates an extra call to the dynamic allocator, but most importantly, LookupAccountSidW is faster when it is called once with a sufficient buffer - I have seen this in the profiler in one of my projects.
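The pattern can be sketched like this; `lookup_into` is a stand-in I made up to mimic LookupAccountSidW's size-probing behavior (fail with the required size when the buffer is too small):

```rust
// Stand-in for LookupAccountSidW: fills `buf` with a UTF-16 name, or
// returns Err(needed) when the buffer is too small, mimicking
// ERROR_INSUFFICIENT_BUFFER plus the reported required size.
fn lookup_into(buf: &mut [u16]) -> Result<usize, usize> {
    let name: Vec<u16> = "DOMAIN\\user".encode_utf16().collect();
    if buf.len() < name.len() {
        return Err(name.len());
    }
    buf[..name.len()].copy_from_slice(&name);
    Ok(name.len())
}

// Try a fixed stack buffer first; fall back to an exact-size heap
// allocation only in the rare case the name doesn't fit.
fn lookup_name() -> String {
    let mut buf = [0u16; 256];
    match lookup_into(&mut buf) {
        Ok(len) => String::from_utf16_lossy(&buf[..len]),
        Err(needed) => {
            let mut big = vec![0u16; needed];
            let len = lookup_into(&mut big).expect("exact-size buffer");
            String::from_utf16_lossy(&big[..len])
        }
    }
}

fn main() {
    println!("{}", lookup_name());
}
```

In the common case this makes exactly one API call and zero heap allocations per lookup, instead of two calls and one allocation.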