libaudit-go
The func Receive in libaudit.go is too expensive
The func Receive in libaudit.go makes a new buf on every call to receive the netlink message. Does this cost too much performance? It also triggers the GC easily, leading to CPU fluctuations. Would it be more appropriate to use a global buffer? Is it safe to do so?
Hi @ameihm0912, @arunk-s, please help us look at this issue. We hit a performance problem when the audit rate limit is set to 2000: CPU usage on our host goes above 10%, and we found that the major cause is the Receive function below. Our understanding is that recvfrom on the netlink socket is called serially, so should we make buf a global buffer to improve performance?
Can you give us any advice? Thank you very much!
func (s *NetlinkConnection) Receive(nonblocking bool) ([]NetlinkMessage, error) {
	var (
		flags = 0
	)
	if nonblocking {
		flags |= syscall.MSG_DONTWAIT
	}
	buf := make([]byte, MAX_AUDIT_MESSAGE_LENGTH+syscall.NLMSG_HDRLEN)
	nr, _, err := syscall.Recvfrom(s.fd, buf, flags)
	if err != nil {
		return nil, err
	}
	return parseAuditNetlinkMessage(buf[:nr])
}
@DahuK I think it does sound reasonable to try to eliminate that allocation each time if we can.
Have a look at the branch in https://github.com/mozilla/libaudit-go/tree/recv-buffer
https://github.com/mozilla/libaudit-go/commit/f50f0488a171211e175a555ae3b5ec7d8d153b5c
This adds a new function UseReadBuffer on the netlink connection type, which can be called right before reading audit messages (see the change to auditprint.go for an example). Perhaps this will improve the performance a bit for you, but I'm not sure.
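For readers who want to see the general shape of an opt-in read buffer, here is a minimal sketch. It is not the code from the recv-buffer branch: the conn type, the useReadBuffer and receive methods, the fakeRecv helper, and the maxMsgLen value are all hypothetical stand-ins, and recv replaces syscall.Recvfrom so the example runs without a netlink socket.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMsgLen stands in for MAX_AUDIT_MESSAGE_LENGTH+syscall.NLMSG_HDRLEN.
// The value is illustrative only.
const maxMsgLen = 8970 + 16

// conn is a hypothetical stand-in for NetlinkConnection. The recv field
// replaces syscall.Recvfrom so the sketch needs no real socket.
type conn struct {
	recv    func(buf []byte) (int, error)
	readBuf []byte // nil until useReadBuffer is called
}

// useReadBuffer opts in to a persistent buffer (hypothetical name).
func (c *conn) useReadBuffer() {
	c.readBuf = make([]byte, maxMsgLen)
}

// receive returns the raw bytes of one datagram. With a persistent buffer,
// the returned slice is only valid until the next call to receive.
func (c *conn) receive() ([]byte, error) {
	buf := c.readBuf
	if buf == nil {
		buf = make([]byte, maxMsgLen) // default: allocate on every call
	}
	n, err := c.recv(buf)
	if err != nil {
		return nil, err
	}
	return buf[:n], nil
}

// fakeRecv returns a recv function that yields the given payloads in order.
func fakeRecv(payloads ...string) func([]byte) (int, error) {
	i := 0
	return func(buf []byte) (int, error) {
		if i >= len(payloads) {
			return 0, errors.New("no more data")
		}
		n := copy(buf, payloads[i])
		i++
		return n, nil
	}
}

func main() {
	c := &conn{recv: fakeRecv("first", "second")}
	c.useReadBuffer()

	a, _ := c.receive()
	fmt.Printf("%s\n", a) // first
	b, _ := c.receive()
	fmt.Printf("%s\n", b) // second

	// Both results share one backing array, so no allocation per call.
	fmt.Println(&a[0] == &b[0]) // true
}
```

Keeping the buffer opt-in means callers that hold on to returned data across calls keep the allocate-per-call behaviour unless they ask otherwise.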
Note we can't actually just globally switch to a persistent buffer across the board right now without a few other changes. This is due to the way slices are being used to eliminate copies and new allocations wherever we can in other parts of the code.
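The slice concern above can be illustrated with a small, self-contained sketch. The message type and the parse and parseCopy functions are hypothetical and are not taken from libaudit-go; they only show why a parsed message that points into a shared buffer changes when the next read overwrites that buffer.

```go
package main

import "fmt"

// message is a hypothetical parsed message. Data may alias the read buffer.
type message struct {
	Data []byte
}

// parse returns a message whose Data points into buf (no copy, no allocation).
func parse(buf []byte) message {
	return message{Data: buf}
}

// parseCopy returns a message that owns a private copy of buf.
func parseCopy(buf []byte) message {
	return message{Data: append([]byte(nil), buf...)}
}

func main() {
	shared := make([]byte, 16) // plays the role of a persistent read buffer

	n := copy(shared, "type=SYSCALL")
	aliased := parse(shared[:n])
	copied := parseCopy(shared[:n])

	// The next "receive" writes into the same buffer.
	copy(shared, "type=EXECVE!")

	fmt.Printf("%s\n", aliased.Data) // type=EXECVE! (silently changed)
	fmt.Printf("%s\n", copied.Data)  // type=SYSCALL (unaffected)
}
```

With a fresh buffer per call the aliasing is harmless, which is why a persistent buffer would need callers to finish with, or copy, each message before the next read.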
@ameihm0912 This is very helpful to us, thank you very much!
No problem, feel free to try it out. I'd be interested in knowing if it improves performance for you.
If it does, we can look at reorganizing the code a bit to make this the default operating mode.
@ameihm0912, we have merged the changes, and CPU performance has improved by 2~3%. Thanks for the help! P.S. The change does not seem to work with the Delete or List rules APIs; they return an unexpected EOF error in our test environment.
@DahuK You might also want to look at your declared audit rules, as they can also cause a performance hit. See Performance Tips at https://linux.die.net/man/8/auditctl