pixie
Modify `dns_data` table to record DNS requests with no response
Problem: Sometimes DNS fails without the pod receiving DNS errors (e.g. if the network packets carrying the DNS response are being dropped).
Solution: Change the `dns_data` table to record DNS requests without responses, so that you can track the number of DNS requests that remain unanswered over time.
This is a really cool project! Can I help with this?
/assign
@htroisi Could you assign this issue to me? I'm a newbie, and it seems like a good issue to start with.
@htroisi Can you please provide some additional information about this issue and assign it to me?
@noman-xg we're excited that you're interested in contributing to Pixie!
@oazizi000 @yzhao1012 How challenging would it be to modify the `dns_data` table to store requests that don't receive a response? Do you think this change would be useful overall?
@htroisi Any updates on the status of this issue? Here's what I know so far.
I have looked through the workflow in `Stirling/source_connectors/socket_tracers/protocols/dns` and the contribution guides for implementing new protocols.
As I understand it, we already have a variable (`error_count`) in `...dns/stitcher.cc` that counts the DNS requests for which no response is found.
What's the goal here: do we need to populate the `dns_data` table with such records, or will the count of dropped packets in a given time frame suffice?
@noman-xg - you are on the correct path! I spoke with @yzhao1012 about this and he thinks it would be a very straightforward implementation to output the request frames (with missing responses) that are erased at line 231.
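For illustration, here is a minimal sketch of that suggestion. It is not Pixie's actual `stitcher.cc` code: the `Frame` and `Record` types, the transaction-ID matching, and the timeout policy are all simplified, hypothetical stand-ins. Responses are paired with pending requests by DNS transaction ID. Requests that stay unanswered past a timeout are still counted in an `error_count`-style counter, but they are also emitted as records with an empty response instead of being silently erased.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical, simplified DNS frame; real frames carry headers, queries, answers.
struct Frame {
  uint16_t txid;          // DNS transaction ID used to pair request and response
  uint64_t timestamp_ns;  // capture time
};

// Hypothetical output record: `resp` is empty when no response was ever seen.
struct Record {
  Frame req;
  std::optional<Frame> resp;
};

// Pairs each response with the oldest pending request that has the same txid.
// Requests older than `timeout_ns` (relative to `now_ns`) that are still
// unmatched are emitted with an empty response rather than just erased.
std::vector<Record> StitchFrames(std::deque<Frame>* reqs, std::deque<Frame>* resps,
                                 uint64_t now_ns, uint64_t timeout_ns,
                                 int* error_count) {
  assert(reqs != nullptr && resps != nullptr && error_count != nullptr);
  std::vector<Record> records;
  for (const Frame& resp : *resps) {
    for (auto it = reqs->begin(); it != reqs->end(); ++it) {
      if (it->txid == resp.txid) {
        records.push_back({*it, resp});
        reqs->erase(it);
        break;
      }
    }
    // A response with no pending request is simply dropped in this sketch.
  }
  resps->clear();

  // Requests are kept in arrival order, so stale ones sit at the front.
  while (!reqs->empty() && reqs->front().timestamp_ns + timeout_ns < now_ns) {
    ++*error_count;                                    // existing behavior: count it
    records.push_back({reqs->front(), std::nullopt});  // new: also output it
    reqs->pop_front();
  }
  return records;
}
```

Requests that have not yet timed out stay in the queue, so a late response can still be matched on the next call.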
However, before you make this change I would like @oazizi000 (the original author of this code) to confirm that he thinks this would be a useful feature.
@noman-xg - I spoke to a few people offline and it sounds like everyone is in agreement that it would be useful if you could make this change! Couple of notes:
- Take a look at the `amqp_data` table's implementation. Apparently this table implements this asynchronous request/response behavior in the `StitchFrames` function.
- @oazizi000 requests that you make the behavior configurable. I'm guessing we'd want to make this a flag that can be enabled at deployment, similar to `ENABLE_AMQP_TRACING`.
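One way to make the behavior configurable is to read a boolean from an environment variable at startup and only emit unanswered requests when it is set. The sketch below is a standalone illustration. The variable name `PX_STIRLING_DNS_RECORD_UNANSWERED` and the `BoolFromEnv` and `RecordUnansweredDnsRequests` helpers are hypothetical. Pixie's real deployment options (such as `ENABLE_AMQP_TRACING`) go through its own flag machinery, which this does not reproduce.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical environment variable name; the real flag may be named differently.
constexpr char kRecordUnansweredEnv[] = "PX_STIRLING_DNS_RECORD_UNANSWERED";

// Parses a boolean from the environment, falling back to `default_value`
// when the variable is unset or holds an unrecognized value.
bool BoolFromEnv(const char* name, bool default_value) {
  assert(name != nullptr);
  const char* raw = std::getenv(name);
  if (raw == nullptr) return default_value;
  const std::string v(raw);
  if (v == "1" || v == "true" || v == "True" || v == "TRUE") return true;
  if (v == "0" || v == "false" || v == "False" || v == "FALSE") return false;
  return default_value;
}

// Off by default, so existing deployments keep the current behavior
// (unanswered requests are only counted, not written to the table).
bool RecordUnansweredDnsRequests() {
  return BoolFromEnv(kRecordUnansweredEnv, /*default_value=*/false);
}
```

Defaulting to `false` means the feature is opt-in, which matches treating it like the AMQP tracing flag: enabled explicitly at deployment time.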
@htroisi Please assign this issue to me; I've started working on it.
@noman-xg were you able to get the dev environment running? If not, let us know and we can help unblock you.
Hey @htroisi. Yes, I was able to set that up and successfully test the changes. I'll create an initial PR for preview sometime tomorrow or the day after.
@oazizi000 @htroisi I have created the pull request for this issue. Looking forward to your code reviews.
This feature was added in #613. Thanks for the contribution @noman-xg!