Modify `dns_data` table to record DNS requests with no response

Open htroisi opened this issue 2 years ago • 5 comments

Problem: Sometimes DNS fails without the pod receiving DNS errors (e.g. if the network packets carrying the DNS response are being dropped).

Solution: Change the dns_data table to record DNS requests without responses, so that you can track the number of DNS requests that remain unanswered over time.
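As a rough illustration of the kind of tracking this would enable, here is a minimal Python sketch that counts unanswered requests per time window. It assumes each record has been reduced to a `(request_timestamp_ns, has_response)` pair; that shape and the function name `unanswered_per_window` are hypothetical and are not part of the `dns_data` schema or of Pixie's API.

```python
from collections import Counter


def unanswered_per_window(records, window_ns):
    """Count requests with no response, bucketed by window start time.

    `records` is an iterable of (request_timestamp_ns, has_response) pairs.
    This input shape is an assumption made for illustration only.
    """
    counts = Counter()
    for timestamp_ns, has_response in records:
        if not has_response:
            # Round the timestamp down to the start of its window.
            window_start = timestamp_ns // window_ns * window_ns
            counts[window_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [(0, True), (5, False), (12, False), (18, False), (25, True)]
    print(unanswered_per_window(sample, window_ns=10))
```

A rising count in consecutive windows would point at dropped responses even when the pod itself never sees a DNS error.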

htroisi avatar Mar 30 '22 20:03 htroisi

This is a really cool project! Can I help with this?

zhyon404 avatar Jul 29 '22 08:07 zhyon404

/assign

zhyon404 avatar Jul 29 '22 08:07 zhyon404

@htroisi Could you assign this issue to me? I'm a newbie and it seems like a good issue to start with.

noman-xg avatar Sep 12 '22 11:09 noman-xg

@htroisi Can you please provide some additional information about this issue and assign it to me?

noman-xg avatar Sep 14 '22 08:09 noman-xg

@noman-xg we're excited that you're interested in contributing to Pixie!

@oazizi000 @yzhao1012 how challenging would it be to modify the dns_data table to store requests that don't receive a response? Do you think this change would be useful overall?

htroisi avatar Sep 19 '22 16:09 htroisi

@htroisi Any updates on the status of this issue? Here's what I know so far.

I have looked through the workflow in Stirling/source_connectors/socket_tracers/protocols/dns and the contribution guides for new protocol implementations.

As I understand it, we already have a variable (error_count) in ...dns/stitcher.cc that counts the DNS requests for which no response is found.

What's our goal here? Do we need to populate the dns_data table with such records, or will a count of the packets dropped in a given time frame suffice?

noman-xg avatar Sep 23 '22 14:09 noman-xg

@noman-xg - you are on the correct path! I spoke with @yzhao1012 about this and he thinks it would be a very straightforward implementation to output the request frames (with missing responses) that are erased at line 231.

However, before you make this change I would like @oazizi000 (the original author of this code) to confirm that he thinks this would be a useful feature.
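The change discussed above, emitting request frames that never receive a response instead of silently erasing them, can be sketched roughly as follows. This is a simplified Python illustration, not the Stirling C++ code: the `Frame` and `Record` types, the `stitch_frames` signature, the timeout rule, and the `emit_unanswered` parameter are all assumptions made for the example. Only the idea of an `error_count` for unmatched requests comes from the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    txid: int          # DNS transaction ID, used to pair request and response
    timestamp_ns: int
    payload: str


@dataclass
class Record:
    req: Frame
    resp: Optional[Frame]  # None marks a request that got no response


def stitch_frames(
    requests: List[Frame],
    responses: List[Frame],
    now_ns: int,
    timeout_ns: int,
    emit_unanswered: bool = False,
) -> Tuple[List[Record], List[Frame], int]:
    """Pair requests with responses by transaction ID.

    Returns (records, still_pending_requests, error_count).
    """
    resp_by_txid = {resp.txid: resp for resp in responses}
    records: List[Record] = []
    pending: List[Frame] = []
    error_count = 0

    for req in requests:
        resp = resp_by_txid.pop(req.txid, None)
        if resp is not None:
            records.append(Record(req, resp))
        elif now_ns - req.timestamp_ns >= timeout_ns:
            # The request is too old to expect a response. Count it, and
            # optionally emit it as a record rather than just dropping it.
            error_count += 1
            if emit_unanswered:
                records.append(Record(req, None))
        else:
            # Still within the timeout: keep waiting for a response.
            pending.append(req)

    return records, pending, error_count
```

With `emit_unanswered=False` the sketch keeps the old behavior (the stale request only bumps the counter), so the new output stays opt-in.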

htroisi avatar Sep 26 '22 17:09 htroisi

@noman-xg - I spoke to a few people offline and it sounds like everyone agrees that it would be useful if you could make this change! A couple of notes:

  • Take a look at the amqp_data table's implementation. Apparently this table implements this asynchronous request / response behavior in the StitchFrames function.
  • @oazizi000 requests that you make the behavior configurable. I'm guessing we'd want to make this a flag that can be enabled at deployment, similar to ENABLE_AMQP_TRACING.
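To illustrate the second note, a deployment-time toggle is often read from the environment. The Python sketch below shows one way to parse such a flag; the helper `flag_enabled` and the flag name `EMIT_DNS_REQUESTS_WITHOUT_RESPONSE` are hypothetical placeholders, and Pixie's actual flag mechanism may work differently.

```python
import os
from typing import Mapping, Optional

# Hypothetical flag name, used for illustration only.
FLAG_NAME = "EMIT_DNS_REQUESTS_WITHOUT_RESPONSE"


def flag_enabled(
    name: str,
    default: bool = False,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if the named environment flag is set to a truthy value."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


if __name__ == "__main__":
    # Off by default, so existing deployments see no change in behavior.
    print(flag_enabled(FLAG_NAME))
```

Defaulting to off means the extra records only appear for users who opt in at deployment.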

htroisi avatar Sep 27 '22 16:09 htroisi

@htroisi Please assign this issue to me; I've started working on it.

noman-xg avatar Sep 29 '22 08:09 noman-xg

@noman-xg were you able to get the dev environment running? If not, let us know and we can help unblock you.

htroisi avatar Oct 07 '22 17:10 htroisi

Hey @htroisi. Yes, I was able to set that up and successfully test the changes. I'll be creating an initial PR for preview sometime tomorrow or the day after.

noman-xg avatar Oct 07 '22 17:10 noman-xg

@oazizi000 @htroisi I have created the pull request for this issue. Looking forward to the code reviews.

noman-xg avatar Oct 10 '22 12:10 noman-xg

This feature was added in #613. Thanks for the contribution @noman-xg!

htroisi avatar Oct 20 '22 18:10 htroisi