msquic icon indicating copy to clipboard operation
msquic copied to clipboard

SetParam QUIC_PARAM_CONN_LOCAL_ADDRESS bind to undesired address

Open qzhuyan opened this issue 3 years ago • 10 comments

Describe the bug

Ater connection is established, set local addr to 127.0.0.1:50600 via API setparm Then get the local addr via API getparm. sometimes we get different port other than 50600 such as 50615

I have seen some retry mechanism (step port number) while binding port is unavailable but I think the correct the behaivor is return error instead.

Affected OS

  • [ ] Windows Server 2022
  • [ ] Windows 11
  • [ ] Windows Insider Preview (specify affected build below)
  • [X] Ubuntu
  • [ ] Debian
  • [ ] Other (specify below)

Additional OS information

No response

MsQuic version

prerelease/1.6

Steps taken to reproduce bug

  1. Client start the connection
  2. Ensure local port in use is not 50600
  3. Set the local addr to 127.0.0.1:50600 via
MsQuic->SetParam(Handle, 
                                   QUIC_PARAM_LEVEL_CONNECTION,                     
                                   QUIC_PARAM_CONN_LOCAL_ADDRESS,
                                   sizeof(QUIC_ADDR),
                                   &Address)))
  1. Check SetParam returns success
  2. Check local addr via
Level = QUIC_PARAM_LEVEL_CONNECTION;
Param = QUIC_PARAM_CONN_LOCAL_ADDRESS;
MsQuic->GetParam(Handle, Level, Param, &addrSize, &addr)))

note, Handle is a connection handler.

Expected behavior

In step 3. If binding addr is failed, it should return error instead

Actual outcome

In step 3. always return success while in step 5 sometimes we get a different addr other than 127.0.0.1:50600 such like 127.0.0.1:50615

Additional details

No response

qzhuyan avatar Sep 28 '21 07:09 qzhuyan

This code path is one of the primary ones that needs to be improved to support #1946. For the specific issue of: "Set new local address" and if it succeeds I expect "get local address" to return what I just set, not always working, we can track that here. I took a look at the code and I don't see any reason why that'd happen. Would you be able to capture some logs for that scenario?

nibanks avatar Sep 28 '21 12:09 nibanks

@nibanks thanks for the clarfication. Now I have better understanding the scope of #1946

I have enabled lttng tracing in our CI env, so we will get back with logging soon.

qzhuyan avatar Sep 28 '21 12:09 qzhuyan

For some reason, I could not use the following command to format the trace.

clog2text_lttng -i /tmp/quic.babel.txt -s /home/ezhuwya/repo/quic/msquic/src/manifest/clog.sidecar -o quic_localport_wrong.log

 at clogutils.CLogConsoleTrace.DecodeAndTraceToConsole(StreamWriter outputfile, CLogDecodedTraceLine bundle, String errorLine, CLogConfigurationFile config, Dictionary`2 valueBag, EventInformation eventInfo, Boolean showTimeStamp, Boolean showCPUInfo) in /home/vsts/work/1/s/src/clogutils/CLogConsoleTrace.cs:line 150
Invalid TraceLine : timestamp = 16:06:34.858181072, delta = +0.000001000, trace:hostname = fv-az132-490, name = CLOG_OPERATION_H:ConnExecOper, stream.packet.context = { cpu_id = 0 }, stream.event.context = { vpid = 16890, vtid = 18727 }, event.fields = { arg2 = 0x7F4FF4173580, arg3 = 1 } System.NullReferenceException: Object reference not set to an instance of an object.
   at clogutils.CLogConsoleTrace.DecodeAndTraceToConsole(StreamWriter outputfile, CLogDecodedTraceLine bundle, String errorLine, CLogConfigurationFile config, Dictionary`2 valueBag, EventInformation eventInfo, Boolean showTimeStamp, Boolean showCPUInfo) in /home/vsts/work/1/s/src/clogutils/CLogConsoleTrace.cs:line 150

I think the version of msquic in CI and on my localhost is the same, Any ideas why it doesn't work?

Anyway, I upload the babel txt zip file here: quic.babel.txt.zip

I also suspect the test case gets the local addr too early while address migration isn't completed.

I will check more tomorrow.

qzhuyan avatar Oct 02 '21 22:10 qzhuyan

@qzhuyan any updates here? I have no idea why clog would be failing. Are you using the log.ps1 script?

nibanks avatar Oct 05 '21 17:10 nibanks

No, I am not using log.ps1 since we don't have CLOG nor powershell in our CI.

I have bumped the retry till 2s which I think it should be enough to let migration complete but still could reproduce the issue. I wrote a test case explicitly occupying the port before triggering address migration and it gets address in use in return which is fine.

I am now studying CLOG to see why clog2text_lttng is failing, this is important to me for long term.

Could you convert the attachment with clog2text_lttng with clog sidecar from version 1.6?

qzhuyan avatar Oct 06 '21 13:10 qzhuyan

quic_localport_wrong.log.gz here it is.

look for

[1][428f.49bd][08:56:32.308212][conn][0x7fcd6c0e8fb0] New Local IP: 127.0.0.1:43205
[1][428f.49bd][08:56:32.308252][data][0x7fcd48282b20] Created, local=127.0.0.1:43205, remote=127.0.0.1:57873
[1][428f.49bd][08:56:32.308464][bind][0x7fcd5427a650] Created, Udp=0x7fcd48282b20 LocalAddr=127.0.0.1:43205 RemoteAddr=127.0.0.1:57873

I guess CLOG does not do network byte order translation so 43205 here is port 50600(the port number I use) I think.

qzhuyan avatar Oct 06 '21 15:10 qzhuyan

Let me double check on the clog port reversal. If its not correct, that is a bug in the clog decoder.

thhous-msft avatar Oct 06 '21 15:10 thhous-msft

https://github.com/microsoft/msquic/pull/2051

PR to fix the port reversal issue.

thhous-msft avatar Oct 06 '21 15:10 thhous-msft

@qzhuyan any updates on this issue? Since Thad fixed the port reversal issue, is there anything else this is tracking?

nibanks avatar Nov 11 '21 12:11 nibanks

#2051 only fix the port format issue in clog log I think. It did not fix the issue itself, I guess.

qzhuyan avatar Nov 21 '21 09:11 qzhuyan

@qzhuyan what's the latest on this issue? Can we close it?

nibanks avatar May 08 '23 23:05 nibanks

I will retest with latest release and reopen if I could reproduce it.

qzhuyan avatar May 11 '23 08:05 qzhuyan