msquic
msquic copied to clipboard
SetParam QUIC_PARAM_CONN_LOCAL_ADDRESS bind to undesired address
Describe the bug
Ater connection is established, set local addr to 127.0.0.1:50600
via API setparm
Then get the local addr via API getparm.
sometimes we get different port other than 50600
such as 50615
I have seen some retry mechanism (step port number) while binding port is unavailable but I think the correct the behaivor is return error instead.
Affected OS
- [ ] Windows Server 2022
- [ ] Windows 11
- [ ] Windows Insider Preview (specify affected build below)
- [X] Ubuntu
- [ ] Debian
- [ ] Other (specify below)
Additional OS information
No response
MsQuic version
prerelease/1.6
Steps taken to reproduce bug
- Client start the connection
- Ensure local port in use is not
50600
- Set the local addr to
127.0.0.1:50600
via
MsQuic->SetParam(Handle,
QUIC_PARAM_LEVEL_CONNECTION,
QUIC_PARAM_CONN_LOCAL_ADDRESS,
sizeof(QUIC_ADDR),
&Address)))
- Check SetParam returns success
- Check local addr via
Level = QUIC_PARAM_LEVEL_CONNECTION;
Param = QUIC_PARAM_CONN_LOCAL_ADDRESS;
MsQuic->GetParam(Handle, Level, Param, &addrSize, &addr)))
note, Handle
is a connection handler.
Expected behavior
In step 3. If binding addr is failed, it should return error instead
Actual outcome
In step 3. always return success while in step 5 sometimes we get a different addr other than 127.0.0.1:50600
such like 127.0.0.1:50615
Additional details
No response
This code path is one of the primary ones that needs to be improved to support #1946. For the specific issue of: "Set new local address" and if it succeeds I expect "get local address" to return what I just set, not always working, we can track that here. I took a look at the code and I don't see any reason why that'd happen. Would you be able to capture some logs for that scenario?
@nibanks thanks for the clarfication. Now I have better understanding the scope of #1946
I have enabled lttng tracing in our CI env, so we will get back with logging soon.
For some reason, I could not use the following command to format the trace.
clog2text_lttng -i /tmp/quic.babel.txt -s /home/ezhuwya/repo/quic/msquic/src/manifest/clog.sidecar -o quic_localport_wrong.log
at clogutils.CLogConsoleTrace.DecodeAndTraceToConsole(StreamWriter outputfile, CLogDecodedTraceLine bundle, String errorLine, CLogConfigurationFile config, Dictionary`2 valueBag, EventInformation eventInfo, Boolean showTimeStamp, Boolean showCPUInfo) in /home/vsts/work/1/s/src/clogutils/CLogConsoleTrace.cs:line 150
Invalid TraceLine : timestamp = 16:06:34.858181072, delta = +0.000001000, trace:hostname = fv-az132-490, name = CLOG_OPERATION_H:ConnExecOper, stream.packet.context = { cpu_id = 0 }, stream.event.context = { vpid = 16890, vtid = 18727 }, event.fields = { arg2 = 0x7F4FF4173580, arg3 = 1 } System.NullReferenceException: Object reference not set to an instance of an object.
at clogutils.CLogConsoleTrace.DecodeAndTraceToConsole(StreamWriter outputfile, CLogDecodedTraceLine bundle, String errorLine, CLogConfigurationFile config, Dictionary`2 valueBag, EventInformation eventInfo, Boolean showTimeStamp, Boolean showCPUInfo) in /home/vsts/work/1/s/src/clogutils/CLogConsoleTrace.cs:line 150
I think the version of msquic in CI and on my localhost is the same, Any ideas why it doesn't work?
Anyway, I upload the babel txt zip file here: quic.babel.txt.zip
I also suspect the test case gets the local addr too early while address migration isn't completed.
I will check more tomorrow.
@qzhuyan any updates here? I have no idea why clog would be failing. Are you using the log.ps1
script?
No, I am not using log.ps1
since we don't have CLOG nor powershell in our CI.
I have bumped the retry till 2s which I think it should be enough to let migration complete but still could reproduce the issue.
I wrote a test case explicitly occupying the port before triggering address migration and it gets address in use
in return which is fine.
I am now studying CLOG to see why clog2text_lttng
is failing, this is important to me for long term.
Could you convert the attachment with clog2text_lttng
with clog sidecar from version 1.6
?
quic_localport_wrong.log.gz here it is.
look for
[1][428f.49bd][08:56:32.308212][conn][0x7fcd6c0e8fb0] New Local IP: 127.0.0.1:43205
[1][428f.49bd][08:56:32.308252][data][0x7fcd48282b20] Created, local=127.0.0.1:43205, remote=127.0.0.1:57873
[1][428f.49bd][08:56:32.308464][bind][0x7fcd5427a650] Created, Udp=0x7fcd48282b20 LocalAddr=127.0.0.1:43205 RemoteAddr=127.0.0.1:57873
I guess CLOG does not do network byte order translation so 43205
here is port 50600
(the port number I use) I think.
Let me double check on the clog port reversal. If its not correct, that is a bug in the clog decoder.
https://github.com/microsoft/msquic/pull/2051
PR to fix the port reversal issue.
@qzhuyan any updates on this issue? Since Thad fixed the port reversal issue, is there anything else this is tracking?
#2051 only fix the port format issue in clog log I think. It did not fix the issue itself, I guess.
@qzhuyan what's the latest on this issue? Can we close it?
I will retest with latest release and reopen if I could reproduce it.