go-sip-ua icon indicating copy to clipboard operation
go-sip-ua copied to clipboard

Dialogs/sessions completely broken because of wrong sessionkey

Open gaaf opened this issue 3 years ago • 16 comments

UA builds a session key from the Call-ID and the branch-id. This is not how a dialog id should be constructed. The branch-id is transaction-specific and has no place in a session (dialog).

This makes all in-dialog requests to fail to match the session.

As a dialog is identified solely by Call-ID, local-tag and remote-tag (rfc3261) please use these to build the session key. This of course requires knowledge of the remote tag, which is only available after a response >100 is received. In absence of forking support, an intermediary solution would be to only use Call-ID and local tag for the dialog id.

The breakage seems to have been introduced by #73.

gaaf avatar Feb 15 '22 15:02 gaaf

you're right, #73 caused a break, need a new PR to remove this #73 effect,

cloudwebrtc avatar Feb 16 '22 06:02 cloudwebrtc

You'd have to do much more than just "fixing the #73 effect". When I read the code, I was thinking: "Is this a joke?" This UA has obviously been developed by someone who has not read or understood the relevant basic RFCs (RFC 3261, for a start). Here are just some of the problems I believe the code has (I haven't run / tested anything, so I may be wrong with some):

  • A dialog is identified by from-tag, to-tag, call-id. There can be several early-dialogs for a UAC which have to be tracked. Once a dialog is confirmed the other dialogs need to be BYEd.
  • A session's main reason for existence is to a) group belonging transactions b) establish a route-set. go-sip-ua doesn't do the latter and I'm unsure about the former.
  • The route-set is not evaluated or stored anywhere, nor is it used for in-dialog transactions
  • I don't see that the to-tag is stored anywhere (but in the response) once a dialog is confirmed. I think for a UAC in-dialog transactions will have the same (non-in-dialog) tags as the initial invite.
  • Cseq is not handled correctly. All (ua generated) in-dialog transactions get the same cseq (+1 of initial invite)
  • NewInviteSession() adds a to tag even for UAC, were it should instead add a from tag (but doesn't?). Additionally the tag is not guaranteed to be unique in time&space as mandated by the RFC. Use a UUID.

And this isn't even going into the finer details. E.g. the difference between an ACK to a 2xx vs > 299 response for session establishing responses. This actually looks fine, though, as the sip stack handles that and contrary to the UA the stack was developed by someone who did understand SIP. I'm still puzzled how this code can be used for anything at all.

Since I want/need a SIP UA in Go, I may be going to fork this and fix all of the problems, but this would probably be a huge rewrite. Fork (instead of PR) as the maintainer should be someone who has an understanding of the basics of SIP.

skeller avatar Feb 18 '22 13:02 skeller

You're right that it deviates from the rfcs in a lot of places. But imho there's no need for harsh words.

I'm still evaluating the libs and as part of that I have already addressed most of the issues you mention. I'll submit PR's when I have finished. Probably early next week.

Forking is always an option if cooperation fails.

gaaf avatar Feb 18 '22 13:02 gaaf

@skeller Thank you for speaking out on these issues, I can add you @skeller @gaaf to collaborators if you want (if you are willing to share your experience and fix these bugs, which is what an open-source project is for), I have to admit that this repository is not perfect, and it takes a lot of time to write an open-source project (read RFCs, write unit tests), the current project code quality only represents my current level and experience, but I will learn and improve it in the project.

cloudwebrtc avatar Feb 18 '22 13:02 cloudwebrtc

This repo was originally written to do some interactive tests with webrtc, but it seems that a lot of people are paying attention, so I try to improve it during the learning process, for example, offline call push, webrtc2sip gateway, and a simple b2bua

cloudwebrtc avatar Feb 18 '22 13:02 cloudwebrtc

I've invited you as collaborators @skeller @gaaf , if you want, you can submit changes directly to fix these bugs :)

cloudwebrtc avatar Feb 18 '22 13:02 cloudwebrtc

I think the primary focus should be to build a decent ua and dialog layer on top of the gosip (or any other) stack. Applications like you mentioned only come atop those.

gaaf avatar Feb 18 '22 13:02 gaaf

@cloudwebrtc, @gaaf: I didn't mean to be harsh. Apologies if I was. Yes, the UA should be a dialog-layer on top of gosip's transaction layer. Applications are then handled by e.g. the InviteStateHandler i.e. on top of UA. My idea to fix those issues (I just started):

  • Add Cseq, local tag, remote tag, route set to Session
  • Get route set from record-routes on 2xx for UAC, Invite for UAS
  • Increment & use Session.Cseq for in-dialog TX
  • Local tag is always the tag we generated (e.g. via google.com/uuid uuid.NewString(); placement in from for UAS, to for UAC (here I mean the RFC's meaning og UAC/UAS, i.e. whoever creates / receives the TX, not who first created a dialog)
  • Local tag is the key in the session map
  • (Maybe, for early-dialogs: session map is a slice or map of sessions w/ same local tag, different or empty remote tag)

I'm not sure how far @gaaf is, I haven't even started with coding yet, just (statically) analyzed / read what is there, then came here to see if these issues are known / already being addressed.

skeller avatar Feb 18 '22 14:02 skeller

  • Add Cseq, local tag, remote tag, route set to Session
  • Get route set from record-routes on 2xx for UAC, Invite for UAS
  • Increment & use Session.Cseq for in-dialog TX
  • Local tag is always the tag we generated

Above I have already done

(e.g. via google.com/uuid uuid.NewString();

Lets please keep uuid optional (allow injecting an ID generator?), I despise it. It has low entropy and uses a lot of space. I'd like to keep SIP packets as small as possible. That includes having all "random" stuff as short as possible. Had enough trouble already with broken routers not forwarding fragmented SIP.

placement in from for UAS, to for UAC (here I mean the RFC's meaning og UAC/UAS, i.e. whoever creates / receives the TX, not who first created a dialog)

The UAC/UAS naming in the current code is very confusing. UAC/UAS role can change during a dialog. I almost eliminated all code referencing those names.

  • Local tag is the key in the session map

Compound key is easy in Go. Just use Call-ID + from-tag.

  • (Maybe, for early-dialogs: session map is a slice or map of sessions w/ same local tag, different or empty remote tag)

Keep in mind the use-case of the PR causing this issue ticket. Accepting forked incoming calls is requested. This can (easily) be accomplished by adding an optional 3rd component to the dialog ID. This can just be a counter. The value must be added to the contact-uri as a parameter so in-dialog requests can be distinguished.

gaaf avatar Feb 18 '22 14:02 gaaf

I've pushed my work so far into my repo.

Took a bit more time than expected, so no progress on forking support yet.

gaaf avatar Feb 18 '22 15:02 gaaf

  • (Maybe, for early-dialogs: session map is a slice or map of sessions w/ same local tag, different or empty remote tag)

Keep in mind the use-case of the PR causing this issue ticket. Accepting forked incoming calls is requested. This can (easily) be accomplished by adding an optional 3rd component to the dialog ID. This can just be a counter. The value must be added to the contact-uri as a parameter so in-dialog requests can be distinguished.

Yes, I meant the other direction. We create a dialog towards a SIP proxy, which then forks, creating several early-dialogs (with different to-tags). But you're right, forked inbound invites must also be handled.

skeller avatar Feb 18 '22 18:02 skeller

I've pushed my work so far into my repo.

Took a bit more time than expected, so no progress on forking support yet.

Which repo do you mean? Is it this?

I tried to get go-sip-ua to work and modified the session map by using call-id and from-tag as key, but it doesn't work yet. Did you make any progress with the session map?

gungnix avatar Mar 07 '22 11:03 gungnix

That is the correct repo, yes. Th dev branch.

There's a patch in it which just disables the branch-id in the dialog-id. I had no time to implement a proper dialog-id.

Unfortunately, prio's have changed recently, so I won't have time in the foreseeable future to work on this. I still think most of the commits are useful and should be merged.

gaaf avatar Mar 07 '22 11:03 gaaf

I tried to get go-sip-ua to work and modified the session map by using call-id and from-tag as key, but it doesn't work yet. Did you make any progress with the session map?

You shouldn't always use the from-tag, but the local tag. The local-tag can be the from-tag or to-tag, depending in who is sending/receiving.

gaaf avatar Mar 07 '22 18:03 gaaf

@gaaf Thanks for your commits. I used your code and added session handling with local tag to it. This should work for inbound forked INVITEs as a new local tag is generated, so the session are independent. The changes also track the branch of the initial INVITE, though, in order to match CANCELs. My code is here (session branch). Multiple early-dialogs are not tracked yet. Race handling (I think) also didn't work before, i.e. multiple early-dialogs being responded with 200. The ua should pick the first and BYE all others. This requires tracking the early-dialogs in the first place. Also SDP isn't handled correctly. Especially late offer-answer. I believe the "Session" should not try to guess or track whether something is an offer or an answer. There should only be remote SDP and local SDP. It would be the job of some SDP implementation / interface to handle offer/answer.

skeller avatar Mar 23 '22 12:03 skeller

There should only be remote SDP and local SDP. It would be the job of some SDP implementation / interface to handle offer/answer.

I started fixing this, but had to stop. The needed information (whether the SDP is received or generated) is stripped very early on and the SDP handling code only gets the SDP itself. Fixing this seems to require a big effort/rewrite to correct the layering.

gaaf avatar Apr 04 '22 10:04 gaaf