Improve SIP message parsers
This PR refactors and optimises the SIP message parsers.
Short list of changes:
- Parser will now track the number of bytes consumed, removing the need for hacks in streaming mode.
- Steaming parser now properly skips CRLF between messages (RFC 3261 - 7.5).
-
Content-Lengthis now required for streaming mode (RFC 3261 - 7.5). - Parser postpones the
[]byte -> stringconversions for headers, improving performance by 4% and reducing allocations by 10%. - Streaming parser reuses more code from the regular parser.
-
Parserexposes new API for parsing the message headers without the body. - Streaming parser exposes API for getting individual messages and how many bytes were read.
- Streaming parser can now discard data to recover the connection state after malformed messages.
Benchmarking results
4% faster parsing, up to 10% less allocations.
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ParserStream/NoChunks-16 2.500µ ± 3% 2.377µ ± 4% -4.94% (p=0.004 n=10)
ParserStream/SingleRoutine-16 2.612µ ± 2% 2.518µ ± 1% -3.62% (p=0.000 n=10)
ParserStream/Paralel-16 13.18µ ± 6% 13.33µ ± 6% ~ (p=0.912 n=10)
Parser/SingleRoutine-16 2.843µ ± 4% 2.785µ ± 1% -2.04% (p=0.000 n=10)
Parser/Paralel-16 14.21µ ± 5% 14.10µ ± 5% ~ (p=0.239 n=10)
ParserNoData/New-16 2.120µ ± 1% 2.014µ ± 3% -5.00% (p=0.000 n=10)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
ParserStream/NoChunks-16 2.790Ki ± 0% 2.656Ki ± 0% -4.80% (p=0.000 n=10)
ParserStream/SingleRoutine-16 2.845Ki ± 0% 2.656Ki ± 0% -6.63% (p=0.000 n=10)
ParserStream/Paralel-16 3.021Ki ± 0% 2.661Ki ± 0% -11.90% (p=0.000 n=10)
Parser/SingleRoutine-16 3.117Ki ± 0% 2.984Ki ± 0% -4.26% (p=0.000 n=10)
Parser/Paralel-16 3.123Ki ± 0% 2.989Ki ± 0% -4.28% (p=0.000 n=10)
ParserNoData/New-16 2.520Ki ± 0% 2.430Ki ± 0% -3.57% (p=0.000 n=10)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
ParserStream/NoChunks-16 39.00 ± 0% 37.00 ± 0% -5.13% (p=0.000 n=10)
ParserStream/SingleRoutine-16 41.00 ± 0% 37.00 ± 0% -9.76% (p=0.000 n=10)
ParserStream/Paralel-16 41.00 ± 0% 37.00 ± 0% -9.76% (p=0.000 n=10)
Parser/SingleRoutine-16 44.00 ± 0% 43.00 ± 0% -2.27% (p=0.000 n=10)
Parser/Paralel-16 44.00 ± 0% 43.00 ± 0% -2.27% (p=0.000 n=10)
ParserNoData/New-16 32.00 ± 0% 30.00 ± 0% -6.25% (p=0.000 n=10)
Changes to Parser
Original parsing method is kept unchanged:
func (p *Parser) ParseSIP(data []byte) (Message, error)
Instead, a new method is introduced:
func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)
This method now returns number of bytes used to parse the message. It always stops at the beginning of the line that caused a failure, which helps reuse the same underlying code for streaming mode.
The stream flag adjust the behaviour of the parser slightly, for example, in streaming mode the Content-Length is required and CRLF at the beginning of the message is silently skipped (RFC 3261 - 7.5).
This new method now always returns io.UnexpectedEOF if message was not read completely and more data is required. The old ParseSIP is not affected and still returns ParseEOF error.
Additionally, another new method is added for parsing the message without the body:
func (p *Parser) ParseHeaders(data []byte, stream bool) (Message, int, error)
This allows using the parser more efficiently in proxies that only act on the headers. It will not require allocating the body separately, instead, the proxy may write the headers followed by the body from the original buffer.
Changes to ParserStream
Existing methods for ParserStream are kept unchanged:
func (p *ParserStream) ParseSIPStream(data []byte) ([]Message, error)
func (p *ParserStream) ParseSIPStreamEach(data []byte, cb func(msg Message)) error
New method was added for reading messages separately:
func (p *ParserStream) Write(data []byte) (int, error)
func (p *ParserStream) ParseNext() (Message, int, error)
After using Write to append data to the internal buffer, ParseNext can be called multiple times to get SIP messages. As opposed to ParseSIPStream and ParseSIPStreamEach, the caller can decide when to stop reading the messages.
Under the hood, all ParserStream methods now reuse the methods for parsing headers and start line from the Parser, reducing code duplication and differences in behaviour.
Similar to the new Parse method, ParseNext returns the number of bytes that were used to read the message, or io.UnexpectedEOF in case the message was parsed only partially. Old methods like ParseSIPStreamEach still return ErrParseSipPartial instead.
Additionally, there are a few new methods to control the stream state:
func (p *ParserStream) Buffer() *bytes.Buffer
func (p *ParserStream) Discard(n int)
func (p *ParserStream) Reset()
func (p *ParserStream) Close()
Buffer allows the caller to examine the underlying parser state after an error. ParseNext will return the offset into this buffer at the beginning of the failed line.
Discard allows the caller to drop N first bytes and reset the parser to recover the remaining stream. This API assumes that the caller has some heuristic to decide how and when to recover.
Reset completely resets the internal state and the buffer, allowing reuse of the ParserStream.
Close is similar, but calling it is now required to reuse the underlying buffer for other parsers (using the Pool). Previously, the parser was always discarding the buffer after each message. Now the caller can control it by calling Reset or Close.
Like a bomb :), just we are almost at 1.0.0 and I am now not sure where to go with this. I like the performance improvements and all and would like to merge but...
So do I understand you need some exposure here of Parser because you are running some manual parsing? I would like to see is there need to expose this right now all, and give chance for merging prio 1.0.0
I have to look more, but here some first looks. This I think could be better as 2 seperate functions. Reason is that caller has to know this upfront, and error handling will anyway be determined by flag, but it can be underhood left the rest.
func (p *Parser) Parse(data []byte, stream bool) (Message, int, error)
Also btw reason for ParseSIP and not Parse was historifcal as we had interface usage :)
First, thank you for a quick reply!
So do I understand you need some exposure here of Parser because you are running some manual parsing?
That's correct. We now have a separate SIP proxy that needs the parser, but not the other layers. And we've hit a few issues with it. This is why we'd appreciate a bit more flexibility for the parser, which this PR attempts.
I would like to see is there need to expose this right now all, and give chance for merging prio 1.0.0
No rush at all! We are totally fine with running this code from our own branch for now, until you are ready to upstream these changes in one form or another (1.1.0 maybe?).
This I think could be better as 2 seperate functions.
You mean having two separate parsers? PR still keeps ParseSIP method intact, so it doe use two different function for old vs new signatures and error semantics. Under the hood it's the same code.
@dennwc if you can pls rebase, but if you find some werid conflicts let me know. I had some maybe renaming of stuff. I want to consider merging this soon.
@emiago rebase complete!
Hey @emiago! Is anything else blocking this? I do not want to rush it, just let me know if there's anything else remaining here.
hey @dennwc it is on my list. Holidays and my focus was else where. I think I will find time this week. ;)