gobgp icon indicating copy to clipboard operation
gobgp copied to clipboard

Add BufferChannel for fsm incoming/outgoing channels.

Open taitov opened this issue 1 year ago • 22 comments

Added the ability to select a type of fsm incoming/outgoing channels:

  1. InfiniteChannel (old behavior, default)
  2. BufferChannel

BufferChannel allows to accumulate and combine paths in a separate thread, in the case when the reading side (for example, BgpServer.Serve) is busy.

A map with NLRI key (bufferChannel.pathIdxs) is also used to combine multiple changes of one path into a single update.

BufferChannel prevents infinite memory consumption because its maximum size is comparable to the size of the adj-in (or adj-out for the fsm outgoing channel).

taitov avatar Jan 15 '25 13:01 taitov

cc #2862

taitov avatar Jan 15 '25 13:01 taitov

@fujita could you please take a look at it?

bayrinat avatar Jan 17 '25 07:01 bayrinat

I don't think it's a good idea for GoBGP's most core code to have two different implementations. What's the maximum buffer size of Go's buffered channel? If it's large, then we could use buffered channel instead of infinite channel like #2862 The buffered size can be specified by a command line option or such.

fujita avatar Jan 17 '25 11:01 fujita

btw, what kind of use case do you have that puts such a heavy load on GoBGP?

fujita avatar Jan 17 '25 11:01 fujita

I don't think it's a good idea for GoBGP's most core code to have two different implementations. What's the maximum buffer size of Go's buffered channel? If it's large, then we could use buffered channel instead of infinite channel like #2862 The buffered size can be specified by a command line option or such.

A regular channel with a limited size will not work, as it can still be reached, which may lead to disconnection with peer.

BufferChannel can combine multiple updates into one, discarding outdated path's attributes. The most effectiveness of this channel is shown when there is a storm of attribute updates for one path from peer.

taitov avatar Jan 17 '25 13:01 taitov

Сonfiguration example:

peer-groups:
  - config:
      peer-group-name: group1
      peer-as: 12345
      local-as: 54321
      incoming-channel-timeout: 1
      incoming-channel:
        channel-type: buffer
        size: 64
      outgoing-channel:
        channel-type: buffer
        size: 64

neighbors:
  - config:
      neighbor-address: 10.0.0.1
      peer-group: group1

or

neighbors:
  - config:
      neighbor-address: 10.0.0.1
      incoming-channel:
        channel-type: buffer
        size: 64
      outgoing-channel:
        channel-type: buffer
        size: 64
  - config:
      neighbor-address: 10.0.0.2
      incoming-channel:
        channel-type: buffer
        size: 256

taitov avatar Jan 17 '25 13:01 taitov

btw, what kind of use case do you have that puts such a heavy load on GoBGP?

Paths are being cyclically updated from peers due to attribute updates. And main thread BgpServer.Serve() a busy (ex, processes the mgmtOperation or propagateUpdate paths when peer went disconnect). In this scenario, InfiniteChannel begins to fill up quickly.

taitov avatar Jan 17 '25 13:01 taitov

@fujita gentle ping

taitov avatar Jan 28 '25 18:01 taitov

@fujita gentle ping

taitov avatar Feb 19 '25 08:02 taitov

Why just not removing incoming queues?

fujita avatar Feb 24 '25 04:02 fujita

Why just not removing incoming queues?

  • Then peers can start flap (hold timer expired).
  • OOM on remote peer (overflow outgoing fsm channel).

taitov avatar Feb 24 '25 05:02 taitov

Why just not removing incoming queues?

  • Then peers can start flap (hold timer expired).
  • OOM on remote peer (overflow outgoing fsm channel).

But with this PR - сan add another type of fsm incoming/outgoing channels, which will disable queues.

taitov avatar Feb 24 '25 05:02 taitov

I meant that why not removing incoming channels?

fujita avatar Feb 24 '25 06:02 fujita

I meant that why not removing incoming channels?

I understand you :) And my answer: if we remove incoming channels, we will get possible problems: peers flap and OOM on remote gobgp server.

taitov avatar Feb 24 '25 08:02 taitov

Hmm, why? goroutine, reads from a socket, parses a BGP message, and updates the table with a lock and PrefixLimit config. I don't see the difference about OOM.

fujita avatar Feb 24 '25 09:02 fujita

Hey! Do you know where we are regarding this PR? Couldn't we do it in two steps, first solve inbound channels and afterward the others?

mpeim avatar Apr 10 '25 09:04 mpeim

Test result with 10k updates per second:

Green line: memory usage. Blue line: CPU usage.

InfiniteChannel (old behavior) image

BufferChannel image

InfiniteChannel used 4GB memory vs. BufferChannel 16MB on same BGP updates pattern.

taitov avatar May 13 '25 11:05 taitov

Nice graphs! Really clear and informative.

Is there a way to reproduce a similar load in my environment (like MRT or something)?

fujita avatar May 16 '25 00:05 fujita

Is there a way to reproduce a similar load in my environment (like MRT or something)?

I think the easiest way would be to create this scheme:

       Peer1                               Peer2
┌─────────────────┐                 ┌─────────────────┐
│      GOBGP      │   100k routes   │  ModifiedGOBGP  │
│ Infinite/Buffer │◄────────────────┤                 │
│     Channel     │   10M updates   │                 │
└─────────────────┘                 └────────▲────────┘
                                             │         
                                             │         
                                      Add 100k routes  
                                       to global RIB   

Where ModifiedGOBGP is https://github.com/osrg/gobgp/commit/f42a6fec4346b5015c75872fe05028fb45ec595a

taitov avatar Jun 15 '25 21:06 taitov

@fujita hey, is there any details that you need? This PR fixes unnecessary OOMs, it would be nice to have it in upstream.

bayrinat avatar Jun 24 '25 12:06 bayrinat

https://github.com/osrg/gobgp/pull/2993 I plan to remove incoming channels completely.

fujita avatar Jun 30 '25 09:06 fujita

Incoming channel was removed completely. It would be great if you could also check the memory usage with the same test.

fujita avatar Jul 16 '25 23:07 fujita