
chainfee: introduce filterManager and use it for fee floor

Crypt-iQ opened this issue 1 year ago • 26 comments

This commit introduces a filterManager that uses our outbound peers' feefilter values to determine an acceptable minimum feerate that ensures successful transaction propagation. Under the hood, a moving median is used as it is more resistant to shocks than a moving average.
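Conceptually, the window behaves like this minimal Go sketch (hypothetical type and names for illustration, not the actual filterManager code):

```go
package chainfee // placement illustrative only

import "sort"

// satPerKVByte is a feerate in satoshis per 1000 virtual bytes.
type satPerKVByte uint64

// movingMedian keeps the most recent `size` samples and reports their
// median, which resists one-off shocks better than a mean would.
type movingMedian struct {
	window []satPerKVByte // oldest sample first
	size   int
}

// add records a new feefilter sample, evicting the oldest once full.
func (m *movingMedian) add(sample satPerKVByte) {
	m.window = append(m.window, sample)
	if len(m.window) > m.size {
		m.window = m.window[1:]
	}
}

// median returns the middle value of the window, or 0 if it is empty.
func (m *movingMedian) median() satPerKVByte {
	n := len(m.window)
	if n == 0 {
		return 0
	}
	sorted := make([]satPerKVByte, n)
	copy(sorted, m.window)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}
```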

Mostly looking for an approach ACK on:

  • the choice of a moving median compared to a moving average
  • the choice of exclusively using outbound peers' feefilter to avoid manipulation by an inbound attacker. The downside is that we get fewer data points
  • the choice of constants (polling interval and the moving window) which will affect the overall result
    • a quicker polling interval means more up-to-date data, but at the cost of filling the window with potentially stale data
    • a larger moving window would give the moving median more "memory", but may end up carrying irrelevant data

Crypt-iQ avatar Jan 24 '24 18:01 Crypt-iQ


though we could make this configurable, so that we can use percentiles.

+1 for configurability and the use of percentiles. Much better control IMO: if we have a degree of confidence that at rate x, y% of peers will accept the transaction, then y can be a configurable value the user can play with, with a default of e.g. 70%.

saubyk avatar Feb 01 '24 18:02 saubyk

Just realized GetPeerInfo is already implemented here: https://github.com/btcsuite/btcd/blob/master/rpcclient/net.go#L354

Since this is a feefilter msg sent by our peers and we just cache it, I don't think calling GetPeerInfo periodically is necessary as 1) it's not sending new requests to peers, so no new info is gained, and 2) we are just reading a cache, so there's no concern performance-wise. At least that's how btcd handles it.
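For illustration, reading those cached values through btcd's rpcclient might look like the sketch below. The Inbound and FeeFilter fields are from btcd's btcjson.GetPeerInfoResult; bitcoind names the field "minfeefilter" (in BTC/kvB) and would need its own decoding, so treat this as a sketch:

```go
package main

import "github.com/btcsuite/btcd/rpcclient"

// collectOutboundFeeFilters reads the feefilter values the backend has
// already cached for its peers and keeps only the outbound ones.
func collectOutboundFeeFilters(client *rpcclient.Client) ([]int64, error) {
	peers, err := client.GetPeerInfo()
	if err != nil {
		return nil, err
	}
	var filters []int64
	for _, p := range peers {
		if p.Inbound {
			continue // only trust peers we dialed ourselves
		}
		filters = append(filters, p.FeeFilter)
	}
	return filters, nil
}
```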

That being said, I don't think we are sending MsgFeeFilter to our peers in btcd? cc @Roasbeef

yyforyongyu avatar Feb 02 '24 07:02 yyforyongyu

Much better control IMO: if we have a degree of confidence that at rate x, y% of peers will accept the transaction, then y can be a configurable value the user can play with, with a default of e.g. 70%.

If we do this, we need a ceiling, because otherwise if a user sets this to 100%, their LND node will set this to a high value and they will lose money to fees.

So with that in mind, knowing that we use moving median/average to reduce time based jitter, what are we smoothing?

This was laziness on my part, I didn't feel like adding more values to the moving window.

As such I'm not sure we need to use a moving median to control for it as any jitter that shows up in the time-series will inherently be jitter in the median node's report and thus more likely already reflect broader market conditions.

That's true, the only benefit here would be to smooth out any shocks. But as you say if there are shocks it reflects what the broader network is doing and we should probably react to them.

The issue is that just because they are outbound connections doesn't mean that they won't be malicious.

It is much harder for a malicious entity to be the majority of our outbound connections than it is for them to be the majority of our inbound connections (trivial). Assuming the user is running bitcoind, they would need to pollute our addrman table and hope that, the next time we start up, their nodes are chosen for outbound connections. It's not impossible, but if we're going to use data from our peers it should be the outbound peers. This is another reason we should try to enforce sensible bounds on the data we receive.

It does prevent drive-by's but simply knowing this someone could serve a diff fee filter for outbound connections vs inbound ones.

For this to have any effect, a lot of nodes would have to be running this patch to serve different outbound feefilters. This would result in transaction relay failing for a lot of txns. Also, this type of coordination would probably not go unnoticed. For example, I have a script that would detect this sort of broader network behavior. Again, this isn't impossible for somebody to pull off, but we should have sensible limits in place to mitigate any risk here. Deciding on what those will be might be tricky.

However, let's suppose that it does, do we lose anything other than data points? I don't think so.

I don't follow, can you clarify?

So I think we can fix a lower sample count by just instructing bitcoind to have more outbound peers using addnode and the like.

We could do this, bitcoind will let us do 8 addnode connections (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net.h#L1488). However, I think lnd instructing bitcoind how to behave is unprecedented for our codebase.

Alternatively if that's too much effort we can always take up to the same number of samples and prefer the ones that are outbound over the ones that are inbound.

I don't think we should ever use inbound data

A quicker polling rate is probably better and we can either extend the window or time-bucketing to make it such that older data is at a lower resolution. That said, like I said in the above commentary, I don't think we need to get cute with the windowing calcs because I expect the important info to come from cohort aggregation.

Yup I agree.

Since this is a feefilter msg sent by our peers and we just cache it, I don't think calling GetPeerInfo periodically is necessary as 1) it's not sending new requests to peers, so no new info is gained, and 2) we are just reading a cache, so there's no concern performance-wise. At least that's how btcd handles it.

Our bitcoind peers will periodically send us feefilters (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net_processing.cpp#L5463-L5472) so it makes sense to poll. Btcd doesn't send out a feefilter, but the majority of the network is bitcoind so we don't need to worry about it unless we plan to implement it on the btcd side.

Crypt-iQ avatar Feb 02 '24 16:02 Crypt-iQ

I don't think we need polling for minfeefilter, just like we don't poll for mempoolminfee. By the time we broadcast the tx, two scenarios may exist,

  • that the polled data is up-to-date and we can use it to guide us to send the tx.
  • that the polled data is outdated, making it less useful to guide us.

If we don't do polling we won't have this issue. As of today, publishing a tx already involves two RPC calls,

  1. call testmempoolaccept
  2. call sendrawtransaction

We might as well call getpeerinfo after testmempoolaccept and process the fresh data.
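A rough sketch of that flow (TestMempoolAccept exists in recent btcd rpcclient versions, but treat the exact signatures and the updateFeeFilters helper as assumptions):

```go
package main

import (
	"fmt"

	"github.com/btcsuite/btcd/rpcclient"
	"github.com/btcsuite/btcd/wire"
)

// publishWithFreshFilters sketches the proposed on-demand flow:
// validate the tx, refresh peer feefilter data, then broadcast.
func publishWithFreshFilters(client *rpcclient.Client, tx *wire.MsgTx) error {
	// 1. Ask the backend whether the tx would be accepted into its mempool.
	// maxFeeRate=0: rely on the backend's default behavior (illustrative).
	results, err := client.TestMempoolAccept([]*wire.MsgTx{tx}, 0)
	if err != nil {
		return err
	}
	if len(results) != 1 {
		return fmt.Errorf("expected 1 result, got %d", len(results))
	}
	if !results[0].Allowed {
		return fmt.Errorf("tx rejected: %v", results[0].RejectReason)
	}

	// 2. Piggyback a getpeerinfo call to pick up fresh feefilter values.
	peers, err := client.GetPeerInfo()
	if err != nil {
		return err
	}
	updateFeeFilters(peers) // hypothetical helper

	// 3. Broadcast.
	_, err = client.SendRawTransaction(tx, false)
	return err
}
```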

Re inbound and outbound peers, I found this and this post helpful. In short, I agree with @Crypt-iQ that we should only use outbound peers.

yyforyongyu avatar Feb 02 '24 16:02 yyforyongyu

I don't follow, can you clarify?

I was agreeing with you in that, I don't think there is a real negative consequence other than just smaller sample size if we go with outbound-only data.

However, I think lnd instructing bitcoind how to behave is unprecedented for our codebase.

Hmm yeah I agree that maybe this is a route we'd like to avoid. I do think if we are restricting to outbound only, we need to ensure we always have enough of those connections. Do we know bitcoind's behavior here?

I don't think we should ever use inbound data

ACK

ProofOfKeags avatar Feb 02 '24 20:02 ProofOfKeags

On second thought, I think keeping the moving window is good, because if we suddenly lose the majority of our outbound peers and only have one or two peers at 1000 sats/KvB (about 10% of the network will return this), we'll potentially be unable to broadcast our transactions.

Do we know bitcoind's behavior here?

It attempts to keep 8 outbound connections by default. IIRC this limit can be raised

Crypt-iQ avatar Feb 05 '24 15:02 Crypt-iQ

I don't think we need polling for minfeefilter, just like we don't poll for mempoolminfee. By the time we broadcast the tx, two scenarios may exist,

  • that the polled data is up-to-date and we can use it to guide us to send the tx.
  • that the polled data is outdated, making it less useful to guide us.

If we don't do polling we won't have this issue. As of today, publishing a tx already involves two RPC calls,

  1. call `testmempoolaccept`
  2. call `sendrawtransaction`

We might as well call getpeerinfo after testmempoolaccept and process the fresh data.

If we don't adjust our min relay feerate ahead of time (i.e. before we need to go onchain), we won't have a commitment transaction that can actually relay.

I think we really need to do polling so that we can do update_fee negotiation long before we ever call PublishTransaction.

morehouse avatar Feb 05 '24 16:02 morehouse

Mostly looking for an approach ACK on:

  • the choice of a moving median compared to a moving average

ACK.

I don't think the configurable percentile idea should be implemented right now. Let's keep things simple and reduce configuration until we have a strong reason to do that.

Median sounds like a good default -- 50% of our outbound peers are expected to relay our min fee transactions.

Moving median seems good to smooth out spikiness caused by peer disconnects.

  • the choice of exclusively using outbound peers' feefilter to avoid manipulation by an inbound attacker. The downside is that we get fewer data points

I agree outbound peers are preferred.

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

  • the choice of constants (polling interval and the moving window) which will affect the overall result
    • a quicker polling interval means more up-to-date data, but at the cost of filling the window with potentially stale data
    • a larger moving window would give the moving median more "memory", but may end up carrying irrelevant data

5 minute polling interval seems reasonable to me.

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.

morehouse avatar Feb 05 '24 17:02 morehouse

If we don't adjust our min relay feerate ahead of time

Every time we create transactions, when estimating fees, we already call EstimateFeePerKW to get the latest min relay fee.

On second thought, I think keeping the moving window is good, because if we suddenly lose the majority of our outbound peers and only have one or two peers at 1000 sats/KvB (about 10% of the network will return this), we'll potentially be unable to broadcast our transactions.

Could you elaborate a bit? If we only have two peers with 1000 sats/kvB, doesn't it mean the polling data won't be useful because the only peers we have won't accept our transactions?

It attempts to keep 8 outbound connections by default. IIRC this limit can be raised

The max number of outbound connections is independently capped at 10.

yyforyongyu avatar Feb 05 '24 17:02 yyforyongyu

Every time we create transactions, when estimating fees, we already call EstimateFeePerKW to get the latest min relay fee.

If we instantaneously query for feefilters rather than poll, we run into an issue if we only have one or two peers at the time of the query.

Could you elaborate a bit? If we only have two peers with 1000 sats/kvB, doesn't it mean the polling data won't be useful because the only peers we have won't accept our transactions?

So this is what the moving window would help us with: it would smooth out jitter caused by some of our outbound peers disconnecting. If our two remaining outbound peers advertise 1000 sats/KvB rather than the network majority of ~20,000 sats/KvB, we'll create txns >= 1000 sats/KvB instead of ~>= 20,000 sats/KvB, since we won't have any memory of when we had 8 outbound peers.

The max number of outbound connections is independently capped at 10.

Two of these are block-relay-only and won't send us feefilter values.

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

I'm wary about this because it could open the door to some malicious behavior. It might be better to just have few data points rather than some inbound peers making us pay too much or too little in fees. bitcoind does actively try to maintain 8 outbound connections so I would expect that we'd have 8 outbounds most of the time.

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.

Yeah 13 blocks might be too much. What block value do you (or anybody else) have in mind?

Crypt-iQ avatar Feb 05 '24 18:02 Crypt-iQ

If our two remaining outbound peers advertise 1000 sats/KvB rather than the network majority of ~20,000 sats/KvB

Cool, I was thinking the opposite: that the network majority is 500 sats/kvB and we have two peers at 1000 sats/kvB. By polling we'd have the memory of 500 sats/kvB, which gives us a feerate that won't be accepted by our peers.

Under the same assumed condition, that we only have two peers, if we don't poll, it means in the worst case we still have 2 peers relaying the tx for us. If we poll, the worst case is we cannot get our tx relayed.

I think we should instead focus on maintaining X outbound peers - maybe add an outbound peers check in healthcheck.

yyforyongyu avatar Feb 05 '24 18:02 yyforyongyu

Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.

I'm wary about this because it could open the door to some malicious behavior. It might be better to just have few data points rather than some inbound peers making us pay too much or too little in fees. bitcoind does actively try to maintain 8 outbound connections so I would expect that we'd have 8 outbounds most of the time.

If we have 8 outbound peers all the time, then either approach behaves the same. But we should think about corner cases too. What do we do if there's 0 outbound peers? Or what if there's only 1 and its feefilter is an outlier (say 20x higher than actual min relay fee)?

Yeah 13 blocks might be too much. What block value do you (or anybody else) have in mind?

3 blocks?

morehouse avatar Feb 05 '24 18:02 morehouse

If we have 8 outbound peers all the time, then either approach behaves the same.

I don't think it's the same - it's more design, hence more complicated. Plus, when 8 peers all increase their feefilter in a short time, polling won't work as we don't have the fresh data.

I also don't think the number of outbound connections is something we can or should control. We can only inform users and suggest they take care of their bitcoind.

yyforyongyu avatar Feb 05 '24 18:02 yyforyongyu

I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.

The other option would be to use an EMA instead of an SMA here, this will cause us to respond more aggressively to recent history which will increase jitter but still respond to a dynamic environment. If the primary jitter we are trying to smooth is coming from peer churn, then I think an EMA would be sufficient as it would still have some smoothing properties, but any MA will lag the current "correct" reading (by definition). So in an increasing fee environment no matter what [X]MA we go with we will have to cope with the fact that we'll be behind the trend.
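For reference, the EMA update rule is tiny; a sketch with an assumed float feerate representation:

```go
// ewma is an exponentially weighted moving average of feerate samples.
// A higher alpha weights recent readings more heavily, tracking a rising
// fee environment faster than an SMA at the cost of more jitter.
type ewma struct {
	alpha   float64 // smoothing factor in (0, 1]
	value   float64 // current smoothed feerate, sat/kvB
	started bool
}

// update folds a new sample into the average and returns the new value.
func (e *ewma) update(sample float64) float64 {
	if !e.started {
		e.value, e.started = sample, true
		return e.value
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
	return e.value
}
```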

ProofOfKeags avatar Feb 05 '24 19:02 ProofOfKeags

If we have 8 outbound peers all the time, then either approach behaves the same. But we should think about corner cases too. What do we do if there's 0 outbound peers? Or what if there's only 1 and its feefilter is an outlier (say 20x higher than actual min relay fee)?

We could skip a round if we don't have enough data points, but not sure of all the implications

Crypt-iQ avatar Feb 05 '24 19:02 Crypt-iQ

We could skip a round if we don't have enough data points, but not sure of all the implications

If we skip the round we should interpolate the data, solving for the reading we would need to keep the output value constant. This is so that we don't distort the measurement for future readings.
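A sketch of that idea, reusing the hypothetical movingMedian type from the opening sketch; appending a synthetic sample equal to the current output keeps the median approximately unchanged:

```go
// onSkippedRound records a synthetic sample equal to the window's current
// output when a polling round is skipped for lack of data points, so the
// gap doesn't distort future readings.
func (m *movingMedian) onSkippedRound() {
	if len(m.window) > 0 {
		m.add(m.median())
	}
}
```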

ProofOfKeags avatar Feb 05 '24 19:02 ProofOfKeags

I don't think it's the same - it's more design, hence more complicated. Plus, when 8 peers all increase their feefilter in a short time, polling won't work as we don't have the fresh data.

Good point. It's important not to undershoot the relay fee for commitment transactions. A moving window will undershoot during rising fees, while an on-demand polling strategy may undershoot if peers have disconnected right before we do the poll.

Instead of a moving average, how about a max-of-medians in the lookback window? That would mean that the latest data point is used in a rising fee environment, while older data is used for smoothing when fees are falling or oscillating.
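A sketch of the max-of-medians idea, reusing the hypothetical satPerKVByte type from the opening sketch:

```go
// maxOfMedians returns the largest of the per-round medians in the lookback
// window: the newest median wins while fees rise, and older (higher) medians
// smooth the output while fees fall or oscillate.
func maxOfMedians(roundMedians []satPerKVByte) satPerKVByte {
	var best satPerKVByte
	for _, m := range roundMedians {
		if m > best {
			best = m
		}
	}
	return best
}
```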

Another consideration is that we may want to set the commitment fee rate at some % above the min relay fee, to provide some protection if relay fees rise soon after we craft the commitment transaction.

morehouse avatar Feb 05 '24 23:02 morehouse

IMO it's at most 8 points of data and I don't think we should over design here. I feel the only thing we should fix is to make sure the majority of the outbound peers can relay our txns, and let users handle maintaining X peers themselves since it's out of our control? We also shouldn't care about how our peers relay the txns, i.e. the network condition - that's the job for mempoolminfee?

yyforyongyu avatar Feb 06 '24 05:02 yyforyongyu

Instead of a moving average, how about a max-of-medians in the lookback window? That would mean that the latest data point is used in a rising fee environment, while older data is used for smoothing when fees are falling or oscillating.

This seems fine if we always have 8 data points and skip when we don't. We don't want to choose a really high max if we had one peer doing something funky.

Another consideration is that we may want to set the commitment fee rate at some % above the min relay fee, to provide some protection if relay fees rise soon after we craft the commitment transaction.

I like this better than choosing a max of medians. Even with this and a moving median, we could lag behind, but it would probably be good enough. Right now we have something that is not even close to good enough (no awareness of feefilter).

IMO it's at most 8 points of data and I don't think we should over design here.

We have to make sure that we're secure against malicious behavior otherwise we might as well not have this logic at all.

I feel the only thing we should fix is to make sure the majority of the outbound peers can relay our txns

This is kind of what a median aims to do, except with an even number of data points it covers half rather than a majority. The moving window can get in the way of relaying to the majority of our outbound peers, but it's also for safety in case we don't have enough outbounds to make a decision on feefilter values.

To help aid the discussion and hopefully not muddy it further, it might help to talk about real data. I've noticed that there's always this group of nodes with feefilter set to ~1000 sats/KvB regardless of what the majority of the network is doing (i.e. at, say, ~20,000 sats/KvB). In this snapshot from this morning, there were 1,151 nodes with feefilters < 5000 sats/KvB out of 8,441 nodes queried. The majority of the network is > 9597 sats/KvB, which roughly matches mempool.space. [attached image: feefilter distribution histogram, Feb 6]

If we assume this sample is representative of the whole network (I've taken measurements for the past week and the low-feefilter bucket seems constant), then ~13.635% of the time we'll choose one of these low-feefilter nodes as an outbound peer. Choosing 4 low-feefilter outbound peers out of 8 is: (8 choose 4) * (0.13635)^4 = 0.024194 or 2.4%. So 2.4% of nodes running this patch will have junk data. With that in mind, I think it's important to note that the filterManager should be best-effort and is used as a fee floor during fee estimation after querying the backend.

I think we just need something that:

  • performs better than not having this logic
  • provides protection against abnormally high outbound feefilters
    • this could be a fee ceiling or be dynamic based on some relevant value

If anybody wants to play around with the crawler:

  • code to run the crawler: https://github.com/Crypt-iQ/crawler
  • code to plot the data: https://gist.github.com/Crypt-iQ/3ca970f0a4c15ee15457c9578f78228e

Crypt-iQ avatar Feb 06 '24 16:02 Crypt-iQ

This seems fine if we always have 8 data points and skip when we don't. We don't want to choose a really high max if we had one peer doing something funky.

Right, if we have enough data points the medians should filter out the funky stuff. I think we would probably be okay with a bit fewer than 8 data points too -- maybe we could only skip below ~6.

I like this better than choosing a max of medians. Even with this and a moving median, we could lag behind, but it would probably be good enough. Right now we have something that is not even close to good enough (no awareness of feefilter).

Sounds fine to me as a first implementation. If we still have relay problems in practice, we can switch to max-of-medians with a tiny patch.

If we assume this sample is representative of the whole network (I've taken measurements for the past week and the low-feefilter bucket seems constant), then ~13.635% of the time we'll choose one of these low-feefilter nodes as an outbound peer. Choosing 4 low-feefilter outbound peers out of 8 is: (8 choose 4) * (0.13635)^4 = 0.024194 or 2.4%. So 2.4% of nodes running this patch will have junk data. With that in mind, I think it's important to note that the filterManager should be best-effort and is used as a fee floor during fee estimation after querying the backend.

I think we just need something that:

  • performs better than not having this logic
  • provides protection against abnormally high outbound feefilters
    • this could be a fee ceiling or be dynamic based on some relevant value

I think I agree -- we should get something in place that improves the status quo. Later we can tweak as necessary.

morehouse avatar Feb 06 '24 17:02 morehouse

The data collection is dope! And I'd like to mention it's slightly inaccurate, as I think the probability should be $\frac{C_{1151}^{4}C_{7290}^{4}}{C_{8441}^{8}} = 0.0134$, since we are picking peers without replacement.

If all we care about is getting the tx relayed to the miners, the next important question is: what's the probability that all 4 of our low-feefilter peers will further be rejected by their peers? This requires each of those peers to have 8 high-feefilter peers, which has probability $\frac{C_{7290}^{8}}{C_{8441}^{8}} = 0.3093$, so across all 4 peers this event has probability $0.3093^4 = 0.0092$.

So the probability that we have 4 low-feefilter peers and our tx gets rejected by all of their peers is $0.0092 \times 0.0134 = 0.00012$, or 0.012%.

This is a simplified model as it only accounts for 1 case and a depth of 2 (I created this to help with the calculation) and can go deep due to the unknown topology. So I think the median works. Plus, we are doing a max(feefilter, mempoolminfee) anyway, and we rebroadcast at every block, so fresh feefilter and mempoolminfee values are collected there, which should be enough for the sweeping txns.
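For anyone who wants to double-check the combinatorics, a small Go sketch using math/big's Binomial:

```go
package main

import (
	"fmt"
	"math/big"
)

// hyper computes C(low,k)*C(high,n-k)/C(low+high,n): the hypergeometric
// probability of drawing exactly k low-feefilter peers in n draws without
// replacement.
func hyper(low, high, n, k int64) *big.Float {
	num := new(big.Int).Mul(
		new(big.Int).Binomial(low, k),
		new(big.Int).Binomial(high, n-k),
	)
	den := new(big.Int).Binomial(low+high, n)
	return new(big.Float).Quo(
		new(big.Float).SetInt(num),
		new(big.Float).SetInt(den),
	)
}

func main() {
	// 1,151 low-feefilter nodes out of 8,441 observed; 8 outbound slots.
	fmt.Println(hyper(1151, 7290, 8, 4)) // ≈ 0.0134
}
```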

We have to make sure that we're secure against malicious behavior otherwise we might as well not have this logic at all.

I'm referring to the polling of historical data, and I don't think it helps. Also, it's hard to define what's malicious and what's informative since we only have a tiny view of the network - an attacker may cause the feefilters to go up, causing us to pay more fees, or worse, go down, making us believe the general network feerate is low, so the tx will never be relayed, hence htlc timeouts. But in lnd we don't know if it's a reflection of the network or someone messing with us. Luckily we still have mempoolminfee, tho I'd argue if the threat model is that the attacker can control our outbound peers, then they can also control our mempool. In the end it's 8 outbound peers we cannot control in lnd, and this should be fixed in bitcoind, which already has an issue.

On the other hand, I think it'd be cool to turn the crawler into a subservice, maybe in btcwallet, to give us a larger view of the network. Future stuff.

Another consideration is that we may want to set the commitment fee rate at some % above the min relay fee, to provide some protection if relay fees rise soon after we craft the commitment transaction.

Yeah I like it. To be clear we are using mempoolminfee not minrelaytxfee, but I think we mean the same thing here.

yyforyongyu avatar Feb 07 '24 10:02 yyforyongyu

This is a simplified model as it only accounts for 1 case and a depth of 2 (I created this to help with the calculation) and can go deep due to the unknown topology.

This depth is a good point: there could be a path from the low-feefilter peers to a miner, but I don't think we can quantify it because we don't know the topology of the bitcoin p2p network (iirc one paper said it formed a regular graph, but I forget the size). My point with the low-feefilter data was that if they are at least half of our connections, we could choose feerates that are meaningless wrt the rest of the network and hence not likely to be included in a block. Another thing to note is that miners also run mempools and have their own feefilters.

I'm referring to the polling of historical data, and I don't think it helps

It helps because if we use on-demand querying and have one data point, we shouldn't use just that one data point. Instead we can use the lookback window (which would store polling data for the last 3 blocks).

Also, it's hard to define what's malicious and what's informative since we only have a tiny view of the network - an attacker may cause the feefilters to go up, causing us to pay more fees, or worse, go down, making us believe the general network feerate is low, so the tx will never be relayed, hence htlc timeouts.

I believe feefilters being artificially high is worse than artificially low, because this is being used as a floor. If they're artificially low, it's bounded by our min relay fee and we still have regular old fee estimation to rely on. If they're artificially high, we don't have any protection against that, but we could if we decide on some sort of maximum value.

tho I'd argue if the threat model is that the attacker can control our outbound peers, then they can also control our mempool.

The threat model is where the attacker controls some of our outbound peers. If they can control all of our outbound peers, they can do worse than feed us bad feefilter info. It's unlikely that they'll control all of our outbound peers and therefore unlikely that they can control our mempool.

Crypt-iQ avatar Feb 07 '24 15:02 Crypt-iQ

It seems like the last point of contention here is on-demand querying vs background polling. My reservations with on-demand querying are that:

  • we may not have enough data points at the time of the query
  • prone to fluctuations

We could partially tackle the first point by ignoring the results of the query if we have < 6 data points. This would protect against the aforementioned case where we only have like 1 outbound peer. I think the fluctuations bit can be a negative in some cases and a positive in others.

Ultimately, I think either way is going to be better than what we have, as long as there's some sensible ceiling. My personal vote is for background polling, but happy to hear others' thoughts. @ProofOfKeags @morehouse @yyforyongyu

Crypt-iQ avatar Feb 08 '24 16:02 Crypt-iQ

...on-demand querying vs background polling

...prone to fluctuations

Could do both, where we always try for a fresh result but keep a very simple backup in case there's no valid live data to pull. If we wanna keep stats on how often a live query fails, we could do that too. I don't think we necessarily need any history smoothing function. The Bitcoin network is not inherently smooth, so the fluctuations are part of the reality we have to contend with. People drop blocks of transactions into the mempool all the time, and that can make fees jump dramatically very quickly - that's something we probably need to respond to.
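A sketch of that hybrid (all names here are hypothetical):

```go
// freshOrFallback always tries a live reading first and only serves the
// cached value when there aren't enough live data points. The filterManager
// fields, queryOutboundFilters, and medianOf are assumed helpers.
func (f *filterManager) freshOrFallback(minSamples int) satPerKVByte {
	filters, err := f.queryOutboundFilters()
	if err == nil && len(filters) >= minSamples {
		f.lastGood = medianOf(filters)
	}
	return f.lastGood
}
```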

I think I'm more or less taking the position that jitter is good and I don't think we should try to eliminate it.

ProofOfKeags avatar Feb 08 '24 19:02 ProofOfKeags

My thoughts:

  • Some sort of history to fall back on seems helpful, in case the data is temporarily sparse when doing on-demand querying.
  • Background polling seems the best way to maintain a recent history.
  • If we already need a recent history it's probably simplest to just use it, rather than also doing on-demand queries.
    • Currently we update commitment fees every 10-60 minutes, and the proposed polling interval is 5 minutes, so there isn't much time for the history to diverge from on-demand rates.
    • Arguably the recent history could be a better estimator of min relay fee over the next 10-60 minutes than a single data point queried on demand, since we're smoothing out temporary fee-rate spikes higher or lower.
    • For nodes with lots of channels, there can be a substantial reduction in RPCs to bitcoind by using background polling instead of on-demand (i.e. the recent history acts as a cache).
  • Setting commitment fees at some % higher than the smoothed relay rate helps compensate for underestimated relay fees (see the sketch below).
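A sketch of that buffer, reusing the hypothetical satPerKVByte type:

```go
// bufferedFloor scales the smoothed relay-fee estimate up by pct percent to
// absorb fee rises that occur after the commitment tx is crafted.
func bufferedFloor(smoothed satPerKVByte, pct uint64) satPerKVByte {
	return smoothed + smoothed*satPerKVByte(pct)/100
}
```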

morehouse avatar Feb 13 '24 17:02 morehouse

@roasbeef: review reminder
@morehouse: review reminder
@crypt-iq, remember to re-request review from reviewers when ready

lightninglabs-deploy avatar Feb 27 '24 01:02 lightninglabs-deploy

I've addressed the main point of contention (the moving median) by removing that logic and just tracking a median without any history. I'll address the other comments later. There are two TODOs left in the code:

  • cap the value that we'll use from our peers' feefilter when deciding a minimum fee
  • potentially ignore feefilter values outside of a certain band

In the meantime, I'll run the code with both bitcoind & btcd backends.

Crypt-iQ avatar Mar 01 '24 22:03 Crypt-iQ

Latest change is that the median feefilter is now capped, when used in RelayFeePerKW, by the 1-conf block-target feerate (a sketch of the clamp follows the scenarios below). This prevents an adversary with the majority of outbound connections from forcing us to use a high median feefilter value.

There are three scenarios to consider here:

  • block feerates are steadily high (malicious):
    • the attacker can make us use the going 1-conf block feerate as our minimum fee. This is not ideal, but is fine.
  • block feerates trending upwards (non-malicious):
    • bitcoind's estimatesmartfee lags (not sure about btcd), so our minimum fee result will be capped at a lower block-inclusion feerate than the current one (and may be less than our peers' feefilter values). This isn't worse than what we have now.
  • block feerates trending downwards (malicious):
    • since estimatesmartfee lags, an attacker can make our minimum fee be as high as the 1-conf block feerate even though feerates are trending down. This is also not ideal, but the worst-case is identical to case 1 and should be fine.
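A sketch of the clamp described above (hypothetical names, reusing the satPerKVByte type from earlier):

```go
// feeFloor clamps the median outbound feefilter between our backend's relay
// floor and its 1-conf estimate, so a majority-outbound attacker can at
// worst pin us to the going block feerate. All values in sat/kvB.
func feeFloor(medianFilter, oneConfEstimate, minRelayFee satPerKVByte) satPerKVByte {
	floor := medianFilter
	if floor > oneConfEstimate {
		floor = oneConfEstimate
	}
	if floor < minRelayFee {
		floor = minRelayFee
	}
	return floor
}
```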

Crypt-iQ avatar Mar 12 '24 17:03 Crypt-iQ

Added tests

Crypt-iQ avatar Mar 19 '24 17:03 Crypt-iQ