Add sideloading functionality to neutrino
Related issue: https://github.com/lightninglabs/neutrino/issues/70
This PR adds sideloading functionality to neutrino. We want to be able to preload the block header and filter header stores with headers before neutrino starts up, which speeds up initial sync.
Change summary:
- Introduce a sideload package responsible for sideloading. It fetches headers from the source, validates them, and writes to the store.
- Write an implementation for fetching block headers from a binary encoded source.
- Decouple block header validation and storage from the block manager.
- Decouple the process of finding the next and previous block header checkpoints from the block manager.
- Include functionality in the neutrino chain service.
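For illustration, the fetch, validate, and write pipeline described in the summary could be sketched roughly as below. All names here (`HeaderSource`, `preload`, `sliceSource`) are hypothetical stand-ins for the PR's actual types, not its real API:

```go
package main

import "fmt"

// HeaderSource is a hypothetical shape for a side-load source: it hands
// out raw 80-byte block headers in batches.
type HeaderSource interface {
	// NextHeaders returns up to n headers, or an empty slice once the
	// source is exhausted.
	NextHeaders(n int) ([][80]byte, error)
}

// sliceSource serves headers from an in-memory slice, standing in for a
// file-backed reader.
type sliceSource struct {
	headers [][80]byte
	pos     int
}

func (s *sliceSource) NextHeaders(n int) ([][80]byte, error) {
	if s.pos >= len(s.headers) {
		return nil, nil
	}
	end := s.pos + n
	if end > len(s.headers) {
		end = len(s.headers)
	}
	batch := s.headers[s.pos:end]
	s.pos = end
	return batch, nil
}

// preload drains the source, validating each header before persisting the
// batch. validate and persist stand in for real sanity checks and header
// store writes.
func preload(src HeaderSource, validate func([80]byte) error,
	persist func([][80]byte)) error {

	for {
		batch, err := src.NextHeaders(2000)
		if err != nil {
			return err
		}
		if len(batch) == 0 {
			return nil
		}
		for _, h := range batch {
			if err := validate(h); err != nil {
				return fmt.Errorf("invalid header: %w", err)
			}
		}
		persist(batch)
	}
}

func main() {
	src := &sliceSource{headers: make([][80]byte, 5)}
	var written int
	err := preload(
		src,
		func([80]byte) error { return nil },
		func(b [][80]byte) { written += len(b) },
	)
	fmt.Println(written, err)
}
```

The point is only the shape of the loop: fetch a batch, validate, then write, repeating until the source runs dry.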
Thanks @positiveblue , please I left comments on your review and pushed some changes in response to your feedback (the ones that I am clear on).
Hello @positiveblue, have you had the chance to look at this again. If there are no objections to my comment. I will just make fmt
on this and include a description of the encoding and push.
Hello can anyone please run the workflow?
Please can anyone help run the workflow?
hmm, this works fine locally. I think I would just write a simple unit test instead of the integration test.
@Chinwendu20 have you run it with the race condition detector locally?
Oh no, I did not @Roasbeef, but I have made some changes now and ran it with the race detector; hopefully this works on the CI as well. Can anyone please help run the workflow?
Oh okay I will just mock the reader interface and write a unit test now.
Please can anyone help run the workflow?
Hi @Chinwendu20, I'd love to implement this into a wallet I'm working on for Joltz Rewards (https://joltzrewards.com).
Let me know if you'd like any support in addressing this PR review. Happy to dedicate time towards it.
Nice work so far on this feature!
Completed an initial review, with the following jumping out:
- Style guide not closely adhered to (godoc strings, 80-char columns, etc.)
- The main set of interfaces can be simplified.
- The control flow can be simplified: if we have more headers than in the file, we can terminate. We should check the file before starting up any other sub-system. The file can just have the header removed, then appended in one step to the end of `headers.bin`.
- We can also use the side loader for the filter headers as well. I think this is where that generic param can actually be used.
- No need for custom re-org logic: just load the file if it's relevant, then let the normal p2p logic handle reorgs if needed.
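On the generic param point, one possible shape (names here are made up to illustrate the idea, not proposals for the PR) is a type parameter constrained to the two header kinds, so a single loader implementation serves both the block header and filter header stores:

```go
package main

import "fmt"

// blockHeader and filterHeader stand in for the real wire.BlockHeader and
// filter header types.
type blockHeader struct{ height uint32 }
type filterHeader struct{ height uint32 }

// header constrains the type parameter to the two header kinds.
type header interface {
	blockHeader | filterHeader
}

// sideLoader serves batches of either header kind from one generic
// implementation, so the loading code is written once.
type sideLoader[T header] struct {
	headers []T
	pos     int
}

// next returns up to n headers, advancing the read position.
func (s *sideLoader[T]) next(n int) []T {
	end := s.pos + n
	if end > len(s.headers) {
		end = len(s.headers)
	}
	batch := s.headers[s.pos:end]
	s.pos = end
	return batch
}

func main() {
	blocks := &sideLoader[blockHeader]{
		headers: make([]blockHeader, 3),
	}
	filters := &sideLoader[filterHeader]{
		headers: make([]filterHeader, 2),
	}
	fmt.Println(len(blocks.next(10)), len(filters.next(10)))
}
```

The same instantiation trick would let the validation and storage plumbing differ per header kind while the traversal logic stays shared.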
Thanks for the review. I left some comments and will implement the changes that I am clear on.
Cool. I guess when the PR is merged you should be able to use it. Feel free to also make suggestions to the PR. Thanks for your help.
Hello all, what are your thoughts on the `NextHeader` function?
https://github.com/lightninglabs/neutrino/pull/285/files/e13c95fa3e76fd1883a8b2d0db0ba96a53b9b4c2..76952b79ac2a0c77450bc4dbd74275c325f5b1ef#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3131
I propose that the function accept an argument, n, enabling it to fetch n headers at a time. On the blockmanager side, if verification is disabled for sideloading, we fetch all the headers from the file at once using a single `NextHeaders(endHeight - startHeight)` call, and then write them to the DB at once as well.
If verification is enabled, we fetch one block header at a time, i.e. `NextHeader(1)`, since we verify headers individually.
What are your thoughts?
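The two proposed call patterns can be sketched as follows (the `reader` and `drain` names are hypothetical, and headers are modeled as plain counts for brevity): with verification off, one large `NextHeaders` call drains the file; with it on, one header is fetched per call.

```go
package main

import "fmt"

// reader mimics the proposed NextHeaders(n) API, tracking how many calls
// are made.
type reader struct {
	remaining uint32
	calls     int
}

// NextHeaders hands back up to n headers (represented here just as a
// count of headers returned).
func (r *reader) NextHeaders(n uint32) uint32 {
	r.calls++
	if n > r.remaining {
		n = r.remaining
	}
	r.remaining -= n
	return n
}

// drain shows the two proposed modes. Without verification, everything is
// fetched in one call; with verification, one header is fetched per call
// so invalid headers can stop the process early.
func drain(r *reader, verify bool, startHeight, endHeight uint32) uint32 {
	if !verify {
		// One NextHeaders(endHeight - startHeight) call.
		return r.NextHeaders(endHeight - startHeight)
	}

	var got uint32
	for {
		// NextHeader(1): verify each header individually.
		n := r.NextHeaders(1)
		if n == 0 {
			return got
		}
		got += n
	}
}

func main() {
	bulk := &reader{remaining: 100}
	fmt.Println(drain(bulk, false, 0, 100), bulk.calls)

	oneByOne := &reader{remaining: 100}
	fmt.Println(drain(oneByOne, true, 0, 100), oneByOne.calls)
}
```

Both modes fetch the same 100 headers; they differ only in how many calls the caller makes against the source.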
Making it able to return a batch of headers seems like a good idea. It can be limited to something like 2k.
I am averse to doing this because if we are verifying the headers, that would mean reading bytes from the file and deserializing to `wire.BlockHeader` for headers that could potentially be invalid, which just seems like a waste of time and processing power.
However, if we read one header at a time from the file, we can skip reading and deserializing at the first instance of an invalid header.
> I am averse to doing this because if we are verifying the header that would mean reading bytes from the file and deserializing to wire.BlockHeader for headers that could potentially be invalid, just seems like a waste of time and processing power.

If one of the headers is invalid, then we'd throw away the entire file (they should all connect and be valid). Arguably reading only 80 bytes at a time is a "waste of processing power" given it doesn't effectively utilize the I/O bandwidth available, and most filesystems will read extra parts of the file into a kernel-level buffer cache to further reduce I/O (speculative caching). We can also just memory-map the entire file, which is pretty much the most efficient way to read bytes from disk.
Ultimately, we'll throw an interface in front of the actual consumption (certain envs won't be able to directly read the file system as mentioned above), so we can optimize it further later.
Thanks for this; then we can go with loading all the headers in the file at once, i.e. `NextHeaders(sideloadHeaders.EndHeight - sideloadHeaders.StartHeight)`.
This PR has been updated.
Nice work w/ the latest revision! This is starting to get pretty close. IMO the final change we can make here is to decouple header side loading from the block manager altogether. Instead we'll load the headers directly into the DB, use a refactored version of `checkHeaderSanity` to validate them, then start the block manager.
Also, we don't need to look at these headers at all if we have more headers than the side load file.
Thanks for the review. Could you please shed more light on the last sentence? Can you confirm that this bit does not cover the concern: https://github.com/lightninglabs/neutrino/pull/285/files#diff-2345cea8b12f07d1f60d42b7e5563720e69e1544c641261cf147668c4599320bR3133-R3140
Also, please address this: https://github.com/lightninglabs/neutrino/pull/285#discussion_r1414517226
Also, are there comments on how the headers are being read from the `io.ReadSeeker`?
Hello all, cc: @Roasbeef, @Crypt-iQ
On decoupling the sideloading from the blockmanager: this is the function that would be added to neutrino.go (rough sketch; we might end up returning an error instead. The main idea is to show the process of verification, handling bad headers, and writing to the block header store):
```go
func (s *ChainService) sideLoadHeaders(
	sideload chaindataloader.BlockHeaderReader) {

	// If headers contained in the side load source are for a different
	// chain network, return immediately.
	if sideload.HeadersChain() != s.chainParams.Net {
		log.Errorf("headers from side load file are of network %v "+
			"and so incompatible with neutrino's current bitcoin "+
			"network -- skipping side loading",
			sideload.HeadersChain())
		return
	}

	tipHeader, height, err := s.BlockHeaders.ChainTip()
	if err != nil {
		log.Warnf("Error fetching chain tip while sideloading: %v",
			err)
		return
	}

	log.Infof("Side loading headers from %v to %v", height,
		sideload.EndHeight())

	// Set the reader's height so it supplies only the headers the node
	// still needs.
	err = sideload.SetHeight(height)
	if err != nil {
		log.Errorf("error while setting height for sideload --- "+
			"skipping sideloading: %v", err)
		return
	}

	// Seed the header list with the current chain tip so the first
	// sideloaded header has a previous node to connect to.
	headerList := headerlist.NewBoundedMemoryChain(numMaxMemHeaders)
	headerList.PushBack(headerlist.Node{
		Header: *tipHeader,
		Height: int32(height),
	})

	for {
		// Request the next batch of headers.
		headers, err := sideload.NextBlockHeaders(2000)
		if len(headers) == 0 {
			return
		}

		// If any error occurs while fetching headers, return
		// immediately.
		if err != nil {
			log.Errorf("error while fetching headers -- "+
				"skipping sideloading: %v", err)
			return
		}

		headersArray := make([]headerfs.BlockHeader, 0, len(headers))
		for _, header := range headers {
			// Ensure there is a previous header to compare
			// against.
			prevNodeEl := headerList.Back()
			if prevNodeEl == nil {
				log.Warnf("side load - header list does " +
					"not contain a previous element as " +
					"expected -- exiting side load")
				return
			}

			prevNode := prevNodeEl
			node := &headerlist.Node{
				Header: *header,
				Height: prevNode.Height + 1,
			}

			// skipVerify would come from the sideload config.
			if !skipVerify {
				valid, err := verifyBlockHeader(
					header, *prevNode, headerList,
					s.checkHeaderSanity,
				)
				if err != nil || !valid {
					log.Debugf("Side load - header did "+
						"not pass verification at "+
						"height %v -- rolling back "+
						"to last verified "+
						"checkpoint and skipping "+
						"sideload", node.Height)

					// These checkpoint/rollback helpers
					// would be moved over from the block
					// manager (see below).
					prevCheckpoint := s.findPreviousHeaderCheckpoint(
						node.Height,
					)
					log.Infof("Rolling back to previous "+
						"validated checkpoint at "+
						"height %d/hash %s",
						prevCheckpoint.Height,
						prevCheckpoint.Hash)

					err = s.rollBackToHeight(
						uint32(prevCheckpoint.Height),
					)
					if err != nil {
						// Should we panic here?
						panic(fmt.Sprintf(
							"Rollback failed: %s",
							err))
					}
					return
				}

				// Verify against the header checkpoints,
				// only when verification is enabled.
				if s.nextCheckpoint != nil &&
					node.Height == s.nextCheckpoint.Height {

					nodeHash := node.Header.BlockHash()
					if nodeHash.IsEqual(s.nextCheckpoint.Hash) {
						// Update nextCheckpoint to
						// give more accurate info
						// about the tip of the DB.
						s.nextCheckpoint = s.findNextHeaderCheckpoint(
							node.Height,
						)
						log.Infof("Verified side "+
							"loaded block header "+
							"against checkpoint "+
							"at height %d/hash "+
							"%s", node.Height,
							nodeHash)
					} else {
						log.Warnf("Error at "+
							"checkpoint while "+
							"side loading "+
							"headers, exiting "+
							"at height %d, hash "+
							"%s", node.Height,
							nodeHash)
						return
					}
				}
			}

			// Convert the header to a headerfs.BlockHeader and
			// collect it for a batched write.
			headersArray = append(headersArray, headerfs.BlockHeader{
				BlockHeader: header,
				Height:      uint32(node.Height),
			})
			headerList.PushBack(*node)
		}

		s.BlockHeaders.WriteHeaders(headersArray...)
	}
}
```
Since we are decoupling, I plan on making some of the functions that were previously blockmanager methods into methods on neutrino's `ChainService`, then supplying them to the blockmanager where needed by adding them as function fields in the blockmanager's config. Some of these functions include:
- `findPreviousHeaderCheckpoint`
- `findNextHeaderCheckpoint`
  (We could make these plain functions instead of methods on a struct, but that would mean passing the chain parameters as arguments every time.)
- `checkHeaderSanity`
  (This would also mean removing these fields from the blockmanager:
  - `minRetargetTimespan`
  - `maxRetargetTimespan`
  - `blocksPerRetarget`
  and making them fields of the `ChainService`.)

Is this okay?
I feel that no matter how we spin it, we would be duplicating something the `handleHeaders` function does if we follow this route. Does it seem that way to you as well?
I feel that the concern about breaking something by refactoring `handleHeaders` is the reason why we have tests. But if this does not seem like duplication, and/or there are better ideas for implementing this (perhaps decoupling the entire `handleHeaders` function from the blockmanager, if that is a good idea), I am open to hearing them.
@Chinwendu20
> Also we don't need to look at these headers at all if we have more headers than the side load file.
I mean that if we determine that a flat file has 100 headers (consider that with a basic preamble in the file that lists the total number of headers, we can verify that based on the size of the file), but we have 1000 headers on disk, then we don't need to even look at the sideloaded headers.
Can you make a permalink of that code location? Otherwise, the link doesn't seem to work for me (won't go to the right location for the diff).
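The preamble check described above could look something like this sketch (the field names, preamble size, and `shouldSideload` helper are all hypothetical): verify the advertised header count against the file size, and skip sideloading outright when the local tip is already past the file's end height.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	headerSize   = 80 // serialized block header size
	preambleSize = 8  // hypothetical fixed-size preamble
)

// shouldSideload reports whether the side-load file is both internally
// consistent (its size matches the advertised header count) and actually
// useful (it extends beyond our local tip).
func shouldSideload(fileSize int64, headerCount, startHeight,
	localHeight uint32) (bool, error) {

	expected := preambleSize + int64(headerCount)*headerSize
	if fileSize != expected {
		return false, errors.New("file size does not match " +
			"advertised header count")
	}

	endHeight := startHeight + headerCount
	if localHeight >= endHeight {
		// We already have more headers than the file supplies; no
		// need to even look at it.
		return false, nil
	}
	return true, nil
}

func main() {
	// Consistent file of 100 headers from height 0, local tip at 50.
	ok, err := shouldSideload(preambleSize+100*headerSize, 100, 0, 50)
	fmt.Println(ok, err)

	// Local tip already past the file's end: skip without error.
	ok, err = shouldSideload(preambleSize+100*headerSize, 100, 0, 1000)
	fmt.Println(ok, err)
}
```

Notably, the "already have more headers" case is not an error at all; it just means the file has nothing to offer.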
@Chinwendu20

> On decoupling the sideloading from the blockmanager.

Thanks for this draft! Some initial thoughts:
- Why does it set the height of the side loader based on our local height? What we want to check is whether we even need to consult the file in the first place. See my comment above about comparing the heights and bailing if we have more headers than the file.
- We can also generalize this to work for both block headers and filter headers. See my comment way above in the review where I suggest a way to use type params to re-use code between the two types of headers. We can also further make a wrapper struct to abstract away how the headers are verified and how they're written to disk. This lets us maintain a generalized loop where we just care about validating the headers and writing them to disk.
- I don't think this needs to be a method on the chain service; instead it can be a new struct that accepts the dependencies during init. This'll also make the logic easier to test.
- We may want to make the chunk size (number of headers read at a time) configurable.

Generally I dig this direction, and it should let us cut down on quite a lot of code.
I think once this latest diff is up, we also want to make a PR to lnd that adds in the new config, so we can start to get a feel for how users would use the feature in the first place. I think we'll want a new config option, and also likely a new RPC on the existing neutrino sub-server to allow users to pass in the header blob directly.
> @Chinwendu20
> On decoupling the sideloading from the blockmanager.
> Thanks for this draft! Some initial thoughts:
> - Why does it set the height of the side loader based on our local height? What we want to check is whether we even need to consult the file in the first place. See my comment above about comparing the heights and bailing if we have more headers than the file.

Please confirm that the `setHeight` function does not do that already: https://github.com/lightninglabs/neutrino/blob/e8fca396d84427bb3882a879f08200f3dedc07b6/chaindataloader/binary.go#L206-L209

> - We can also generalize this to work for both block headers and filter headers. See my comment way above in the review where I suggest a way to use type params to re-use code between the two types of headers. We can also further make a wrapper struct to abstract away how the headers are verified and how they're written to disk. This lets us maintain a generalized loop where we just care about validating the headers and writing them to disk.
> - I don't think this needs to be a method on the chain service; instead it can be a new struct that accepts the dependencies during init. This'll also make the logic easier to test.

The wrapper struct mentioned in point 2 and the struct accepting the dependencies during init you referred to in point 3: I am guessing these both refer to the same struct. Also, creating this struct would mean that:
- We both agree that this struct, for verification and for accepting dependencies, would live outside the `chaindataloader` package.
- Would we create two distinct wrapper structs, one for the filter header and one for the block header? As far as I know, the processes of block header and filter header verification are quite different. Or would we use one struct that accepts all the dependencies for both filter header and block header verification, and call it `headerVerifier` or something along those lines?
- In line with not repeating logic, per the concern I raised in this comment: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 we might need to, at least for block headers, refactor the `handleHeaders` function to use this `headerVerifier` struct for verification?

Is this in line?

> - We may want to make the chunk size (number of headers read at a time) configurable.
> Generally I dig this direction, and it should let us cut down on quite a lot of code.
> I think once this latest diff is up, we also want to make a PR to lnd that adds in the new config, so we can start to get a feel for how users would use the feature in the first place. I think we'll want a new config option, and also likely a new RPC on the existing neutrino sub-server to allow users to pass in the header blob directly.
> Please confirm that the setHeight function does not do that already:

Sure, that works. I meant more so that we just don't need to call that method at all if our local height is beyond the height in the file. Otherwise, as is, the caller needs to handle that error. Notice how, if unchanged, each time a user starts up they see a confusing error.

> We both agree that this struct for verification and for accepting dependencies would be outside the chaindataloader package

No, this is about making the code more generalized so it can support both filter headers and block headers. This logic can still live in the package. As a thought exercise: how would you go about updating the PR to support both block and filter headers? After you've thought about that, revisit my comment.
This is about abstracting header verification, and exactly how it's written to disk.

> We would create two distinct wrapper structs for the filter header and the block header? As I know the processes of block header and filter header verification are quite different. Or we use one struct and have it accept all dependencies for both the filter header and block header verification, and call it headerVerifier or something along those lines?

Yes, they are different, but given an abstract header, you can verify it. The verification differs, but we can create an interface that abstracts away those details and allows the caller to execute a generalized ingestion loop.

> In line with not repeating logic as the concern I raised in this comment: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 we might need to at least for block headers, refactor the handleHeaders function to use this headerVerifier struct for verification?

We don't need to change any of the core logic in the daemon. We also don't need to have the method on the `ChainService` struct, as I've mentioned several times above. To aid you in the thought experiment above, examine that route: where would it need to change to also support filter headers? How would you abstract the code to have it work for both filter and block headers?
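One way to read that suggestion (all names here are hypothetical, not the PR's API): hide both verification and persistence behind a small interface, so a single generic ingestion loop works for block and filter headers alike.

```go
package main

import "fmt"

// headerProcessor abstracts away how a given header kind is verified and
// how a batch of headers is written to disk.
type headerProcessor[T any] interface {
	Verify(T) error
	Store([]T) error
}

// ingest is a generalized ingestion loop: it neither knows nor cares
// which header kind it is handling.
func ingest[T any](next func() ([]T, error), p headerProcessor[T]) error {
	for {
		batch, err := next()
		if err != nil {
			return err
		}
		if len(batch) == 0 {
			return nil
		}
		for _, h := range batch {
			if err := p.Verify(h); err != nil {
				return err
			}
		}
		if err := p.Store(batch); err != nil {
			return err
		}
	}
}

// blockProcessor is a stand-in implementation; a filter header
// counterpart would implement the same interface with different checks.
type blockProcessor struct{ stored int }

func (b *blockProcessor) Verify(int) error { return nil }
func (b *blockProcessor) Store(hs []int) error {
	b.stored += len(hs)
	return nil
}

func main() {
	batches := [][]int{{1, 2, 3}, {4, 5}}
	next := func() ([]int, error) {
		if len(batches) == 0 {
			return nil, nil
		}
		b := batches[0]
		batches = batches[1:]
		return b, nil
	}

	p := &blockProcessor{}
	err := ingest[int](next, p)
	fmt.Println(p.stored, err)
}
```

The loop body never mentions a concrete header type, which is the property that lets the same code serve both stores.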
Thanks for clarifying.

> Sure that works. I meant more so that we just don't need to call that method at all if our local height is beyond the height in the file. Otherwise as is, the caller needs to handle that error. Notice how if unchanged, each time a user starts up, they see a confusing error.

So currently we have something like this on the caller end:

```go
err = sideload.SetHeight(height)
if err != nil {
	// What I understand is that you do not want this logged error to
	// show up every time.
	log.Errorf("error while setting height for sideload --- skipping "+
		"sideloading: %v", err)
	return
}
```

See the comment in the code snippet for my understanding of the problem you would like addressed.
Maybe to address this, I create an error in the `chaindataloader` package, `ErrNoHeader`, so instead of what we have here on line 207:
https://github.com/lightninglabs/neutrino/blob/e8fca396d84427bb3882a879f08200f3dedc07b6/chaindataloader/binary.go#L206-L209
we return that error, i.e. `ErrNoHeader`, so on the caller end we modify it to:

```go
err = sideload.SetHeight(height)
// We only log an error if something comes up other than the error that
// indicates the sideload source does not have the required next headers.
if err != nil && !errors.Is(err, chaindataloader.ErrNoHeader) {
	log.Errorf("error while setting height for sideload --- skipping "+
		"sideloading: %v", err)
	return
}
```

Would this work?
> We both agree that this struct for verification and for accepting dependencies would be outside the chaindataloader package

Thanks for this. The current design of the `chaindataloader` is to just supply headers, so anything that has to do with how those supplied headers are handled is up to the chaindataloader's caller. That is why I made this comment.

> No, this is about making the code more generalized so it can support both filter headers and block headers. This logic can still live in the package. As a thought exercise: how would you go about updating the PR to support both block and filter headers? After you've thought about that, revisit my comment.

From this I take it that you are suggesting that the functionality of the chaindataloader be extended to verify headers as well?

> This is about abstracting header verification, and exactly how it's written to disk.
> > We would create two distinct wrapper structs for the filter header and the block header? As I know the processes of block header and filter header verification are quite different. Or we use one struct and have it accept all dependencies for both the filter header and block header verification, and call it headerVerifier or something along those lines?
> Yes, they are different, but given an abstract header, you can verify it. The verification differs, but we can create an interface that abstracts away those details and allows the caller to execute a generalized ingestion loop.
> > In line with not repeating logic as the concern I raised in this comment: #285 (comment) we might need to at least for block headers, refactor the handleHeaders function to use this headerVerifier struct for verification?
> We don't need to change any of the core logic in the daemon. We also don't need to have the method on the ChainService struct, as I've mentioned several times above. To aid you in the thought experiment above, examine that route: where would it need to change to also support filter headers? How would you abstract the code to have it work for both filter and block headers?

I read back over your previous comment to better understand your suggestion.

> - We can also generalize this to work for both block headers and filter headers. See my comment way above in the review where I suggest a way to use type params to re-use code between the two types of headers. We can also further make a wrapper struct to abstract away how the headers are verified and how they're written to disk. This lets us maintain a generalized loop where we just care about validating the headers and writing them to disk.

I think what I understand from this is that you are suggesting a single loop for verifying headers and writing them to disk, as opposed to my current design. For example, looking at my draft here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741, that loop only works for block headers. Is that correct?
> Maybe to address this, I create an error in the chaindataloader package, ErrNoHeader, so instead of what we have here on line 207:

What I mean is that we don't even really need that code path at all: if our height is beyond the height of the header file, we simply don't need to attempt to load it in or fully initialize it.

> From this I take it that you are suggesting that the functionality of the chaindataloader be extended to verify headers as well?

Yes.

> I think what I understand from this is that you are suggesting a single loop for verifying headers and writing them to disk, as opposed to my current design. For example, looking at my draft here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741, that loop only works for block headers. Is that correct?

Yes. Continue to further develop that (generalize it to support both block and filter headers), then update the PR with the new approach.
Thanks a lot for the review so far @Roasbeef. If we move header verification for the sideload into the `chaindataloader` package, that means we import the dependencies listed here: https://github.com/lightninglabs/neutrino/pull/285#issuecomment-1859257741 into the `chaindataloader` package, causing a circular dependency; or, if we do not import them, we duplicate them by rewriting. We could have a loop that runs over all headers for both verification and writing to the DB, with two functions abstracting those functionalities within the loop, but their implementations might have to live in the neutrino package. What do you think?
You don't need to import the dependencies. An interface can be used instead: you declare how the object is to be used, and the caller figures out what to pass in. We don't need to duplicate anything, nor introduce a circular dep.
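A tiny sketch of that pattern (package layout and all names here are illustrative only, compressed into one file): the chaindataloader side declares the interface it needs, and the neutrino side satisfies it, so nothing in chaindataloader ever imports neutrino.

```go
package main

import "fmt"

// --- chaindataloader side: declares only what it needs. ---

// checkpointFinder is the dependency the loader declares; the loader
// never imports the package that implements it.
type checkpointFinder interface {
	NextCheckpointHeight(after uint32) uint32
}

// verifyUpTo uses the injected dependency without knowing its concrete
// type.
func verifyUpTo(f checkpointFinder, height uint32) uint32 {
	return f.NextCheckpointHeight(height)
}

// --- neutrino side: supplies the implementation. ---

// chainCheckpoints is a stand-in for the chain service's checkpoint data.
type chainCheckpoints struct {
	heights []uint32
}

// NextCheckpointHeight returns the first checkpoint strictly above the
// given height, or 0 if there is none.
func (c *chainCheckpoints) NextCheckpointHeight(after uint32) uint32 {
	for _, h := range c.heights {
		if h > after {
			return h
		}
	}
	return 0
}

func main() {
	cp := &chainCheckpoints{heights: []uint32{2016, 4032, 6048}}
	// The loader-side code receives the neutrino-side value through the
	// interface, with no import in the other direction.
	fmt.Println(verifyUpTo(cp, 2500))
}
```

The dependency arrow points only one way: neutrino imports chaindataloader and hands in the implementation at init time.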
I have updated this PR: I have decoupled the sideloading from the blockmanager, moved verification into the chaindataloader, and created a generalized ingestion loop. This is not the final form of the PR yet; I have not written unit tests or run `gofmt` to fix formatting nits. I would like to get a review on this to know whether I can move forward.