Since the introduction of headers-first synchronization in Bitcoin Core 0.10, the blockchain data structure can be seen as having three components:
- The block header tree: a tree data structure, with the genesis block as root, containing all headers for known forks of the chain. This tree structure is always connected, i.e., it is not possible to add a header to the tree unless its parent is already in the tree as well.
- The block data: for each entry in the block header tree, we may or may not have the corresponding block data. Blocks can be received (and asked for) out of order, as long as the corresponding header exists in the header tree. There can be gaps, so it is not the case that a block can only be stored if the parent block is also stored. Of course, a block can only become the active chain tip if it’s fully validated, which requires having seen all blocks before it too. Since the introduction of block file pruning in Bitcoin Core 0.11, it is also possible to delete blocks again after being validated; in this case, the corresponding headers are still kept in the tree.
- The active chaintip (a reference to an entry in our block header tree) with its corresponding UTXO set (or chainstate), representing the fully-validated block that we currently consider active (this was introduced in Bitcoin 0.8).
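The three components above can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the class and field names (`HeaderTree`, `HeaderEntry`, `have_data`) are invented for illustration and do not mirror Bitcoin Core's actual data structures, and hashes are plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HeaderEntry:
    """One node in the header tree (hypothetical names, for illustration)."""
    block_hash: str
    parent_hash: Optional[str]   # None only for the genesis header
    have_data: bool = False      # component 2: is the full block stored?


@dataclass
class HeaderTree:
    entries: Dict[str, HeaderEntry] = field(default_factory=dict)
    active_tip: Optional[str] = None  # component 3: a reference into the tree

    def add_header(self, block_hash: str, parent_hash: Optional[str]) -> bool:
        """Add a header; refuse it if its parent is unknown, so the tree stays connected."""
        if block_hash in self.entries:
            return True
        if parent_hash is None:
            if self.entries:
                return False     # only the very first header may lack a parent
        elif parent_hash not in self.entries:
            return False
        self.entries[block_hash] = HeaderEntry(block_hash, parent_hash)
        return True

    def store_block(self, block_hash: str) -> bool:
        """Block data may arrive in any order, but only for headers already in the tree."""
        entry = self.entries.get(block_hash)
        if entry is None:
            return False
        entry.have_data = True
        return True


tree = HeaderTree()
tree.add_header("genesis", None)
tree.add_header("a", "genesis")
tree.add_header("b", "a")
orphan_ok = tree.add_header("d", "c")   # parent "c" is unknown, so this is refused
gap_ok = tree.store_block("b")          # allowed even though "a" has no data yet
```

Note how the header tree enforces connectivity while the block data is allowed to have gaps.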
Now to explain how synchronization happens, there are three separate processes that govern these data structures respectively:
- Header synchronization: the process of requesting, receiving, and storing headers that a peer has but we do not.
- Block synchronization: the process of requesting, receiving, and storing the full block data for headers we already have, which peers have but we do not.
- Block activation: the full validation of blocks we have, and resulting changing of the active chain tip.
They all happen in parallel in practice, but it’s easier to think of them as a sequence.
1. Headers synchronization
Requesting headers. The first step of synchronization is learning about headers which our peers have. This is primarily done by sending getheaders to each peer when we connect or accept a connection (by sending a locator which indicates which headers we already have), to which they’ll respond with a headers message containing up to 2000 new headers. If the full 2000 headers are received, we’ll send another getheaders to ask for more. This process continues until we have all headers the peer has to offer. When we’re far behind, we only ask one peer (the headers sync peer) to minimize duplicating bandwidth usage, but once we get close to the current timestamp, all peers are asked.
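The request loop just described could be sketched as follows. This is a simplified illustration, not Bitcoin Core's code: `request_headers` is a hypothetical stand-in for one getheaders/headers round trip, and the locator is reduced to a single hash, whereas a real locator lists many hashes.

```python
from typing import Callable, List

MAX_HEADERS_PER_MESSAGE = 2000  # cap on a single headers reply, per the text


def sync_headers_from_peer(
    known: List[str],
    request_headers: Callable[[List[str]], List[str]],
) -> int:
    """Keep asking one peer for headers until it sends a non-full reply.

    Returns the number of round trips that were needed.
    """
    rounds = 0
    while True:
        locator = known[-1:]  # simplification: only our current last header
        batch = request_headers(locator)
        rounds += 1
        known.extend(batch)
        if len(batch) < MAX_HEADERS_PER_MESSAGE:
            return rounds  # a non-full reply means the peer has nothing more


# Hypothetical peer that holds 4500 headers beyond the shared header "h0".
peer_chain = [f"h{i}" for i in range(4501)]


def fake_peer(locator: List[str]) -> List[str]:
    start = peer_chain.index(locator[0]) + 1
    return peer_chain[start:start + MAX_HEADERS_PER_MESSAGE]


ours = ["h0"]
rounds_needed = sync_headers_from_peer(ours, fake_peer)
```

With 4500 missing headers, the peer sends two full replies and one partial reply, after which the loop stops.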
Non-connecting headers. If headers arrive that don’t connect to our existing headers tree (i.e., the parent of a received header isn’t in our tree yet), we also send a getheaders in response to first learn about all the headers in between. Under no circumstances are headers processed when we don’t know their parents yet. This doesn’t mean they are rejected (as in: they are not marked permanently invalid); they are just ignored (as if they were never received at all). If all goes well, we’ll receive them again after their missing parent headers have arrived, at which point they’ll be processed normally.
Direct headers announcement. Since the introduction of the direct headers announcement mechanism (negotiated using the BIP130 sendheaders message) added in Bitcoin Core 0.12.0, peers can send us headers messages directly to announce a new active chain tip (see below). In case this mechanism is not used (either because the peer believes we are far behind, or because the peer doesn’t support BIP130), they can also send us an inv message with just a block hash. This too will trigger us to send a getheaders in response to learn about any headers we miss, up to and including the hash that was just inv’ed to us.
Header validation. When headers arrive, they are validated to the extent possible. This includes syntactic correctness, proof-of-work, difficulty adjustments, rules about timestamps (larger than the median of the past 11 blocks, not more than 2 hours in the future), and rules about version numbers (BIP34, BIP66, and BIP65 put requirements on the version number). If a header fails any of these, it is ignored, as well as any headers that descend from it.
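As an illustration of the timestamp rules mentioned above, here is a minimal sketch. The function and constant names are made up for this example, timestamps are plain Unix seconds, and `now` stands for whatever clock the node uses; the other header checks (proof-of-work, difficulty, version) are not covered.

```python
from typing import List

MEDIAN_TIME_SPAN = 11              # number of ancestors used for the median
MAX_FUTURE_DRIFT = 2 * 60 * 60     # two hours, in seconds


def median_time_past(ancestor_times: List[int]) -> int:
    """Median timestamp of up to the last 11 ancestors (given oldest first)."""
    recent = sorted(ancestor_times[-MEDIAN_TIME_SPAN:])
    return recent[len(recent) // 2]


def timestamp_acceptable(header_time: int, ancestor_times: List[int], now: int) -> bool:
    """Apply the two timestamp rules described above."""
    if header_time <= median_time_past(ancestor_times):
        return False  # must be strictly larger than the median of the past 11
    if header_time > now + MAX_FUTURE_DRIFT:
        return False  # more than two hours in the future
    return True


times = [1000 + 600 * i for i in range(11)]   # 1000, 1600, ..., 7000
mtp = median_time_past(times)                 # the middle of the 11 values
```

Because the lower bound is a median rather than the parent's timestamp, a header may carry a timestamp earlier than its parent's and still be acceptable.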
Header spam protection. While block headers are tiny (81 bytes each in a headers message), we do need to protect against storing tons of low-difficulty ones that an attacker might create. Proof-of-work inherently makes it very expensive to produce good headers en masse, but an attacker can construct a headers chain that forks off somewhere early in history (e.g. just after the genesis block) when the difficulty was low, without ever amounting to a significant amount of work. This could ultimately lead to memory issues for nodes if they accepted these headers into their block header tree. Historically, checkpoints were (among other things) used to guard against this, but since Bitcoin Core 24.0 another, more comprehensive, mechanism is used: header pre-synchronization. In short, it involves downloading the headers twice: a first time to verify the headers form a chain with a significant amount of work (but not adding them to the block header tree yet), and a second time where they’re redownloaded and, if they match what was sent in the first phase, added to the block header tree. You can read more about this mechanism in this answer.
2. Block synchronization
Acceptable blocks. Once we know about the headers, we can download blocks. In general, blocks are only requested (and accepted) if they are part of a headers chain whose tip has at least as much cumulative work as the current active chain tip. Blocks that are not on our headers tree are never accepted (these would be orphan blocks), nor are blocks on forks whose header chain doesn’t have a competitive amount of cumulative work.
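To make "cumulative work" concrete, the sketch below derives the work of a header from its compact difficulty field (nBits) and compares two header chains. The helper names are hypothetical, and the compact decoding omits the sign-bit and overflow checks a real implementation needs.

```python
from typing import List


def target_from_bits(bits: int) -> int:
    """Expand the compact nBits field (sign bit and overflow checks omitted)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_work(bits: int) -> int:
    """Expected number of hash attempts needed to meet the target."""
    return (1 << 256) // (target_from_bits(bits) + 1)


def chain_work(bits_along_chain: List[int]) -> int:
    return sum(header_work(b) for b in bits_along_chain)


def worth_downloading(candidate_bits: List[int], active_bits: List[int]) -> bool:
    """Only fetch blocks on header chains with at least the active chain's work."""
    return chain_work(candidate_bits) >= chain_work(active_bits)


EASY = 0x1D00FFFF     # the difficulty-1 target used by the genesis block
HARDER = 0x1C00FFFF   # a target 256 times smaller

long_easy_fork = [EASY] * 100
short_hard_chain = [HARDER]
```

In this example a single header at the harder target outweighs a fork of 100 difficulty-1 headers, which is why the comparison is by work and not by length.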
Request mechanism. Blocks are requested using getdata messages, to which the peer responds with block messages (getdatas are also used for transaction fetching, but I am ignoring that here). To decide which peers these requests are sent to, we keep track of the inv and headers announcements peers have sent us, so we have an idea of what the last block header we have in common with them is. We generally assume that peers who announce a block to us can also actually provide the corresponding block data (at least when they set the NODE_NETWORK service flag, or NODE_NETWORK_LIMITED for sufficiently recent blocks).
Scheduling of requests. The actual getdata requests are sent for missing blocks anywhere along the path from the genesis block to the headers tree tip with the highest accumulated work (which can be distinct from the active chain tip) among all block header tree entries which are not marked as permanently invalid. The requests are spread over all peers for which we know they have the respective block, in a round-robin fashion, with a limit of 16 requested-but-not-yet-received blocks per peer. We do not request blocks which are more than 1024 blocks ahead of the current active chain tip, to limit how far out of order blocks can be received (this matters for block file pruning). As long as a block request is outstanding, the same block is not requested again from another peer, to avoid duplicating bandwidth usage. If a peer stalls too long in responding to a block request, it is disconnected, and the request will be sent elsewhere.
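A rough sketch of such a scheduler is shown below. It is a toy under stated assumptions: all names are invented, the best header chain is a plain list indexed by height, everything up to the active tip is assumed to be present already, and stalling and disconnection logic is left out entirely.

```python
from typing import Dict, List, Set

MAX_IN_FLIGHT_PER_PEER = 16   # per-peer limit from the text
DOWNLOAD_WINDOW = 1024        # maximum distance ahead of the active tip


def schedule_requests(
    best_chain: List[str],            # hashes from genesis to best header, by height
    active_height: int,
    have_data: Set[str],
    in_flight: Dict[str, str],        # block hash -> peer already asked for it
    peer_has: Dict[str, Set[str]],    # peer -> blocks we believe it can serve
) -> Dict[str, List[str]]:
    """Return new getdata requests per peer, lowest heights first."""
    requests: Dict[str, List[str]] = {peer: [] for peer in peer_has}
    load = {peer: 0 for peer in peer_has}
    for peer in in_flight.values():
        if peer in load:
            load[peer] += 1
    peers = list(peer_has)
    turn = 0
    last_height = min(len(best_chain) - 1, active_height + DOWNLOAD_WINDOW)
    for height in range(active_height + 1, last_height + 1):
        block = best_chain[height]
        if block in have_data or block in in_flight:
            continue  # already stored, or already requested from some peer
        for offset in range(len(peers)):
            peer = peers[(turn + offset) % len(peers)]
            if block in peer_has[peer] and load[peer] < MAX_IN_FLIGHT_PER_PEER:
                requests[peer].append(block)
                load[peer] += 1
                turn = (turn + offset + 1) % len(peers)  # next peer goes first
                break
    return requests


chain = [f"b{i}" for i in range(2000)]
plan = schedule_requests(
    best_chain=chain,
    active_height=0,
    have_data={"b0"},
    in_flight={},
    peer_has={"p1": set(chain), "p2": set(chain)},
)
```

With two peers that both have every block, the requests alternate between them until each has 16 blocks outstanding.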
Unrequested blocks. It is possible that a block is received without it being requested. Bitcoin Core will not send unrequested blocks, but some other P2P clients might. If this happens, we will process the header inside the block as if it was received through a headers message. Then, roughly speaking, if the block is one we would have wanted to request after seeing its header, we also process the block directly. Otherwise, it is ignored (i.e., we act as if only the header was received).
Compact block announcements. In Bitcoin Core 0.13.0, support for BIP152 compact blocks was added. In compact blocks, blocks are sent without their full transaction data, but with short (48-bit) salted hashes of the individual transactions instead. When receiving a compact block, the receiver tries to reconstruct the full block using transactions they have in memory, and then requests any missing ones. The details are out of scope here, but BIP152 includes a feature called high-bandwidth mode. When enabled, we permit a limited number of peers to announce blocks to us directly using a cmpctblock message, skipping a roundtrip using headers/inv and getdata, which saves latency. In this case, we again process the header embedded in the compact block first, and then process the compact block if the block looks like it is very close to our active chain tip (max 2 blocks ahead).
Block validation. When a block is processed, all consensus rules that can be applied at this stage are checked, which includes syntactic validity, recomputing the transaction Merkle root, transaction finality, BIP34 (“height in coinbase”), segwit commitments, and the maximum block weight. If these checks fail, the block is ignored, or in some cases marked as permanently invalid, depending on the type of error. If the checks succeed, the block data is stored on disk. Scripts, double-spending, and inflation cannot be verified yet, as those checks need the UTXO set.
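One of the checks listed above, recomputing the transaction Merkle root, can be sketched as follows. This assumes the txids are given as 32-byte values in internal byte order and leaves out the duplicate-transaction (mutation) detection that a full implementation also needs.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Compute the transaction Merkle root from a list of txids."""
    if not txids:
        raise ValueError("a block has at least one transaction (the coinbase)")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Three placeholder txids (not real transactions).
a, b, c = (double_sha256(bytes([n])) for n in range(3))
root = merkle_root([a, b, c])
```

The computed root is then compared with the Merkle root field in the block's header; a mismatch means the transactions do not belong to that header.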
getblocks-based synchronization. Before the introduction of headers-first synchronization a different mechanism was used. Instead of getheaders to initiate synchronization, the similar getblocks message was sent, to which the peer would respond with an inv containing up to 500 block hashes, which would then trigger getdatas and further getblocks to continue. Bitcoin Core no longer uses this mechanism since version 0.10, but it is still supported for peers who want to synchronize this way (i.e., Bitcoin Core will respond to getblocks messages, but will not send them).
3. Block activation
Lastly, when we have the blocks, we can decide what our active chain tip should be, and fully validate the blocks involved.
Active chain tip selection. The rule is that at every point in time, we aim to make the active chain tip be:
- Among all blocks in the block header tree for which we have the full data and full data for all its ancestors, and which are not (yet) marked as permanently invalid…
- considering only the ones with the highest cumulative work…
- and among those, pick the one for which the full data was received first.
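The selection rule above can be written as a small function. This is a sketch with invented names; in particular `data_sequence` is a hypothetical counter recording the order in which full block data arrived, and each candidate is assumed to already know whether it and all its ancestors have data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    chain_work: int            # cumulative work up to and including this block
    data_sequence: int         # arrival order of the full data (lower = earlier)
    chain_data_complete: bool  # data present for this block and all ancestors
    invalid: bool = False      # marked permanently invalid


def select_tip(candidates: List[Candidate]) -> Optional[Candidate]:
    eligible = [c for c in candidates if c.chain_data_complete and not c.invalid]
    if not eligible:
        return None
    # Most cumulative work wins; ties go to the block whose data arrived first.
    return min(eligible, key=lambda c: (-c.chain_work, c.data_sequence))


candidates = [
    Candidate("A", chain_work=10, data_sequence=5, chain_data_complete=True),
    Candidate("B", chain_work=10, data_sequence=3, chain_data_complete=True),
    Candidate("C", chain_work=12, data_sequence=1, chain_data_complete=False),
    Candidate("D", chain_work=11, data_sequence=2, chain_data_complete=True, invalid=True),
]
best = select_tip(candidates)
```

Here C has the most work but is missing data and D is invalid, so the choice falls between A and B, and B wins because its data arrived first.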
Moving towards the selection. If, at any point, the result of this selection is not the same as the current active chain tip, we start making progress towards the selection. First the blocks that are part of the active chain but not in the selected chain are disconnected in reverse order, and then the blocks in the selection not in the active chain are connected in forward order. If disconnections are involved in the activation process, it is called a reorganization.
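The disconnect and connect steps can be derived from the fork point between the two chains. The sketch below assumes a hypothetical `parents` map from each block to its parent (None for genesis) and that both tips descend from the same genesis block.

```python
from typing import Dict, List, Optional, Tuple


def path_to_genesis(parents: Dict[str, Optional[str]], tip: str) -> List[str]:
    """Return the chain ending in `tip`, tip first and genesis last."""
    path = []
    node: Optional[str] = tip
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def plan_activation(
    parents: Dict[str, Optional[str]], active_tip: str, selected_tip: str
) -> Tuple[List[str], List[str]]:
    """Return (blocks to disconnect, blocks to connect), each in execution order."""
    selected_path = path_to_genesis(parents, selected_tip)
    on_selected = set(selected_path)
    disconnect = []
    node = active_tip
    while node not in on_selected:
        disconnect.append(node)      # reverse order: current tip first
        node = parents[node]
    fork_point = node
    connect = selected_path[: selected_path.index(fork_point)]
    connect.reverse()                # forward order: lowest height first
    return disconnect, connect


parents = {
    "g": None, "a": "g",
    "b1": "a", "b2": "b1",               # currently active branch
    "c1": "a", "c2": "c1", "c3": "c2",   # branch chosen by the selection rule
}
to_disconnect, to_connect = plan_activation(parents, "b2", "c3")
```

When the selected tip simply extends the active chain, the disconnect list is empty and no reorganization takes place.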
Full block validation. Connecting involves performing full validation of all consensus rules, including script validity, subsidy checks, and checking for double-spends, which need access to the UTXO set. We only maintain the UTXO set at the active chain tip, so connecting involves removing from the UTXO set any inputs in the block, and adding any outputs. Disconnecting does the reverse, removing any outputs the disconnected block added, and re-adding any inputs it spent. If validation fails here, the block is marked as permanently invalid, which will likely cause the selection logic above to change its idea about what the active chain tip should be.
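The UTXO set updates for connecting and disconnecting might look like this in a toy model. It is only a sketch: the `Tx` type and the undo format are invented, outputs are bare amounts, and scripts, amount checks, and coinbase rules are ignored; the only validation shown is detecting missing or double-spent inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Outpoint = Tuple[str, int]            # (txid, output index)
Undo = List[List[Tuple[Outpoint, int]]]


@dataclass
class Tx:
    txid: str
    inputs: List[Outpoint]   # empty for the coinbase in this toy model
    outputs: List[int]       # output amounts


def connect_block(utxos: Dict[Outpoint, int], txs: List[Tx]) -> Undo:
    """Apply a block to the UTXO set and return per-transaction undo data.

    Raises ValueError on a missing or double-spent input, leaving `utxos` untouched.
    """
    view = dict(utxos)       # work on a copy so a failure changes nothing
    undo: Undo = []
    for tx in txs:
        spent = []
        for outpoint in tx.inputs:
            if outpoint not in view:
                raise ValueError(f"missing or already spent input {outpoint}")
            spent.append((outpoint, view.pop(outpoint)))
        for index, amount in enumerate(tx.outputs):
            view[(tx.txid, index)] = amount
        undo.append(spent)
    utxos.clear()
    utxos.update(view)
    return undo


def disconnect_block(utxos: Dict[Outpoint, int], txs: List[Tx], undo: Undo) -> None:
    """Reverse connect_block: walk the transactions backwards."""
    for tx, spent in zip(reversed(txs), reversed(undo)):
        for index in range(len(tx.outputs)):
            del utxos[(tx.txid, index)]      # remove outputs the block added
        for outpoint, amount in spent:
            utxos[outpoint] = amount         # re-add the coins it spent


utxos = {("t0", 0): 50}
block = [
    Tx("cb", [], [50]),
    Tx("t1", [("t0", 0)], [20, 30]),
    Tx("t2", [("t1", 0)], [20]),     # spends an output created in the same block
]
undo = connect_block(utxos, block)
after_connect = dict(utxos)
disconnect_block(utxos, block, undo)
```

Keeping undo data per transaction and replaying it in reverse is what makes disconnection correct even when a transaction spends an output created earlier in the same block.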
Announcing new chain tips. Whenever the activation process stabilizes, peers are sent an announcement of our new active chain tip. If only a few blocks were connected, and the peer supports BIP130, a headers message is sent with the headers of all connected blocks. Otherwise we revert to sending an inv of just the new tip. In case the peer selected us as a high-bandwidth compact blocks peer, the announcement is sent before full validation completes in circumstances permitted by BIP152 (again as a means to reduce latency).