Code Review: The Consensus Critical Parts of Segwit
With segwit merged I thought it’d be good to do a final code review of the consensus-critical parts of the segwit codebase, and in the process of doing so, also explain in detail about how it all works. By “consensus-critical” we mean all the code that determines whether or not a block is valid, which is defined by BIP141. Additionally in practice P2P networking code is semi-consensus-critical, because a sufficiently broken network can prevent consensus by preventing consensus-relevant data from being propagated; networking is covered by BIP144.
I’m assuming you’re already familiar with segwit at a high level; if not I’d start by reading the FAQ.
Transaction Serialization And Hashing
Witness Transaction Identifier
Conceptually speaking the main thing segwit does is it adds an additional type of transaction id, the witness txid (wtxid). We’ll discuss how it’s used later; it’s defined as follows:
[nVersion][marker][flag][txins][txouts][witness][nLockTime]
The marker is the byte 0x00, and the flag is (currently1) 0x01. Recall that a txid is hashed as follows:
[nVersion][txins][txouts][nLockTime]
…with txins serialized as a compact-size encoded integer of the number of transaction inputs, followed by zero or more transaction inputs. This means that if a sequence of bytes is valid for the purpose of calculating a wtxid, then attempting to reuse those same bytes to calculate a txid would be guaranteed to commit to a transaction with no inputs.
tl;dr: It’s not possible to confuse a wtxid for a txid, except maybe in the degenerate case of a tx with zero inputs, which is never valid anyway (coinbase transactions have exactly one null input).
In general different types of things having the same identifier is one of those
things that often leads to problems, so it’s good to see this clearly
prevented2.
The witness is a bit interesting. Each input is associated with witness data, which is in the form of an array of byte arrays, the “stack”. The byte arrays are variable size, so what’s actually hashed for witness is:
{ <# of witness stack items>{<# of bytes><bytes> , ...}, ... }
However the actual size of the witness is not hashed because it’s always equal to the number of transaction inputs; non-witness inputs are required to set the number of witness stack items to zero.
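To make the two layouts concrete, here’s a minimal sketch - not Core’s serializer; the helpers and field contents are placeholders - that just concatenates already-serialized pieces in the order each hash covers. The result is then double-SHA256d to produce the txid or wtxid respectively:

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

static void Append(Bytes& out, const Bytes& in) { out.insert(out.end(), in.begin(), in.end()); }

// Preimage hashed (double-SHA256) to produce the legacy txid:
// [nVersion][txins][txouts][nLockTime]
Bytes TxidPreimage(const Bytes& nVersion, const Bytes& txins,
                   const Bytes& txouts, const Bytes& nLockTime)
{
    Bytes out;
    Append(out, nVersion);
    Append(out, txins);    // compact-size count, then each input
    Append(out, txouts);   // compact-size count, then each output
    Append(out, nLockTime);
    return out;
}

// Preimage hashed to produce the wtxid:
// [nVersion][marker][flag][txins][txouts][witness][nLockTime]
Bytes WtxidPreimage(const Bytes& nVersion, const Bytes& txins, const Bytes& txouts,
                    const Bytes& witness, const Bytes& nLockTime)
{
    Bytes out;
    Append(out, nVersion);
    out.push_back(0x00);   // marker: would read as an empty txin count in a legacy tx
    out.push_back(0x01);   // flag (currently)
    Append(out, txins);
    Append(out, txouts);
    Append(out, witness);  // per-input stacks; no overall count, it equals the txin count
    Append(out, nLockTime);
    return out;
}
```

Note how the marker byte sits exactly where a legacy parser expects the txin count, which is what the “guaranteed to commit to a transaction with no inputs” argument above relies on.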
Implementation
Bitcoin Core has always reused serialization code for hashing3, and segwit continues that practice. We have three new classes for witness data: CScriptWitness, CTxInWitness, and CTxWitness.
Transaction serialization code is then refactored into SerializeTransaction():
Note how the nVersion argument is not the same thing as the transaction version. Due to how the serialization code works elsewhere, serialization and deserialization are combined, starting with the latter:
The serialization version flag SERIALIZE_TRANSACTION_NO_WITNESS means “serialize/deserialize a transaction without witness data”. So we’ll be in the above branch only during witness tx deserialization.
The above comment is actually wrong: we can also reach that branch if we
attempt to deserialize a transaction with an empty vin
in witness mode.
What’s called flags above was called flag in the (current version of) BIP141 that we discussed in the last section. Since an unknown flag errors out, we have a way to extend the serialization format in the future with not-currently-committed data that’s guaranteed to be rejected by older nodes - that’s actually a good thing, as non-committed data can’t be verified by old nodes, so allowing arbitrary extra data could be used as a DoS attack (as happened before in CVE-2013-4627).
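As a rough illustration of the branching described above - this is a simplified stand-in, not the actual SerializeTransaction() template, and unlike the real deserializer it rejects a flag byte of 0x00 (the empty-vin edge case mentioned below) outright:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustration only: given the bytes immediately following nVersion, decide whether
// the transaction uses the extended (witness) serialization. In the real deserializer
// this is expressed as "the txin count deserialized to zero entries".
enum class TxFormat { Legacy, ExtendedWithWitness };

TxFormat ClassifyFormat(const std::vector<uint8_t>& bytes_after_version, bool allow_witness)
{
    if (allow_witness && bytes_after_version.size() >= 2 && bytes_after_version[0] == 0x00) {
        const uint8_t flag = bytes_after_version[1];
        if (flag != 0x01)
            // Unknown flags are rejected, so future extensions of the format are
            // guaranteed to be unacceptable to today's segwit-aware nodes.
            throw std::runtime_error("Unknown transaction optional data");
        return TxFormat::ExtendedWithWitness;
    }
    return TxFormat::Legacy;  // a compact-size txin count follows nVersion directly
}
```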
Next we have serialization/hashing:
Note how the fact that an actual dummy transaction input is written - rather
than the bytes 0x00
- could become relevant if something else changed how
transaction inputs are serialized.
Resizing the witnesses to the size of the inputs seems a bit dubious to me;
probably better if witness-using transactions were required to always have one
witness per txin to be considered valid. It also means that
SerializeTransaction()
- and in turn the serialization functions that call it -
can’t take const
transactions.
Commitment Structures
The pull-req’s second commit deals with the data structures needed to commit to witness transactions in blocks, as well as the version bits deployment.
Calculating the Witness Root Hash
In the same way that the merkle root hash commits to transactions, the witness root hash commits to the witnesses4 of every transaction (other than the coinbase).
The witness root hash doesn’t just commit to witnesses however: it commits to wtxids, which are entire transactions with witness data. This means that transactions actually get inefficiently hashed twice: once for the merkle tree of transactions, and again for the merkle tree of witness transactions.
While the inefficiency of hashing transactions twice is a minor thing - and can (very rarely) even be an optimization for lite clients - there’s a bigger issue: we now have to verify that both trees hash the same data. In Bitcoin Core this is done implicitly, as we expect to have the full set of transactions and calculate both commitments from them, but it’s easy to see how in other circumstances this opens up a whole new set of ways that miners can produce invalid blocks that implementations fail to detect properly, or handle in inconsistent ways5.
Secondly, because the coinbase can’t commit to itself, we’re forced to make the
coinbase an exception: its wtxid
is assumed to be 0x00...00
for the purpose
of calculating the witness root hash.
However if we look at the BlockWitnessMerkleRoot function that calculates the witness merkle root we find another subtle issue:
Pretty straightforward: calculate the wtxid with GetWitnessHash() for every transaction, and then pass those leaves to the generic ComputeMerkleRoot(). But what’s that mutated argument?
Unfortunately, there’s a pretty big flaw in Bitcoin’s merkle tree algorithm: multiple lists of transactions can result in the same merkle root. You can read the gory details here but what that means for us is since we’ve reused the same (flawed) algorithm for the witness root, we have to work around the flaw in the same way.
Right away that prevents us from using the existing merkle tree algorithm in any kind of pure-witness commitment, as the fix requires txids to be unique, and witness data isn’t; hashing the entire transaction guarantees uniqueness (via BIP34).
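For illustration, the leaf construction boils down to something like this sketch (stand-in types; the real function is BlockWitnessMerkleRoot() feeding ComputeMerkleRoot(), complete with the mutated workaround):

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Hash256 = std::array<uint8_t, 32>;

struct Tx { Hash256 witness_hash; };  // stand-in for GetWitnessHash(), i.e. the wtxid

// Build the leaves that get fed into the (shared, flawed) merkle root algorithm.
std::vector<Hash256> WitnessMerkleLeaves(const std::vector<Tx>& vtx)
{
    std::vector<Hash256> leaves(vtx.size());
    leaves[0] = Hash256{};               // coinbase slot: wtxid assumed to be all zeros
    for (size_t i = 1; i < vtx.size(); i++)
        leaves[i] = vtx[i].witness_hash; // wtxid of each non-coinbase transaction
    return leaves;
}
```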
Committing to the Witness Root Hash
We’ve got a number of constraints in where we put the witness root hash commitment, ranging from the design of the Stratum mining protocol, to p2pool compatibility, to the fact that miners have been selling coinbase space for advertising. And of course, in a soft-fork putting the commitment in the coinbase is basically our only option: we’d like it closer to the block header, but that may not even be a lite client soft-fork.
Segwit takes a fairly inclusive approach, with the commitment being a specially
formatted OP_RETURN
output, which
GetWitnessCommitmentIndex()
finds the index of:
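Roughly speaking, the scan looks like the following sketch, based on the commitment pattern BIP141 defines (OP_RETURN, a 36-byte push, the 0xaa21a9ed tag, then the 32-byte commitment); the exact Core function differs in its details:

```cpp
#include <cstdint>
#include <vector>

struct TxOut { std::vector<uint8_t> scriptPubKey; };

// Sketch: find the coinbase output carrying the witness commitment, per the BIP141
// pattern: OP_RETURN (0x6a), push-36 (0x24), the tag 0xaa21a9ed, then the 32-byte
// commitment. If several outputs match, the highest index wins. Returns -1 if none.
int WitnessCommitmentIndex(const std::vector<TxOut>& coinbase_vout)
{
    int commitpos = -1;
    for (size_t i = 0; i < coinbase_vout.size(); i++) {
        const auto& spk = coinbase_vout[i].scriptPubKey;
        if (spk.size() >= 38 && spk[0] == 0x6a && spk[1] == 0x24 &&
            spk[2] == 0xaa && spk[3] == 0x21 && spk[4] == 0xa9 && spk[5] == 0xed) {
            commitpos = static_cast<int>(i);
        }
    }
    return commitpos;
}
```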
That’s then used within ContextualCheckBlock()
in the following:
Ultimately the above is simply checking, via the version bits machinery, whether the segwit soft-fork is enabled.
The above is referring to how CheckBlock() ensures that the merkle tree hasn’t been malleated with the duplicate txids vulnerability discussed earlier; CheckBlock() guarantees that the transaction list itself hasn’t been malleated, thus we can calculate the block witness merkle root from it without being concerned about malleation (though when this code was merged CheckBlock() had an incorrect comment stating otherwise). All the same, it’d probably be good to add an assertion here that malleated is false.
The above hashes the witness merkle root with the so-called “witness nonce” (now called “witness reserve value” in BIP141), which is taken from the coinbase’s witness - the only value the coinbase witness is allowed to have. The coinbase witness isn’t hashed by BlockWitnessMerkleRoot(), so if not for the above its value would be uncommitted (not good!).
Finally, having calculated what the witness commitment should be, we check that what’s in the block is exactly that:
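In outline the check amounts to something like this sketch; DoubleSHA256() here is an assumed stand-in for Core’s CHash256, and the byte offsets follow the commitment pattern above:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using Hash256 = std::array<uint8_t, 32>;

// Assumed external: double-SHA256 over an arbitrary byte string.
Hash256 DoubleSHA256(const std::vector<uint8_t>& data);

// Sketch of the commitment check: hash the witness merkle root together with the
// 32-byte witness reserved value from the coinbase witness, and compare against the
// 32 bytes embedded in the commitment output found by GetWitnessCommitmentIndex().
bool CheckWitnessCommitment(const Hash256& witness_merkle_root,
                            const Hash256& witness_reserved_value,
                            const std::vector<uint8_t>& commitment_script_pub_key)
{
    std::vector<uint8_t> preimage(witness_merkle_root.begin(), witness_merkle_root.end());
    preimage.insert(preimage.end(), witness_reserved_value.begin(), witness_reserved_value.end());
    const Hash256 expected = DoubleSHA256(preimage);

    // The commitment bytes sit right after the 6-byte OP_RETURN/tag header.
    if (commitment_script_pub_key.size() < 38) return false;
    return std::equal(expected.begin(), expected.end(),
                      commitment_script_pub_key.begin() + 6);
}
```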
Interestingly, even after segwit activates, miners are allowed to leave off the witness commitment if their block has no witness data. Since there’s no witness commitment controlling what the witness data is, we need to ensure that there’s no witness data at all to prevent attackers from appending junk witness data to such blocks:
So we DoS-ban the node that sent us that malleated block. But what else happens? We’re called by AcceptBlock()6 in the following:
Note the CorruptionPossible()
call - that’s used to indicate that while
something is invalid, it may be invalid because one of our peers corrupted the
data, not because the thing itself is intrinsically invalid. Without this
distinction a peer can trick us into marking blocks permanently invalid,
breaking consensus7.
Getting this right is subtle:
CValidationState::DoS()
has an optional (and currently undocumented) argument specifying whether there
is or is not potential corruption. We’ve used that argument correctly above,
setting corruption possible for all failures of uncommitted witness data, but
I’ll bet you didn’t notice the difference between the two types of DoS()
calls! It’d probably be a good idea to add a second, differently named,
constructor to make the distinction more clear…
Using The Witness Nonce^H^H^H^H^H Reserve Value For Protocol Upgrades
BIP141 describes the witness reserve value as being part of an “extensible commitment structure”, as its meaning can be changed later with a soft-fork to add “new commitment values” to blocks. Unfortunately, as I pointed out six months ago segwit is more dangerous and invasive than normal soft-forks from the point of view of the P2P layer, in that only segwit-aware peers can propagate witness data necessary for consensus. Since the reserve value is only a single, 32-byte value, we’re setting ourselves up for the same problem again8.
Verification
Validating Witness Program Scripts
Similar to how P2SH allows a scriptPubKey to be replaced by a hash commitment to the actual redeemScript, segwit allows witness transaction outputs to commit to the actual spend conditions with so-called “witness programs”. These witness programs take the form of scripts that match the following template:
<OP_0 to OP_16> <2 to 40 byte push>
The idea is that the first push acts as a version number, and the second push is the hash commitment. Normally a hash commitment would be <= 32 bytes, but at the last minute it was pointed out that having just 16 more possible script versions might not be enough, so the commitment was extended to 40 bytes.
It’s worth noting that we’re not using the discourage upgradable NOPs mechanism meant to protect non-upgraded miners from accidentally mining transactions made invalid by a soft-fork; instead we’re relying on the cleanstack check. This is relevant in the witness-in-P2SH mode.
To validate that a script is a “witness program”, and to extract the version and commitment, the CScript class gains a new method, IsWitnessProgram(). Let’s look at it:
Enforce minimum and maximum sizes. Confusingly, the enforcement is done at the level of the entire script; we’ll see the subtlety this leads to in a moment.
Enforce use of one of the standard push-number-to-stack opcodes. This means that in the event that we ever do need more than sixteen witness program versions, we’ll have to use a different version number scheme.
The last condition is kinda weird: why are we casting part of the script to a size_t? What’s actually happening here is that, in a round-about way, we’re checking that the script contains exactly two pushes, as the low-level encoding of a ≤75 byte push is a single byte encoding the length as an unsigned 8-bit integer, followed by the bytes being pushed.
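Putting those three conditions together, a self-contained sketch of the check - operating on raw script bytes rather than CScript, with opcode values taken from the script encoding - looks like:

```cpp
#include <cstdint>
#include <vector>

// Opcode byte values from the Bitcoin script encoding.
static const uint8_t OP_0 = 0x00, OP_1 = 0x51, OP_16 = 0x60;

// Sketch of the witness-program check described above: total script size 4..42 bytes,
// first opcode OP_0..OP_16, and the second byte (a direct push length) must account
// for the entire remainder of the script, i.e. exactly two pushes and nothing else.
bool IsWitnessProgram(const std::vector<uint8_t>& script,
                      int& version, std::vector<uint8_t>& program)
{
    if (script.size() < 4 || script.size() > 42) return false;
    if (script[0] != OP_0 && (script[0] < OP_1 || script[0] > OP_16)) return false;
    if (static_cast<size_t>(script[1]) + 2 != script.size()) return false;
    version = (script[0] == OP_0) ? 0 : script[0] - OP_1 + 1;
    program.assign(script.begin() + 2, script.end());
    return true;
}
```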
Validating Witness Programs
So what consensus-critical code uses IsWitnessProgram()
? Interestingly, only
CountWitnessSigOps()
and VerifyScript()
- the latter being the main
entry point to script verification, and significantly modified
by segwit.
Let’s look at what those modifications do. Unlike other examples we’ll use something akin to diff format to clearly indicate what’s new, while still looking at the whole function. This is important, as we have to ensure that the original logic remains unchanged for segwit to be a soft-fork:
VerifyScript()
gains a new argument for the witness, which we’re setting to
“empty” if not provided (presumably because the transaction didn’t have one).
However, the fact that we do this has a rather odd result: a transaction
spending a witness output with an unknown version is valid even if the
transaction doesn’t have any witnesses!
Everything above is unchanged, as we can’t modify how normal scripts are handled in a soft-fork; to the above witness program commitment scripts are just two pushes in a row.
Next we deal with the case where a scriptPubKey
is a bare witness program
commitment, a
Pay-To-Witness-Script-Hash
(P2WSH):
Mostly fairly simple; we’ll cover what VerifyWitnessProgram()
does later.
However there’s a really big red-flag: at the end we resize the (legacy) stack
to bypass the cleanstack check (which shouldn’t be set in consensus).
We didn’t give VerifyWitnessProgram() the stack, so if not for that we’d be certain this is a soft-fork, as the behavior of the following code would have been unchanged. A possibly better way of doing this that was guaranteed to be a soft-fork would have been to use a boolean flag instead, leaving the stack unchanged.
More generally, the logic in VerifyScript()
should have been moved to
LegacyVerifyScript()
, with the former calling the legacy verification, and
witness verification, in parallel to ensure it’s a soft-fork. But that’s
complicated by the fact that the way P2SH was implemented made the exact same
mistake…
Again, the legacy P2SH logic above is unchanged in segwit, modulo the following additional rules for the backwards compatibility P2WSH nested in P2SH mode:
Pretty much a carbon-copy of the bare witness program case, modulo the fact that the anti-malleability check becomes a bit more complex. Note that the name of the pubKey2 variable dates back to the original P2SH implementation; these days we’d call it a redeemScript.
Next we have some non-consensus-critical code related to the cleanstack check, which is used by the mempool to prevent non-miners from adding additional junk to scriptSigs:
Finally the last thing we do above is make sure that if a witness program wasn’t used, the witness data is empty:
This prevents miners from attaching
witness data to inputs that don’t actually use witness programs, a potential
nuisance for users. Oddly though, the check is disabled if the script
verification flag SCRIPT_VERIFY_WITNESS
is not set, which means that
VerifyScript()
passes even if there’s witness data! Smells like a potential
bug… we could probably use at least an assertion that witness is null if
SCRIPT_VERIFY_WITNESS
isn’t set.
Now let’s look at how witnesses themselves are evaluated, which is done with
the all-new
VerifyWitnessProgram()
.
Going into here remember that VerifyWitnessProgram()
is called from exactly
two places - the bare P2WSH codepath, and the witness-in-P2SH codepath - both
of which provide the witness version and so-called “program” (AKA the hash of
the actual script):
Similar to how P2SH works, here we’ve used the top item in the per-input witness as a scriptPubKey, made sure its SHA256 hash matches the program commitment, and saved the rest of the witness data for use as a stack.
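Concretely, the version-0, 32-byte-program case does something like the following sketch; SHA256Bytes() is an assumed stand-in for a single SHA256, which is what BIP141 specifies for the P2WSH commitment:

```cpp
#include <cstdint>
#include <vector>

// Assumed external: single SHA256 of a byte string.
std::vector<uint8_t> SHA256Bytes(const std::vector<uint8_t>& data);

// Sketch of the version-0, 32-byte-program case: the last witness stack item is the
// actual script; its SHA256 must match the 32-byte program, and the remaining items
// become the initial stack for script evaluation.
bool ExtractWitnessScript(const std::vector<std::vector<uint8_t>>& witness,
                          const std::vector<uint8_t>& program,   // 32 bytes
                          std::vector<uint8_t>& witness_script,
                          std::vector<std::vector<uint8_t>>& stack)
{
    if (witness.empty()) return false;                 // must at least carry the script
    witness_script = witness.back();
    if (SHA256Bytes(witness_script) != program) return false;
    stack.assign(witness.begin(), witness.end() - 1);  // everything else is the stack
    return true;
}
```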
So then what the heck is this?! Bizarrely segwit has an additional pay-to-witness-pubkey-hash (P2WPKH) mode that lets you use a 160-bit (20 byte) commitment, but only if you want to use a single pubkey. As of writing, BIP141 doesn’t explain why you’d special-case that - the obvious thing to do is give users the option instead to choose to accept the less secure 160-bit commitment if their use-case doesn’t need the full 256-bit security level (IE if they aren’t worried about collision attacks).
Secondly, if you are going to give a 160-bit commitment option, you don’t need
the extra level of indirection in the P2SH case: just make the segwit
redeemScript
be:
<version> <serialized witness script>
Basically the witness program format, but with the full script rather than the hash of the script. The only downside is the serialized witness script is constrained to 520 bytes max, but we enforce that for version 0 witness programs anyway.
Having said that, a plausible argument against the 160-bit mode is that the second level of indirection may make collision attacks more difficult, by forcing the attack to go through a 256-bit hash. But if this is the case, the rationale should be documented in the BIPs.
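For reference, the 20-byte special case is defined by BIP141 as an implicit expansion to a standard pay-to-pubkey-hash script, with the witness required to be exactly [signature, pubkey]; a sketch of that expansion (raw opcode bytes rather than Core’s CScript helpers):

```cpp
#include <cstdint>
#include <vector>

// Sketch of the implicit P2WPKH expansion defined by BIP141: a version-0, 20-byte
// program is treated as a standard pay-to-pubkey-hash script, evaluated with the
// two witness items (signature, pubkey) as the initial stack.
std::vector<uint8_t> ExpandP2WPKH(const std::vector<uint8_t>& program20)
{
    std::vector<uint8_t> script;
    script.push_back(0x76);                        // OP_DUP
    script.push_back(0xa9);                        // OP_HASH160
    script.push_back(0x14);                        // push 20 bytes
    script.insert(script.end(), program20.begin(), program20.end());
    script.push_back(0x88);                        // OP_EQUALVERIFY
    script.push_back(0xac);                        // OP_CHECKSIG
    return script;
}
```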
Next we handle higher version witnesses, our upgrade path:
Everything after this point applies only to version 0 witness programs. A style/maintainability thing that bothers me about this is when it comes time to create witness script version #1, there’s a good chance we’ll be doing something sufficiently different (like MAST) that the following code will start to sprout version-specific flags and the like; for consensus code it’d probably be better if we just had separate functions for the different versions to avoid accidentally hard-forking.
Note how we’re only applying the MAX_SCRIPT_ELEMENT_SIZE limit to the stack items, so we don’t repeat P2SH’s 520-byte limitation on redeemScript size; the only limit is the 10,000 byte limitation EvalScript() applies to all scripts.
Speaking of, here’s where the script is actually evaluated, using the witness
as stack; we’ll cover the witness-specific EvalScript()
changes later.
Unlike normal script evaluation, this cleanstack behavior is consensus enforced, making it (usually) impossible for miners to arbitrarily add extra witness data to transactions. Not a malleability concern, but still a nuisance.
BIP143: Transaction Signature Verification
All of the above was ultimately just leading to an EvalScript()
call; let’s
look at the segwit-specific changes made to it by BIP143.
One minor thing is that segwit validation removes the archaic calls to FindAndDelete() from CHECKSIG(VERIFY) and CHECKMULTISIG(VERIFY). And good riddance: FindAndDelete() leads to some really bizarre and totally useless special cases. Secondly, unlike in the existing signature hashing scheme, OP_CODESEPARATOR isn’t removed from the script that the signature hashes, another archaic oddity with no apparent rationale.
The big change though is that the signature hashing scheme has a new mode for segwit, that aims to fix the O(n²) scaling of the original algorithm by allowing all signatures in a transaction to share the same cached hash of transaction-wide data like the set of inputs and outputs. Additionally, for the sake of resource-constrained environments like hardware wallets, signatures now sign the amount each input spends. While none of these changes actually need segwit functionality per se to implement - the new signature scheme could be implemented by redefining an OP_NOPx too - segwit is a good opportunity to do this as it lets us change the behavior of CHECKSIG and co in a soft-fork.
Let’s look at the code that implements the new scheme; I’m assuming you’ve already familiarized yourself with the description of the algorithm in the BIP.
We start off with null commitments for the input, output, and sequence number sets.
ANYONECANPAY
means we’re only signing for this input, and not others,
allowing additional inputs to be added to the transaction later without
invalidating the signature. The most famous usecase for ANYONECANPAY
is
Hearn’s crowdfunding protocol Lighthouse; the new signature hashing scheme
doesn’t change the meaning of this flag.
If the ANYONECANPAY
flag is set, hashPrevouts
will be null, however that’s
ok because we hash in this specific prevout later.
While caching isn’t actually implemented yet, there is an open pull-req that does implement it, and I don’t see any reason why it wouldn’t work.
In addition to signing which outputs are spent, we also sign the sequence
numbers of the input. ANYONECANPAY
obviously shouldn’t sign sequence numbers
for other inputs, and as above it’s ok that we sign none in that case here
because we sign nSequence
again later.
The SINGLE
and NONE
modes have historically left other inputs’ sequence
numbers unsigned as well. It’s not clear if there’s a good reason for doing
that - Satoshi apparently put that into the protocol as part of the broken way
transaction replacement used to work - but it’s reasonable to continue that
behavior rather than re-thinking how all this should work right now.
Also, note how we’ve repeated the (nHashType & 0x1f) mask used in the old signature hash algorithm. That means that the flag bits #5 and #6 are
the old signature hash algorithm. That means that the flag bits #5 and #6 are
left undefined, and can be set to anything. OTOH, using this for soft-fork
upgrades is quite hard, as such signatures still act normally; there’s nowhere
to add another commitment. Anyway, the cleaner option for adding new sighash
features is to bump the witness program version.
Next we hash the outputs:
The above is the standard case, with neither SINGLE
nor NONE
set, where we
simply hash all the outputs.
With SINGLE we only hash (and thus sign) a single output. Note that hashOutputs is indistinguishable between the normal case with one output and the SINGLE case with many outputs - that would be a vulnerability if not for the fact that the flags are themselves hashed later. We’ve also fixed the
SIGHASH_SINGLE
bug by making it revert to NONE
behavior - arguably less
user-friendly than an actual error, but a competent signer can easily detect
that case anyway.
Something we have changed is that SINGLE
no longer commits to the specific
index of the output it’s signing; the old signature hashing scheme did that in
a weird, implicit, way because it acted as though you were signing a
transaction with all prior outputs set to empty scripts and zero-value outputs,
and all subsequent outputs removed. If ANYONECANPAY
is not used in addition
to SINGLE
, the old behavior was redundant: the index of the output is
committed by the inputs anyway.
Meanwhile, SINGLE
combined with ANYONECANPAY
is a niche use-case. If the
output value is greater than the input, you’re basically saying “if anyone
wants this input to be spent, that’s ok with me, and you need to pay me for the
privilege”. Why would someone want to do that? Protocols like colored coins
come to mind, where the act of spending a txout can have a value beyond the
value of the bitcoins themselves. I believe that the new behavior is a (minor) improvement over the status quo for that use-case, as it allows multiple sales of a colored coin to be combined into a single transaction - useful for a decentralized exchange.
If the output value is less than the input, you’re basically saying “I want this output to exist, and don’t care if the left over bitcoins go to miners or another output”. It’s hard to come up with use-cases for this - signature combining schemes come to mind modulo the need for a change output - but if you did need that functionality for some reason, I don’t see why it’d ever be a bad thing to be able to combine multiple such inputs into a single transaction.
Next we combine the hashes for inputs, outputs, and sequence numbers, with everything else that’s signed. With regard to scaling, note how everything that follows has a constant maximum size regardless of how large the transaction is, and everything we hashed above was also either constant size, or the same for every input. The only variable is the size of the per-input script, which is accounted for with the sigop limit:
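For reference, the BIP143 preimage that gets double-SHA256d is laid out as follows; this sketch just concatenates already-serialized pieces in that order (field contents are placeholders; the layout is from BIP143):

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;
static void Append(Bytes& out, const Bytes& in) { out.insert(out.end(), in.begin(), in.end()); }

// Sketch of the BIP143 signature-hash preimage, assembled from already-serialized
// pieces. Everything here is either constant size or, like the three hashXxx fields,
// shared across all inputs of the transaction.
Bytes Bip143Preimage(const Bytes& nVersion,      // 4 bytes
                     const Bytes& hashPrevouts,  // 32 bytes (or zeros with ANYONECANPAY)
                     const Bytes& hashSequence,  // 32 bytes (or zeros)
                     const Bytes& outpoint,      // 36 bytes: prev txid + output index
                     const Bytes& scriptCode,    // compact-size length + script
                     const Bytes& amount,        // 8 bytes: value of the output being spent
                     const Bytes& nSequence,     // 4 bytes: this input's sequence number
                     const Bytes& hashOutputs,   // 32 bytes (or zeros)
                     const Bytes& nLockTime,     // 4 bytes
                     const Bytes& sighashType)   // 4 bytes
{
    Bytes out;
    Append(out, nVersion);
    Append(out, hashPrevouts);
    Append(out, hashSequence);
    Append(out, outpoint);
    Append(out, scriptCode);
    Append(out, amount);
    Append(out, nSequence);
    Append(out, hashOutputs);
    Append(out, nLockTime);
    Append(out, sighashType);
    return out;
}
```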
Red flag: we haven’t explicitly ensured that signatures for the new signature
hash can’t be reused for the old signature hash. While unlikely to be an issue
here, a slightly better design would have been to insert a dummy 0x00
byte
just after nVersion
, to absolutely guarantee that new-style signatures would be
signing data that in the context of the old signature algorithm would be an
invalid transaction with no inputs. Secondly, for further upgrades, add an
additional n-bit random tag, which you could change the next time the
signature algorithm is changed.
Equally, note how the witness program version is only indirectly signed, via the prevout:
scriptCode
is the subset script being validated, starting from the most
recently executed OP_CODESEPARATOR
; we’re repeating an archaic design element
of the old signature hashing algorithm, that was made irrelevant years ago.
Prior to that separation, OP_CODESEPARATOR
could have been (sort-of) used for
delegation by providing additional (signed) code in the scriptSig
, but that
was made impossible when scriptSig
and scriptPubKey
execution were
separated. Now it’s redundant, as the prevout
commits to the scriptPubKey
anyway. Leaving this off might have been better, as we’ll see next:
Hashing only the amount spent by this input isn’t ideal: we don’t know the
total amount spent by the transaction unless we can verify all signatures,
which requires a significant amount of data, including all other scripts due to
how the above commits to scriptCode
. And of course, that’s only true if every
input uses segwit.
A better design would have been to treat the amounts in the same way as other input data, like the prevouts/sequence numbers. Fortunately in most cases this isn’t an issue as all inputs are being spent by the same entity, or equally, for the inputs you have spent you know what your total contribution to the transaction fees is. However it still makes it very difficult to, for instance, have a hardware wallet’s UI guarantee the total fee of a transaction partly funded with inputs signed by a lesser security “hot wallet”.
I personally argued a few months ago that segwit should hash transactions with merkle trees; had we done that the above would have been a simple matter of including amounts in the tree. But alas, I raised my objections too late into the process, so I decided to drop the objection for now.
No issues here, and pretty much a direct copy of the old signature algorithm.
It’s worth noting here that while the above fixes the O(n²) scaling problem for segwit transactions, it doesn’t do anything to fix the problem with existing transactions; post-segwit, transactions can still be created that take excessively long amounts of time to validate. We’ll still need to eventually fix that problem at a later date, although with segwit we can do so more aggressively - e.g. by limiting maximum transaction size/hashed data - as users who do need large transactions have the option of using segwit instead.
Resource Limits
This commit implements the new consensus-critical limits on the maximum serialized size of blocks, and the new ways sigops are counted. In summary, in addition to prior limits we have a new total blocksize limit - including witness data - of 4MB, with non-witness data costing 4x more than witness data. Equally, the sigops limit is quadrupled, but non-witness sigop usage now costs 4x more than before. This quadrupling is defined by the new constant, WITNESS_SCALE_FACTOR.
There’s been a push to think about resource limits in terms of generic “costs”,
combining all relevant costs as a linear function, rather than specific things
like size. Along those lines segwit now refers to a block’s “cost” in many
places, such as the constant
MAX_BLOCK_COST
and the function
GetBlockCost()
.
However, what’s actually in the protocol is that “cost” is purely a function of
serialized size in bytes, with a separate sigops limit.
Let’s look at how these new resource limits are implemented in detail:
Blocksize
MAX_BLOCK_SIZE is renamed to MAX_BLOCK_BASE_SIZE, with all consensus-critical places where it was previously used remaining otherwise unchanged. Notably, this includes the previous block size limit check, done in a way that checks the limit against non-witness data only, ensuring that our new blocksize limit is a soft-fork.
Witness block “cost” is then computed by the new function, GetBlockCost()
:
The (WITNESS_SCALE_FACTOR - 1)
term has confused a lot of people: what’s
going on there is the second term, ::GetSerializeSize(block, SER_NETWORK,
PROTOCOL_VERSION)
, inherently also includes all non-witness data as well, so
the - 1
is there to avoid counting that data twice. Multiplication rather
than division is used, because exactly how division results are rounded would
be consensus critical; it’s far simpler to completely avoid that question and
use multiplication instead.
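In other words, the cost formula reduces to the following; the constants are the values defined at the time of the segwit merge, and the worked example shows why a maximal old-rules block lands exactly on the limit:

```cpp
#include <cstdint>

constexpr int64_t WITNESS_SCALE_FACTOR = 4;
constexpr int64_t MAX_BLOCK_COST = 4000000;

// Sketch of the cost formula: base size (serialized without witness data) is counted
// WITNESS_SCALE_FACTOR times, witness bytes only once; the (factor - 1) multiplier
// exists because total_size already includes the base bytes once. Multiplication is
// used rather than division so no consensus-critical rounding questions arise.
constexpr int64_t BlockCost(int64_t base_size, int64_t total_size)
{
    return (WITNESS_SCALE_FACTOR - 1) * base_size + total_size;
}

// Worked example: a 1,000,000-byte block with no witness data (total == base) costs
// exactly 3*1,000,000 + 1,000,000 = 4,000,000, i.e. precisely MAX_BLOCK_COST, so the
// new limit never rejects a block that the old 1MB rule would have accepted.
static_assert(BlockCost(1000000, 1000000) == MAX_BLOCK_COST, "old-rules block fits exactly");
```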
GetBlockCost()
is used exactly
once,
within ContextualCheckBlock()
to limit maximum block “cost”:
Note how this check is applied even if segwit isn’t enabled; it’s critical
that GetBlockCost(block) > MAX_BLOCK_COST
be false for all blocks that would
pass under the existing rules, or we may end up accidentally forking prior to
segwit being enabled. I believe that is true for the
following reasons:
- The witness mode serialized representation of a block containing no witness data is identical to the serialized representation with SERIALIZE_TRANSACTION_NO_WITNESS set, because transactions without witnesses don’t use the extended format.
- Earlier in ContextualCheckBlock() we reject blocks that contain witness data when segwit is not enabled.
However it would be safer if the check was only applied if segwit was
enabled. In the future we may run into this same issue if we
extend the notion of block cost with other types of costs; GetBlockCost()
might not be a worthwhile abstraction.
Sigops
When P2SH was added to Bitcoin, it was soft-forked in by counting the sigops in P2SH redemption scripts against the existing legacy sigop limit; the additional sigops can only make a previously valid block invalid. P2SH implemented this with a new GetP2SHSigOpCount() function; the segwit equivalent is CountWitnessSigOps(), which in turn calls WitnessSigOps().
Let’s start with the latter, the generic definition of sigops for a witness script, regardless of how it’s committed:
Remember that 20 byte witness programs are the weird pay-to-witness-pubkey-hash
special case, which expands to a standard pay-to-pubkey-hash script. That’s a
single CHECKSIG
operation, or one sigop.
Otherwise we count the sigops in the witness script, which is the first item in the witness stack:
If witversion
is unknown, we return zero; future upgrades would return >=
zero, which is a soft-fork:
But we also return zero if the witness stack is invalid, and the program corresponds to a pay-to-witness-script-hash. Equally in the P2WPKH special case, we return one. Both cases are kinda odd, as we’re returning a value for a witness stack that we know is invalid - sure, we should fail later when the witness is actually evaluated, but it’s a weird edge-case all the same.
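Putting the rules above together, a sketch of the per-program counting; CountScriptSigOps() is an assumed stand-in for the legacy “accurate” sigop counter used for P2SH redeem scripts:

```cpp
#include <cstdint>
#include <vector>

// Assumed external: the legacy "accurate" sigop counter applied to a serialized script.
size_t CountScriptSigOps(const std::vector<uint8_t>& script);

// Sketch of the per-witness-program sigop rule: version-0 20-byte programs (P2WPKH)
// count as one sigop; version-0 32-byte programs (P2WSH) count the sigops of the
// witness script (the last witness stack item); unknown versions - and, oddly,
// invalid stacks - count as zero, which keeps future upgrades soft-forkable.
size_t WitnessProgramSigOps(int witversion,
                            const std::vector<uint8_t>& witprogram,
                            const std::vector<std::vector<uint8_t>>& witness_stack)
{
    if (witversion == 0) {
        if (witprogram.size() == 20)
            return 1;                                  // implicit CHECKSIG in P2WPKH
        if (witprogram.size() == 32 && !witness_stack.empty())
            return CountScriptSigOps(witness_stack.back());
    }
    return 0;  // unknown version, or a witness stack we already know is invalid
}
```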
Next we have CountWitnessSigOps()
, which calls the above after extracting the
witness program from the legacy scriptPubKey
or P2SH redemption script:
Unsurprisingly, if segwit isn’t enabled, witness sigops aren’t a thing. Secondly, segwit depends on P2SH.
Note how we’ll even provide WitnessSigOps() with an empty witness if none was provided, yet we can only reach this line if segwit is enabled - again we’re returning a result for a case that we know is invalid.
We do this for witness-in-P2SH too:
With witness in P2SH, while we use the standard IsPayToScriptHash()
and
IsPushOnly()
methods used elsewhere to detect P2SH, we end up writing our own
code to extract the redemption script. The way we’ve done this is totally
different from how P2SH evaluation works elsewhere: in EvalScript()
we
actually execute the script, and use whatever is on the stack, while here we
use the last push. Since P2SH requires scriptSig
to be push-only, the two
should be identical… but in fact they aren’t! IsPushOnly()
considers
OP_1
to OP_16
and OP_1NEGATE
to be push-only, and even more bizarrely,
OP_RESERVED
too.
Fortunately this isn’t relevant to us, because a witness program has to have two pushes, but it’s a close call.
Sigop enforcement is actually done in
ConnectBlock()
.
Segwit refactored this code, moving part of the calculation into GetTransactionSigOpCost()
, so let’s start by looking at the relevant parts of
the diff between the last pre-segwit
commit
and the merge
commit
to make sure we know what should be in that function:
Regardless of whether or not the transaction is a coinbase transaction, add the legacy sigops count to the sum9. While we fail early at this point in the old code, ConnectBlock() only modifies state (or should!) well after the sigops checks, so it’s ok for the refactored code to fail later (so long as we do in fact fail!).
Next sum P2SH sigops, in a branch that wasn’t applied to coinbase transactions. Again, note how we’ve changed where the function can fail:
Sum up sigops with the new GetTransactionSigOpCost()
,
potentially failing if the sigops limit is reached:
Now let’s examine GetTransactionSigOpCost()
to make sure the refactored code is equivalent:
Here we successfully handle the legacy sigops count, both non-coinbase and
coinbase cases; the latter can return early as a coinbase transaction has no
inputs, and thus can’t have P2SH or witness sigops. The sigops count is
multiplied by WITNESS_SCALE_FACTOR
, which is ok because
MAX_BLOCK_SIGOPS_COST
is exactly equal to old sigops count times that scale
factor.
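As a sanity check on that equivalence, the constants involved (values as of the segwit merge) line up exactly:

```cpp
#include <cstdint>

constexpr int64_t WITNESS_SCALE_FACTOR = 4;
constexpr int64_t LEGACY_MAX_BLOCK_SIGOPS = 20000;   // pre-segwit per-block sigop limit
constexpr int64_t MAX_BLOCK_SIGOPS_COST = 80000;     // new "cost"-denominated limit

// Multiplying every legacy/P2SH sigop by WITNESS_SCALE_FACTOR against a limit that is
// exactly the old limit times the same factor leaves non-witness blocks with precisely
// the same effective limit as before, which is what makes this a soft-fork.
static_assert(LEGACY_MAX_BLOCK_SIGOPS * WITNESS_SCALE_FACTOR == MAX_BLOCK_SIGOPS_COST,
              "scaled sigop limit matches the legacy limit");
```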
P2SH count, done on the whole transaction in one go. In the old
ConnectBlock()
this code was called if fStrictPayToScriptHash
was true;
SCRIPT_VERIFY_P2SH
is set if fStrictPayToScriptHash
is true, so the above
is equivalent.
Finally we sum up witness sigops, this time on a per-input basis:
While CountWitnessSigOps()
is called whether or not there is a witness for that
input, doing so makes sense in a perverse way: as we mentioned earlier,
VerifyScript()
attempts to verify witness programs whether or not there’s
actually a witness for the input.
Peer-to-Peer Networking
BIP141
specifies the changes made to the P2P layer for segwit. The sending side is
very simple: segwit defines a new serialization for transactions with witnesses
attached, we advertise a new service bit NODE_WITNESS
if we support segwit,
and we respond to the new inv types MSG_WITNESS_TX
and MSG_WITNESS_BLOCK
with witness serialized transactions and blocks respectively. I won’t go into
the details of exactly how it’s done - it’s a dozen or so small, mechanical,
changes - but you can see for yourself here.
The receiving side however is more subtle. It’s critical that we successfully
find and connect to segwit enabled peers; if we fail to do this we simply won’t
be able to get the witness data needed to fully validate. To that end if the
segwit soft-fork is defined
we add NODE_WITNESS
to nRelevantServices
, which is used in the connection
code to determine what peers we try to connect to first; we won’t try
connecting to nodes that aren’t advertising nRelevantServices
until after 40
failed attempts.
Secondly, we have a new function,
GetFetchFlags()
,
which is used to determine whether or not to request witness-serialized data
when responding to an inv message, based on whether or not segwit is active, and whether or not our peer is advertising NODE_WITNESS
.
Here’s the relevant diff from ProcessMessage()
where we make use of it:
So with transactions, if segwit is active, and our peer is advertising NODE_WITNESS, we’re asking for witness transactions. On the sending side, a peer asking for witness transactions prior to segwit being activated is harmless, so possibly the code could be simplified slightly (but there may be a reason I’m missing here).
Next we handle blocks:
Basically, what’s changed here is that after segwit activates, we only fetch headers from non-segwit peers, not blocks. Secondly, you’ll note how the compact block support is disabled by segwit; that’s from this commit, and I’m sure we’ll see that fixed soon.
Remember though that a peer can still give us a non-witness block, hopefully because it’s a block that just doesn’t have any witnesses; possibly maliciously by taking a witness block and stripping off the witnesses. We already covered what happens there when we discussed how the witness commitment hash was validated, in particular the corruption possible validation state; totally missing witness data is just a special case of corrupted witness data, and results in a commitment hash mismatch (except of course if the block genuinely doesn’t have any witness data!).
However, while the mempool isn’t consensus critical, I did notice that it looks like we don’t always handle missing/corrupted witness data correctly in transactions. If I’m correct, the problem here is ironically the fact that txids aren’t malleable: the P2P protocol and implementation is all based on txids, which don’t change if the witness data is changed. This is a problem, because in many cases if we request a transaction from a peer, and we don’t accept it for some reason, we won’t re-request that txid again. Not a problem if the mapping of transaction data to txid is one-to-one, but segwit makes that a many-to-one relationship…
Some of the time we handle this correctly, like when
AcceptToMemoryPoolWorker()
checks input signatures,
correctly setting the corruption possible state:
But we get it wrong in a number of other places elsewhere:
Remember that a transaction can be non-standard because it’s too big,
which an attacker can do by adding junk to the transaction’s witnesses. The
consequence is the transaction gets added to recentRejects
:
This isn’t a major problem - recentRejects
is cleared every time the chain tip changes - but an attacker
could still screw with transaction propagation temporarily. I’m not sure what
the right solution is though, as just changing everything to set corruption
possible on error isn’t ideal; possibly the P2P protocol needs to be changed to be
based on wtxids instead. Edit: Confirmed by Suhas Daftuar, who is working on a
fix.
Summary
- Segwit has a number of non-ideal warts at the consensus protocol level, mostly stuff another few months of development time and review could have ironed out (or resulted in a lot of arguments about bikesheds).
- There’s a few “close calls” in the consensus-critical implementation that could lead to bugs later, but don’t appear to be actual issues right now. Put another way, in a number of places we either have a belt, or suspenders, when given the importance of this code we’d rather have both a belt and suspenders.
- Having said that, there doesn’t seem to be anything seriously wrong with the current consensus implementation or protocol design; at worst we’ve added a bit of technical debt, in exchange for getting critically needed functionality out the door.
- There is a nuisance problem with the non-consensus transaction propagation part of the P2P protocol, though fixing it may require a rethink of how transaction invs work. Edit: Confirmed by Suhas Daftuar, who is working on a fix.
- We haven’t actually fixed the O(n²) signature hashing problem yet, although we’re fairly confident that we can, and there’s an open pull-req implementing the cache that we need. Edit: btcd has also implemented the signature cache as part of their segwit implementation.
Footnotes
1. From the point of view of transaction hashing, the so-called flag field is more of a version number that can create new definitions of what a wtxid commits to. ↩
2. In a clean-sheet redesign of the Bitcoin protocol you’d want to use tagged hashing, where the hash functions are “tweaked” with a unique tag that’s different for every type of data structure that might be hashed. ↩
3. Incidentally, I think it’s generally a good idea to do this: you really don’t want to accidentally leave something important out of a hash calculation, as anything not hashed is uncommitted data that an attacker can forge at will. ↩
4. Interestingly, you could imagine a design where blocks didn’t commit to the witnesses: witnesses would still be required to prove that a block was in fact valid, but exactly which witnesses were needed wouldn’t be set in stone. But there’s no real advantage to such a design, and it’d add a whole new level of complexity to debugging when things went wrong, as it’d be possible that different parts of the network were looking at different witnesses, potentially triggering different bugs. ↩
5. For the record, the double-hashing is one of a number of flaws I raised a few months ago with segwit, which others thought was too late to be worth fixing. ↩
6. ContextualCheckBlock() is also called by the mining code via the TestBlockValidity() function, but that’s not relevant to consensus. ↩
7. The duplicate txid issue with how the merkle tree is calculated that we discussed earlier is another example of this problem. ↩
8. I was sure I remembered seeing a patch from Pieter Wuille fixing this problem a few months ago; not sure what happened to it. ↩
9. Of course, a coinbase transaction doesn’t spend any inputs, and thus doesn’t evaluate any scripts… So why would a coinbase transaction have sigops? Unfortunately, Satoshi’s original sigops design is broken and completely backwards: rather than counting the sigops that are actually evaluated in the inputs, it’s the unevaluated outputs that count towards the sigops limit! ↩