Code Review: The Consensus Critical Parts of Segwit
With segwit merged I thought it’d be good to do a final code review of the consensus-critical parts of the segwit codebase, and in the process of doing so, also explain in detail about how it all works. By “consensus-critical” we mean all the code that determines whether or not a block is valid, which is defined by BIP141. Additionally in practice P2P networking code is semi-consensus-critical, because a sufficiently broken network can prevent consensus by preventing consensus-relevant data from being propagated; networking is covered by BIP144.
I’m assuming you’re already familiar with segwit at a high level; if not I’d start by reading the FAQ.
Transaction Serialization And Hashing
Witness Transaction Identifier
Conceptually speaking the main thing segwit does is it adds an additional type of transaction id, the witness txid (wtxid). We’ll discuss how it’s used later; it’s defined as follows:
[nVersion][marker][flag][txins][txouts][witness][nLockTime]
The marker is the byte 0x00, and the flag is (currently1) 0x01. Recall that a txid is hashed as follows:
[nVersion][txins][txouts][nLockTime]
…with txins serialized as a compact-size encoded integer of the number of transaction inputs, followed by zero or more transaction inputs. This means that if a sequence of bytes is valid for the purpose of calculating a wtxid, then attempting to reuse those same bytes to calculate a txid would be guaranteed to commit to a transaction with no inputs.
tl;dr: It’s not possible to confuse a wtxid for a txid, except maybe in the degenerate case of a tx with zero inputs, which is never valid anyway (coinbase transactions have exactly one null input).
In general different types of things having the same identifier is one of those
things that often leads to problems, so it’s good to see this clearly
prevented2.
The witness is a bit interesting. Each input is associated with witness data, which is in the form of an array of byte arrays, the “stack”. The byte arrays are variable size, so what’s actually hashed for witness is:
{ <# of witness stack items>{<# of bytes><bytes> , ...}, ... }
However the actual size of the witness is not hashed because it’s always equal to the number of transaction inputs; non-witness inputs are required to set the number of witness stack items to zero.
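To make the two layouts concrete, here’s a minimal sketch - not Core’s serializer; the helpers and field contents are placeholders - that just concatenates already-serialized pieces in the order each hash covers. The result is then double-SHA256d to produce the txid or wtxid respectively:

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

static void Append(Bytes& out, const Bytes& in) { out.insert(out.end(), in.begin(), in.end()); }

// Preimage hashed (double-SHA256) to produce the legacy txid:
// [nVersion][txins][txouts][nLockTime]
Bytes TxidPreimage(const Bytes& nVersion, const Bytes& txins,
                   const Bytes& txouts, const Bytes& nLockTime)
{
    Bytes out;
    Append(out, nVersion);
    Append(out, txins);    // compact-size count, then each input
    Append(out, txouts);   // compact-size count, then each output
    Append(out, nLockTime);
    return out;
}

// Preimage hashed to produce the wtxid:
// [nVersion][marker][flag][txins][txouts][witness][nLockTime]
Bytes WtxidPreimage(const Bytes& nVersion, const Bytes& txins, const Bytes& txouts,
                    const Bytes& witness, const Bytes& nLockTime)
{
    Bytes out;
    Append(out, nVersion);
    out.push_back(0x00);   // marker: would read as an empty txin count in a legacy tx
    out.push_back(0x01);   // flag (currently)
    Append(out, txins);
    Append(out, txouts);
    Append(out, witness);  // per-input stacks; no overall count, it equals the txin count
    Append(out, nLockTime);
    return out;
}
```

Note how the marker byte sits exactly where a legacy parser expects the txin count, which is what the “guaranteed to commit to a transaction with no inputs” argument above relies on.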
Implementation
Bitcoin Core has always reused serialization code for hashing3, and segwit continues that practice. We have three new classes for witness data: CScriptWitness, CTxInWitness, and CTxWitness.
Transaction serialization code is then refactored into SerializeTransaction():
Note how the nVersion argument is not the same thing as the transaction version. Due to how the serialization code works elsewhere, serialization and deserialization are combined, starting with the latter:
The serialization version flag SERIALIZE_TRANSACTION_NO_WITNESS means “serialize/deserialize a transaction without witness data”. So we’ll be in the above branch only during witness tx deserialization.
The above comment is actually wrong: we can also reach that branch if we
attempt to deserialize a transaction with an empty vin
in witness mode.
What’s called flags above was called flag in the (current version of) BIP141 that we discussed in the last section. Since an unknown flag errors out, we have a way to extend the serialization format in the future with not-currently-committed data that’s guaranteed to be rejected by older nodes - that’s actually a good thing, as non-committed data can’t be verified by old nodes, so allowing arbitrary extra data could be used as a DoS attack (as happened before in CVE-2013-4627).
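As a rough illustration of the branching described above - this is a simplified stand-in, not the actual SerializeTransaction() template, and unlike the real deserializer it rejects a flag byte of 0x00 (the empty-vin edge case mentioned below) outright:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustration only: given the bytes immediately following nVersion, decide whether
// the transaction uses the extended (witness) serialization. In the real deserializer
// this is expressed as "the txin count deserialized to zero entries".
enum class TxFormat { Legacy, ExtendedWithWitness };

TxFormat ClassifyFormat(const std::vector<uint8_t>& bytes_after_version, bool allow_witness)
{
    if (allow_witness && bytes_after_version.size() >= 2 && bytes_after_version[0] == 0x00) {
        const uint8_t flag = bytes_after_version[1];
        if (flag != 0x01)
            // Unknown flags are rejected, so future extensions of the format are
            // guaranteed to be unacceptable to today's segwit-aware nodes.
            throw std::runtime_error("Unknown transaction optional data");
        return TxFormat::ExtendedWithWitness;
    }
    return TxFormat::Legacy;  // a compact-size txin count follows nVersion directly
}
```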
Next we have serialization/hashing:
Note how the fact that an actual dummy transaction input is written - rather
than the bytes 0x00
- could become relevant if something else changed how
transaction inputs are serialized.
Resizing the witnesses to the size of the inputs seems a bit dubious to me;
probably better if witness-using transactions were required to always have one
witness per txin to be considered valid. It also means that
SerializeTransaction()
- and in turn the serialization functions that call it -
can’t take const
transactions.
Commitment Structures
The pull-req’s second commit deals with the data structures needed to commit to witness transactions in blocks, as well as the version bits deployment.
Calculating the Witness Root Hash
In the same way that the merkle root hash commits to transactions, the witness root hash commits to the witnesses4 of every transaction (other than the coinbase).
The witness root hash doesn’t just commit to witnesses however: it commits to wtxids, which are entire transactions with witness data. This means that transactions actually get inefficiently hashed twice: once for the merkle tree of transactions, and again for the merkle tree of witness transactions.
While the inefficiency of hashing transactions twice is a minor thing - and can (very rarely) even be an optimization for lite clients - there’s a bigger issue: we now have to verify that both trees hash the same data. In Bitcoin Core this is done implicitly, as we expect to have the full set of transactions and calculate both commitments from them, but it’s easy to see how in other circumstances this opens up a whole new set of ways that miners can produce invalid blocks that implementations fail to detect properly, or handle in inconsistent ways5.
Secondly, because the coinbase can’t commit to itself, we’re forced to make the
coinbase an exception: its wtxid
is assumed to be 0x00...00
for the purpose
of calculating the witness root hash.
However if we look at the BlockWitnessMerkleRoot function that calculates the witness merkle root we find another subtle issue:
Pretty straightforward: calculate the wtxid with GetWitnessHash() for every transaction, and then pass those leaves to the generic ComputeMerkleRoot(). But what’s that mutated argument?
Unfortunately, there’s a pretty big flaw in Bitcoin’s merkle tree algorithm: multiple lists of transactions can result in the same merkle root. You can read the gory details here but what that means for us is since we’ve reused the same (flawed) algorithm for the witness root, we have to work around the flaw in the same way.
Right away that prevents us from using the existing merkle tree algorithm in any kind of pure-witness commitment, as the fix requires txids to be unique, and witness data isn’t; hashing the entire transaction guarantees uniqueness (via BIP34).
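For illustration, the leaf construction boils down to something like this sketch (stand-in types; the real function is BlockWitnessMerkleRoot() feeding ComputeMerkleRoot(), complete with the mutated workaround):

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Hash256 = std::array<uint8_t, 32>;

struct Tx { Hash256 witness_hash; };  // stand-in for GetWitnessHash(), i.e. the wtxid

// Build the leaves that get fed into the (shared, flawed) merkle root algorithm.
std::vector<Hash256> WitnessMerkleLeaves(const std::vector<Tx>& vtx)
{
    std::vector<Hash256> leaves(vtx.size());
    leaves[0] = Hash256{};               // coinbase slot: wtxid assumed to be all zeros
    for (size_t i = 1; i < vtx.size(); i++)
        leaves[i] = vtx[i].witness_hash; // wtxid of each non-coinbase transaction
    return leaves;
}
```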
Committing to the Witness Root Hash
We’ve got a number of constraints in where we put the witness root hash commitment, ranging from the design of the Stratum mining protocol, to p2pool compatibility, to the fact that miners have been selling coinbase space for advertising. And of course, in a soft-fork putting the commitment in the coinbase is basically our only option: we’d like it closer to the block header, but that may not even be a lite client soft-fork.
Segwit takes a fairly inclusive approach, with the commitment being a specially
formatted OP_RETURN
output, which
GetWitnessCommitmentIndex()
finds the index of:
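Roughly speaking, the scan looks like the following sketch, based on the commitment pattern BIP141 defines (OP_RETURN, a 36-byte push, the 0xaa21a9ed tag, then the 32-byte commitment); the exact Core function differs in its details:

```cpp
#include <cstdint>
#include <vector>

struct TxOut { std::vector<uint8_t> scriptPubKey; };

// Sketch: find the coinbase output carrying the witness commitment, per the BIP141
// pattern: OP_RETURN (0x6a), push-36 (0x24), the tag 0xaa21a9ed, then the 32-byte
// commitment. If several outputs match, the highest index wins. Returns -1 if none.
int WitnessCommitmentIndex(const std::vector<TxOut>& coinbase_vout)
{
    int commitpos = -1;
    for (size_t i = 0; i < coinbase_vout.size(); i++) {
        const auto& spk = coinbase_vout[i].scriptPubKey;
        if (spk.size() >= 38 && spk[0] == 0x6a && spk[1] == 0x24 &&
            spk[2] == 0xaa && spk[3] == 0x21 && spk[4] == 0xa9 && spk[5] == 0xed) {
            commitpos = static_cast<int>(i);
        }
    }
    return commitpos;
}
```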
That’s then used within ContextualCheckBlock()
in the following:
Ultimately the above is simply checking, via the version bits machinery, whether the segwit soft-fork is enabled.
The above is referring to how CheckBlock() ensures that the merkle tree hasn’t been malleated with the duplicate txids vulnerability discussed earlier; CheckBlock() guarantees that the transaction list itself hasn’t been malleated, thus we can calculate the block witness merkle root from it without being concerned about malleation (though when this code was merged CheckBlock() had an incorrect comment stating otherwise). All the same, it’d probably be good to add an assertion here that malleated is false.
The above hashes the witness merkle root with the so-called “witness nonce” (now called “witness reserve value” in BIP141), which is taken from the coinbase’s witness - the only value the coinbase witness is allowed to have. The coinbase witness isn’t hashed by BlockWitnessMerkleRoot(), so if not for the above its value would be uncommitted (not good!).
Finally, having calculated what the witness commitment should be, we check that what’s in the block is exactly that:
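In outline the check amounts to something like this sketch; DoubleSHA256() here is an assumed stand-in for Core’s CHash256, and the byte offsets follow the commitment pattern above:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using Hash256 = std::array<uint8_t, 32>;

// Assumed external: double-SHA256 over an arbitrary byte string.
Hash256 DoubleSHA256(const std::vector<uint8_t>& data);

// Sketch of the commitment check: hash the witness merkle root together with the
// 32-byte witness reserved value from the coinbase witness, and compare against the
// 32 bytes embedded in the commitment output found by GetWitnessCommitmentIndex().
bool CheckWitnessCommitment(const Hash256& witness_merkle_root,
                            const Hash256& witness_reserved_value,
                            const std::vector<uint8_t>& commitment_script_pub_key)
{
    std::vector<uint8_t> preimage(witness_merkle_root.begin(), witness_merkle_root.end());
    preimage.insert(preimage.end(), witness_reserved_value.begin(), witness_reserved_value.end());
    const Hash256 expected = DoubleSHA256(preimage);

    // The commitment bytes sit right after the 6-byte OP_RETURN/tag header.
    if (commitment_script_pub_key.size() < 38) return false;
    return std::equal(expected.begin(), expected.end(),
                      commitment_script_pub_key.begin() + 6);
}
```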
Interestingly, even after segwit activates, miners are allowed to leave off the witness commitment if their block has no witness data. Since there’s no witness commitment controlling what the witness data is, we need to ensure that there’s no witness data at all to prevent attackers from appending junk witness data to such blocks:
So we DoS-ban the node that sent us that malleated block. But what else happens? We’re called by AcceptBlock()6 in the following:
Note the CorruptionPossible()
call - that’s used to indicate that while
something is invalid, it may be invalid because one of our peers corrupted the
data, not because the thing itself is intrinsically invalid. Without this
distinction a peer can trick us into marking blocks permanently invalid,
breaking consensus7.
Getting this right is subtle:
CValidationState::DoS()
has an optional (and currently undocumented) argument specifying whether there
is or is not potential corruption. We’ve used that argument correctly above,
setting corruption possible for all failures of uncommitted witness data, but
I’ll bet you didn’t notice the difference between the two types of DoS()
calls! It’d probably be a good idea to add a second, differently named,
constructor to make the distinction more clear…
Using The Witness Nonce^H^H^H^H^H Reserve Value For Protocol Upgrades
BIP141 describes the witness reserve value as being part of an “extensible commitment structure”, as its meaning can be changed later with a soft-fork to add “new commitment values” to blocks. Unfortunately, as I pointed out six months ago segwit is more dangerous and invasive than normal soft-forks from the point of view of the P2P layer, in that only segwit-aware peers can propagate witness data necessary for consensus. Since the reserve value is only a single, 32-byte value, we’re setting ourselves up for the same problem again8.
Verification
Validating Witness Program Scripts
Similar to how P2SH allows a scriptPubKey to be replaced by a hash commitment to the actual redeemScript, segwit allows witness transaction outputs to commit to the actual spend conditions with so-called “witness programs”. These witness programs take the form of scripts that match the following template:
<OP_0 to OP_16> <2 to 40 byte push>
The idea is that the first push acts as a version number, and the second push is the hash commitment. Normally a hash commitment would be <= 32 bytes, but at the last minute it was pointed out that having just 16 more possible script versions might not be enough, so the commitment was extended to 40 bytes.
It’s worth noting that we’re not using the discourage upgradable NOPs mechanism meant to protect non-upgraded miners from accidentally mining transactions made invalid by a soft-fork; instead we’re relying on the cleanstack check. This is relevant in the witness-in-P2SH mode.
To validate that a script is a “witness program”, and to extract the version and commitment, the CScript class gains a new method, IsWitnessProgram(). Let’s look at it:
Enforce minimum and maximum sizes. Confusingly, the enforcement is done at the level of the entire script; we’ll see the subtlety this leads to in a moment.
Enforce use of one of the standard push-number-to-stack opcodes. This means that in the event that we ever do need more than sixteen witness program versions, we’ll have to use a different version number scheme.
The last condition is kinda weird: why are we casting part of the script to a size_t? What’s actually happening here is that, in a round-about way, we’re checking that the script contains exactly two pushes, as the low-level encoding of a ≤75 byte push is a single byte encoding the length as an unsigned 8-bit integer, followed by the bytes being pushed.
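Putting those three conditions together, a self-contained sketch of the check - operating on raw script bytes rather than CScript, with opcode values taken from the script encoding - looks like:

```cpp
#include <cstdint>
#include <vector>

// Opcode byte values from the Bitcoin script encoding.
static const uint8_t OP_0 = 0x00, OP_1 = 0x51, OP_16 = 0x60;

// Sketch of the witness-program check described above: total script size 4..42 bytes,
// first opcode OP_0..OP_16, and the second byte (a direct push length) must account
// for the entire remainder of the script, i.e. exactly two pushes and nothing else.
bool IsWitnessProgram(const std::vector<uint8_t>& script,
                      int& version, std::vector<uint8_t>& program)
{
    if (script.size() < 4 || script.size() > 42) return false;
    if (script[0] != OP_0 && (script[0] < OP_1 || script[0] > OP_16)) return false;
    if (static_cast<size_t>(script[1]) + 2 != script.size()) return false;
    version = (script[0] == OP_0) ? 0 : script[0] - OP_1 + 1;
    program.assign(script.begin() + 2, script.end());
    return true;
}
```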
Validating Witness Programs
So what consensus-critical code uses IsWitnessProgram()
? Interestingly, only
CountWitnessSigOps()
and VerifyScript()
- the latter being the main
entry point to script verification, and significantly modified
by segwit.
Let’s look at what those modifications do. Unlike other examples we’ll use something akin to diff format to clearly indicate what’s new, while still looking at the whole function. This is important, as we have to ensure that the original logic remains unchanged for segwit to be a soft-fork:
VerifyScript()
gains a new argument for the witness, which we’re setting to
“empty” if not provided (presumably because the transaction didn’t have one).
However, the fact that we do this has a rather odd result: a transaction
spending a witness output with an unknown version is valid even if the
transaction doesn’t have any witnesses!
Everything above is unchanged, as we can’t modify how normal scripts are handled in a soft-fork; to the above witness program commitment scripts are just two pushes in a row.
Next we deal with the case where a scriptPubKey
is a bare witness program
commitment, a
Pay-To-Witness-Script-Hash
(P2WSH):
Mostly fairly simple; we’ll cover what VerifyWitnessProgram()
does later.
However there’s a really big red-flag: at the end we resize the (legacy) stack
to bypass the cleanstack check (which shouldn’t be set in consensus).
We didn’t give VerifyWitnessProgram() the stack, so if not for that we’d be certain this is a soft-fork, as the behavior of the following code would have been unchanged. A possibly better way of doing this that was guaranteed to be a soft-fork would have been to use a boolean flag instead, leaving the stack unchanged.
More generally, the logic in VerifyScript()
should have been moved to
LegacyVerifyScript()
, with the former calling the legacy verification, and
witness verification, in parallel to ensure it’s a soft-fork. But that’s
complicated by the fact that the way P2SH was implemented made the exact same
mistake…
Again, the legacy P2SH logic above is unchanged in segwit, modulo the following additional rules for the backwards compatibility P2WSH nested in P2SH mode:
Pretty much a carbon-copy of the bare witness program case, modulo the fact that the anti-malleability check becomes a bit more complex. Note that the name of the pubKey2 variable dates back to the original P2SH implementation; these days we’d call it a redeemScript.
Next we have some non-consensus-critical code related to the cleanstack check, which is used by the mempool to prevent non-miners from adding additional junk to scriptSigs:
Finally the last thing we do above is make sure that if a witness program wasn’t used, the witness data is empty:
This prevents miners from attaching
witness data to inputs that don’t actually use witness programs, a potential
nuisance for users. Oddly though, the check is disabled if the script
verification flag SCRIPT_VERIFY_WITNESS
is not set, which means that
VerifyScript()
passes even if there’s witness data! Smells like a potential
bug… we could probably use at least an assertion that witness is null if
SCRIPT_VERIFY_WITNESS
isn’t set.
Now let’s look at how witnesses themselves are evaluated, which is done with
the all-new
VerifyWitnessProgram()
.
Going into here remember that VerifyWitnessProgram()
is called from exactly
two places - the bare P2WSH codepath, and the witness-in-P2SH codepath - both
of which provide the witness version and so-called “program” (AKA the hash of
the actual script):
Similar to how P2SH works, here we’ve used the top item in the per-input witness as a scriptPubKey, made sure its SHA256 hash matches the program commitment, and saved the rest of the witness data for use as a stack.
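Concretely, the version-0, 32-byte-program case does something like the following sketch; SHA256Bytes() is an assumed stand-in for a single SHA256, which is what BIP141 specifies for the P2WSH commitment:

```cpp
#include <cstdint>
#include <vector>

// Assumed external: single SHA256 of a byte string.
std::vector<uint8_t> SHA256Bytes(const std::vector<uint8_t>& data);

// Sketch of the version-0, 32-byte-program case: the last witness stack item is the
// actual script; its SHA256 must match the 32-byte program, and the remaining items
// become the initial stack for script evaluation.
bool ExtractWitnessScript(const std::vector<std::vector<uint8_t>>& witness,
                          const std::vector<uint8_t>& program,   // 32 bytes
                          std::vector<uint8_t>& witness_script,
                          std::vector<std::vector<uint8_t>>& stack)
{
    if (witness.empty()) return false;                 // must at least carry the script
    witness_script = witness.back();
    if (SHA256Bytes(witness_script) != program) return false;
    stack.assign(witness.begin(), witness.end() - 1);  // everything else is the stack
    return true;
}
```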
So then what the heck is this?! Bizarrely segwit has an additional pay-to-witness-pubkey-hash (P2WPKH) mode that lets you use a 160-bit (20 byte) commitment, but only if you want to use a single pubkey. As of writing, BIP141 doesn’t explain why you’d special-case that - the obvious thing to do is give users the option instead to choose to accept the less secure 160-bit commitment if their use-case doesn’t need the full 256-bit security level (IE if they aren’t worried about collision attacks).
Secondly, if you are going to give a 160-bit commitment option, you don’t need
the extra level of indirection in the P2SH case: just make the segwit
redeemScript
be:
<version> <serialized witness script>
Basically the witness program format, but with the full script rather than the hash of the script. The only downside is the serialized witness script is constrained to 520 bytes max, but we enforce that for version 0 witness programs anyway.
Having said that, a plausible argument against the 160-bit mode is that the second level of indirection may make collision attacks more difficult, by forcing the attack to go through a 256-bit hash. But if this is the case, the rationale should be documented in the BIPs.
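For reference, the 20-byte special case is defined by BIP141 as an implicit expansion to a standard pay-to-pubkey-hash script, with the witness required to be exactly [signature, pubkey]; a sketch of that expansion (raw opcode bytes rather than Core’s CScript helpers):

```cpp
#include <cstdint>
#include <vector>

// Sketch of the implicit P2WPKH expansion defined by BIP141: a version-0, 20-byte
// program is treated as a standard pay-to-pubkey-hash script, evaluated with the
// two witness items (signature, pubkey) as the initial stack.
std::vector<uint8_t> ExpandP2WPKH(const std::vector<uint8_t>& program20)
{
    std::vector<uint8_t> script;
    script.push_back(0x76);                        // OP_DUP
    script.push_back(0xa9);                        // OP_HASH160
    script.push_back(0x14);                        // push 20 bytes
    script.insert(script.end(), program20.begin(), program20.end());
    script.push_back(0x88);                        // OP_EQUALVERIFY
    script.push_back(0xac);                        // OP_CHECKSIG
    return script;
}
```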
Next we handle higher version witnesses, our upgrade path:
Everything after this point applies only to version 0 witness programs. A style/maintainability thing that bothers me about this is when it comes time to create witness script version #1, there’s a good chance we’ll be doing something sufficiently different (like MAST) that the following code will start to sprout version-specific flags and the like; for consensus code it’d probably be better if we just had separate functions for the different versions to avoid accidentally hard-forking.
Note how we’re only applying the MAX_SCRIPT_ELEMENT_SIZE limit to the stack items, so we don’t repeat P2SH’s 520-byte limitation on redeemScript size; the only limit is the 10,000 byte limitation EvalScript() applies to all scripts.
Speaking of, here’s where the script is actually evaluated, using the witness
as stack; we’ll cover the witness-specific EvalScript()
changes later.
Unlike normal script evaluation, this cleanstack behavior is consensus enforced, making it (usually) impossible for miners to arbitrarily add extra witness data to transactions. Not a malleability concern, but still a nuisance.
BIP143: Transaction Signature Verification
All of the above was ultimately just leading to an EvalScript()
call; let’s
look at the segwit-specific changes made to it by BIP143.
One minor thing is that segwit validation removes the archaic calls to FindAndDelete() from CHECKSIG(VERIFY) and CHECKMULTISIG(VERIFY). And good riddance: FindAndDelete() leads to some really bizarre and totally useless special cases. Secondly, unlike in the existing signature hashing scheme, OP_CODESEPARATOR isn’t removed from the script that the signature hashes, another archaic oddity with no apparent rationale.
The big change though is that the signature hashing scheme has a new mode for segwit, that aims to fix the O(n²) scaling of the original algorithm by allowing all signatures in a transaction to share the same cached hash of transaction-wide data like the set of inputs and outputs. Additionally, for the sake of resource-constrained environments like hardware wallets, signatures now sign the amount each input spends. While none of these changes actually need segwit functionality per se to implement - the new signature scheme could be implemented by redefining an OP_NOPx too - segwit is a good opportunity to do this as it lets us change the behavior of CHECKSIG and co in a soft-fork.
Let’s look at the code that implements the new scheme; I’m assuming you’ve already familiarized yourself with the description of the algorithm in the BIP.
We start off with null commitments for the input, output, and sequence number sets.
ANYONECANPAY
means we’re only signing for this input, and not others,
allowing additional inputs to be added to the transaction later without
invalidating the signature. The most famous usecase for ANYONECANPAY
is
Hearn’s crowdfunding protocol Lighthouse; the new signature hashing scheme
doesn’t change the meaning of this flag.
If the ANYONECANPAY
flag is set, hashPrevouts
will be null, however that’s
ok because we hash in this specific prevout later.
While caching isn’t actually implemented yet, there is an open pull-req that does implement it, and I don’t see any reason why it wouldn’t work.
In addition to signing which outputs are spent, we also sign the sequence
numbers of the input. ANYONECANPAY
obviously shouldn’t sign sequence numbers
for other inputs, and as above it’s ok that we sign none in that case here
because we sign nSequence
again later.
The SINGLE
and NONE
modes have historically left other inputs’ sequence
numbers unsigned as well. It’s not clear if there’s a good reason for doing
that - Satoshi apparently put that into the protocol as part of the broken way
transaction replacement used to work - but it’s reasonable to continue that
behavior rather than re-thinking how all this should work right now.
Also, note how we’ve repeated the (nHashType & 0x1f) mask used in the old signature hash algorithm. That means that the flag bits #5 and #6 are
the old signature hash algorithm. That means that the flag bits #5 and #6 are
left undefined, and can be set to anything. OTOH, using this for soft-fork
upgrades is quite hard, as such signatures still act normally; there’s nowhere
to add another commitment. Anyway, the cleaner option for adding new sighash
features is to bump the witness program version.
Next we hash the outputs:
The above is the standard case, with neither SINGLE
nor NONE
set, where we
simply hash all the outputs.
With SINGLE we only hash (and thus sign) a single output. Note that hashOutputs is indistinguishable between the normal case with one output and the SINGLE case with many outputs - that would be a vulnerability if not for the fact that the flags are themselves hashed later. We’ve also fixed the
SIGHASH_SINGLE
bug by making it revert to NONE
behavior - arguably less
user-friendly than an actual error, but a competent signer can easily detect
that case anyway.
Something we have changed is that SINGLE
no longer commits to the specific
index of the output it’s signing; the old signature hashing scheme did that in
a weird, implicit, way because it acted as though you were signing a
transaction with all prior outputs set to empty scripts and zero-value outputs,
and all subsequent outputs removed. If ANYONECANPAY
is not used in addition
to SINGLE
, the old behavior was redundant: the index of the output is
committed by the inputs anyway.
Meanwhile, SINGLE
combined with ANYONECANPAY
is a niche use-case. If the
output value is greater than the input, you’re basically saying “if anyone
wants this input to be spent, that’s ok with me, and you need to pay me for the
privilege”. Why would someone want to do that? Protocols like colored coins
come to mind, where the act of spending a txout can have a value beyond the
value of the bitcoins themselves. I believe that the new behavior is a (minor) improvement over the status quo for that use-case, as it allows multiple sales of a colored coin to be combined into a single transaction - useful for a decentralized exchange.
If the output value is less than the input, you’re basically saying “I want this output to exist, and don’t care if the left over bitcoins go to miners or another output”. It’s hard to come up with use-cases for this - signature combining schemes come to mind modulo the need for a change output - but if you did need that functionality for some reason, I don’t see why it’d ever be a bad thing to be able to combine multiple such inputs into a single transaction.
Next we combine the hashes for inputs, outputs, and sequence numbers, with everything else that’s signed. With regard to scaling, note how everything that follows has a constant maximum size regardless of how large the transaction is, and everything we hashed above was also either constant size, or the same for every input. The only variable is the size of the per-input script, which is accounted for with the sigop limit:
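For reference, the BIP143 preimage that gets double-SHA256d is laid out as follows; this sketch just concatenates already-serialized pieces in that order (field contents are placeholders; the layout is from BIP143):

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;
static void Append(Bytes& out, const Bytes& in) { out.insert(out.end(), in.begin(), in.end()); }

// Sketch of the BIP143 signature-hash preimage, assembled from already-serialized
// pieces. Everything here is either constant size or, like the three hashXxx fields,
// shared across all inputs of the transaction.
Bytes Bip143Preimage(const Bytes& nVersion,      // 4 bytes
                     const Bytes& hashPrevouts,  // 32 bytes (or zeros with ANYONECANPAY)
                     const Bytes& hashSequence,  // 32 bytes (or zeros)
                     const Bytes& outpoint,      // 36 bytes: prev txid + output index
                     const Bytes& scriptCode,    // compact-size length + script
                     const Bytes& amount,        // 8 bytes: value of the output being spent
                     const Bytes& nSequence,     // 4 bytes: this input's sequence number
                     const Bytes& hashOutputs,   // 32 bytes (or zeros)
                     const Bytes& nLockTime,     // 4 bytes
                     const Bytes& sighashType)   // 4 bytes
{
    Bytes out;
    Append(out, nVersion);
    Append(out, hashPrevouts);
    Append(out, hashSequence);
    Append(out, outpoint);
    Append(out, scriptCode);
    Append(out, amount);
    Append(out, nSequence);
    Append(out, hashOutputs);
    Append(out, nLockTime);
    Append(out, sighashType);
    return out;
}
```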
Red flag: we haven’t explicitly ensured that signatures for the new signature
hash can’t be reused for the old signature hash. While unlikely to be an issue
here, a slightly better design would have been to insert a dummy 0x00
byte
just after nVersion
, to absolutely guarantee that new-style signatures would be
signing data that in the context of the old signature algorithm would be an
invalid transaction with no inputs. Secondly, for further upgrades, add an
additional n-bit random tag, which you could change the next time the
signature algorithm is changed.
Equally, note how the witness program version is only indirectly signed, via the prevout:
scriptCode
is the subset script being validated, starting from the most
recently executed OP_CODESEPARATOR
; we’re repeating an archaic design element
of the old signature hashing algorithm, that was made irrelevant years ago.
Prior to that separation, OP_CODESEPARATOR
could have been (sort-of) used for
delegation by providing additional (signed) code in the scriptSig
, but that
was made impossible when scriptSig
and scriptPubKey
execution were
separated. Now it’s redundant, as the prevout
commits to the scriptPubKey
anyway. Leaving this off might have been better, as we’ll see next:
Hashing only the amount spent by this input isn’t ideal: we don’t know the
total amount spent by the transaction unless we can verify all signatures,
which requires a significant amount of data, including all other scripts due to
how the above commits to scriptCode
. And of course, that’s only true if every
input uses segwit.
A better design would have been to treat the amounts in the same way as other input data, like the prevouts/sequence numbers. Fortunately in most cases this isn’t an issue as all inputs are being spent by the same entity, or equally, for the inputs you have spent you know what your total contribution to the transaction fees is. However it still makes it very difficult to, for instance, have a hardware wallet’s UI guarantee the total fee of a transaction partly funded with inputs signed by a lesser security “hot wallet”.
I personally argued a few months ago that segwit should hash transactions with merkle trees; had we done that the above would have been a simple matter of including amounts in the tree. But alas, I raised my objections too late into the process, so I decided to drop the objection for now.
No issues here, and pretty much a direct copy of the old signature algorithm.
It’s worth noting here that while the above fixes the O(n²) scaling problem for segwit transactions, it doesn’t do anything to fix the problem with existing transactions; post-segwit, transactions can still be created that take excessively long amounts of time to validate. We’ll still need to eventually fix that problem at a later date, although with segwit we can do so more aggressively - e.g. by limiting maximum transaction size/hashed data - as users who do need large transactions have the option of using segwit instead.
Resource Limits
This commit implements the new consensus-critical limits on the maximum serialized size of blocks, and the new ways sigops are counted. In summary, in addition to prior limits we have a new total blocksize limit - including witness data - of 4MB, with non-witness data costing 4x more than witness data. Equally, the sigops limit is quadrupled, but non-witness sigop usage now costs 4x more than before. This quadrupling is defined by the new constant, WITNESS_SCALE_FACTOR.
There’s been a push to think about resource limits in terms of generic “costs”,
combining all relevant costs as a linear function, rather than specific things
like size. Along those lines segwit now refers to a block’s “cost” in many
places, such as the constant
MAX_BLOCK_COST
and the function
GetBlockCost()
.
However, what’s actually in the protocol is that “cost” is purely a function of
serialized size in bytes, with a separate sigops limit.
Let’s look at how these new resource limits are implemented in detail:
Blocksize
MAX_BLOCK_SIZE is renamed to MAX_BLOCK_BASE_SIZE, with all consensus-critical places where it was previously used remaining otherwise unchanged. Notably, this includes the previous block size limit check, done in a way that checks the limit against non-witness data only, ensuring that our new blocksize limit is a soft-fork.
Witness block “cost” is then computed by the new function, GetBlockCost()
:
The (WITNESS_SCALE_FACTOR - 1)
term has confused a lot of people: what’s
going on there is the second term, ::GetSerializeSize(block, SER_NETWORK,
PROTOCOL_VERSION)
, inherently also includes all non-witness data as well, so
the - 1
is there to avoid counting that data twice. Multiplication rather
than division is used, because exactly how division results are rounded would
be consensus critical; it’s far simpler to completely avoid that question and
use multiplication instead.
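In other words, the cost formula reduces to the following; the constants are the values defined at the time of the segwit merge, and the worked example shows why a maximal old-rules block lands exactly on the limit:

```cpp
#include <cstdint>

constexpr int64_t WITNESS_SCALE_FACTOR = 4;
constexpr int64_t MAX_BLOCK_COST = 4000000;

// Sketch of the cost formula: base size (serialized without witness data) is counted
// WITNESS_SCALE_FACTOR times, witness bytes only once; the (factor - 1) multiplier
// exists because total_size already includes the base bytes once. Multiplication is
// used rather than division so no consensus-critical rounding questions arise.
constexpr int64_t BlockCost(int64_t base_size, int64_t total_size)
{
    return (WITNESS_SCALE_FACTOR - 1) * base_size + total_size;
}

// Worked example: a 1,000,000-byte block with no witness data (total == base) costs
// exactly 3*1,000,000 + 1,000,000 = 4,000,000, i.e. precisely MAX_BLOCK_COST, so the
// new limit never rejects a block that the old 1MB rule would have accepted.
static_assert(BlockCost(1000000, 1000000) == MAX_BLOCK_COST, "old-rules block fits exactly");
```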
GetBlockCost()
is used exactly
once,
within ContextualCheckBlock()
to limit maximum block “cost”:
Note how this check is applied even if segwit isn’t enabled; it’s critical
that GetBlockCost(block) > MAX_BLOCK_COST
be false for all blocks that would
pass under the existing rules, or we may end up accidentally forking prior to
segwit being enabled. I believe that is true for the
following reasons:
- The witness mode serialized representation of a block containing no witness data is identical to the serialized representation with SERIALIZE_TRANSACTION_NO_WITNESS set, because transactions without witnesses don’t use the extended format.
- Earlier in ContextualCheckBlock() we reject blocks that contain witness data when segwit is not enabled.
However it would be safer if the check was only applied if segwit was
enabled. In the future we may run into this same issue if we
extend the notion of block cost with other types of costs; GetBlockCost()
might not be a worthwhile abstraction.
Sigops
When P2SH was added to Bitcoin, it was soft-forked in by counting the sigops in P2SH redemption scripts against the existing legacy sigop limit; the additional sigops can only make a previously valid block invalid. P2SH implemented this with a new GetP2SHSigOpCount() function; the segwit equivalent is CountWitnessSigOps(), which in turn calls WitnessSigOps().
Let’s start with the latter, the generic definition of sigops for a witness script, regardless of how it’s committed:
Remember that 20 byte witness programs are the weird pay-to-witness-pubkey-hash
special case, which expands to a standard pay-to-pubkey-hash script. That’s a
single CHECKSIG
operation, or one sigop.
Otherwise we count the sigops in the witness script, which is the first item in the witness stack:
If witversion
is unknown, we return zero; future upgrades would return >=
zero, which is a soft-fork:
But we also return zero if the witness stack is invalid, and the program corresponds to a pay-to-witness-script-hash. Equally in the P2WPKH special case, we return one. Both cases are kinda odd, as we’re returning a value for a witness stack that we know is invalid - sure, we should fail later when the witness is actually evaluated, but it’s a weird edge-case all the same.
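Putting the rules above together, a sketch of the per-program counting; CountScriptSigOps() is an assumed stand-in for the legacy “accurate” sigop counter used for P2SH redeem scripts:

```cpp
#include <cstdint>
#include <vector>

// Assumed external: the legacy "accurate" sigop counter applied to a serialized script.
size_t CountScriptSigOps(const std::vector<uint8_t>& script);

// Sketch of the per-witness-program sigop rule: version-0 20-byte programs (P2WPKH)
// count as one sigop; version-0 32-byte programs (P2WSH) count the sigops of the
// witness script (the last witness stack item); unknown versions - and, oddly,
// invalid stacks - count as zero, which keeps future upgrades soft-forkable.
size_t WitnessProgramSigOps(int witversion,
                            const std::vector<uint8_t>& witprogram,
                            const std::vector<std::vector<uint8_t>>& witness_stack)
{
    if (witversion == 0) {
        if (witprogram.size() == 20)
            return 1;                                  // implicit CHECKSIG in P2WPKH
        if (witprogram.size() == 32 && !witness_stack.empty())
            return CountScriptSigOps(witness_stack.back());
    }
    return 0;  // unknown version, or a witness stack we already know is invalid
}
```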
Next we have CountWitnessSigOps()
, which calls the above after extracting the
witness program from the legacy scriptPubKey
or P2SH redemption script:
Unsurprisingly, if segwit isn’t enabled, witness sigops aren’t a thing. Secondly, segwit depends on P2SH.
Note how we’ll even provide WitnessSigOps() with an empty witness if none was provided, yet we can only reach this line if segwit is enabled - again we’re returning a result for a case that we know is invalid.
We do this for witness-in-P2SH too:
With witness in P2SH, while we use the standard IsPayToScriptHash()
and
IsPushOnly()
methods used elsewhere to detect P2SH, we end up writing our own
code to extract the redemption script. The way we’ve done this is totally
different from how P2SH evaluation works elsewhere: in EvalScript()
we
actually execute the script, and use whatever is on the stack, while here we
use the last push. Since P2SH requires scriptSig
to be push-only, the two
should be identical… but in fact they aren’t! IsPushOnly()
considers
OP_1
to OP_16
and OP_1NEGATE
to be push-only, and even more bizarrely,
OP_RESERVED
too.
Fortunately this isn’t relevant to us, because a witness program has to have two pushes, but it’s a close call.
Sigop enforcement is actually done in
ConnectBlock()
.
Segwit refactored this code, moving part of the calculation into GetTransactionSigOpCost()
, so let’s start by looking at the relevant parts of
the diff between the last pre-segwit
commit
and the merge
commit
to make sure we know what should be in that function:
Regardless of whether or not the transaction is a coinbase transaction, add the legacy sigops count to the sum9. While we fail early at this point in the old code, ConnectBlock() only modifies state (or should!) well after the sigops checks, so it’s ok for the refactored code to fail later (so long as we do in fact fail!).
Next sum P2SH sigops, in a branch that wasn’t applied to coinbase transactions. Again, note how we’ve changed where the function can fail:
Sum up sigops with the new GetTransactionSigOpCost()
,
potentially failing if the sigops limit is reached:
Now let’s examine GetTransactionSigOpCost()
to make sure the refactored code is equivalent:
Here we successfully handle the legacy sigops count, both non-coinbase and
coinbase cases; the latter can return early as a coinbase transaction has no
inputs, and thus can’t have P2SH or witness sigops. The sigops count is
multiplied by WITNESS_SCALE_FACTOR
, which is ok because
MAX_BLOCK_SIGOPS_COST
is exactly equal to old sigops count times that scale
factor.
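As a sanity check on that equivalence, the constants involved (values as of the segwit merge) line up exactly:

```cpp
#include <cstdint>

constexpr int64_t WITNESS_SCALE_FACTOR = 4;
constexpr int64_t LEGACY_MAX_BLOCK_SIGOPS = 20000;   // pre-segwit per-block sigop limit
constexpr int64_t MAX_BLOCK_SIGOPS_COST = 80000;     // new "cost"-denominated limit

// Multiplying every legacy/P2SH sigop by WITNESS_SCALE_FACTOR against a limit that is
// exactly the old limit times the same factor leaves non-witness blocks with precisely
// the same effective limit as before, which is what makes this a soft-fork.
static_assert(LEGACY_MAX_BLOCK_SIGOPS * WITNESS_SCALE_FACTOR == MAX_BLOCK_SIGOPS_COST,
              "scaled sigop limit matches the legacy limit");
```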
P2SH count, done on the whole transaction in one go. In the old
ConnectBlock()
this code was called if fStrictPayToScriptHash
was true;
SCRIPT_VERIFY_P2SH
is set if fStrictPayToScriptHash
is true, so the above
is equivalent.
Finally we sum up witness sigops, this time on a per-input basis:
While CountWitnessSigOps()
is called whether or not there is a witness for that
input, doing so makes sense in a perverse way: as we mentioned earlier,
VerifyScript()
attempts to verify witness programs whether or not there’s
actually a witness for the input.
Peer-to-Peer Networking
BIP141
specifies the changes made to the P2P layer for segwit. The sending side is
very simple: segwit defines a new serialization for transactions with witnesses
attached, we advertise a new service bit NODE_WITNESS
if we support segwit,
and we respond to the new inv types MSG_WITNESS_TX
and MSG_WITNESS_BLOCK
with witness serialized transactions and blocks respectively. I won’t go into
the details of exactly how it’s done - it’s a dozen or so small, mechanical,
changes - but you can see for yourself here.
The receiving side however is more subtle. It’s critical that we successfully
find and connect to segwit enabled peers; if we fail to do this we simply won’t
be able to get the witness data needed to fully validate. To that end if the
segwit soft-fork is defined
we add NODE_WITNESS
to nRelevantServices
, which is used in the connection
code to determine what peers we try to connect to first; we won’t try
connecting to nodes that aren’t advertising nRelevantServices
until after 40
failed attempts.
Secondly, we have a new function,
GetFetchFlags()
,
which is used to determine whether or not to request witness-serialized data
when responding to an inv message, based on whether or not segwit is active, and whether or not our peer is advertising NODE_WITNESS
.
Here’s the relevant diff from ProcessMessage()
where we make use of it:
So with transactions, if segwit is active, and our peer is advertising NODE_WITNESS, we’re asking for witness transactions. On the sending side, a peer asking for witness transactions prior to segwit being activated is harmless, so possibly the code could be simplified slightly (but there may be a reason I’m missing here).
Next we handle blocks:
Basically, what’s changed here is that after segwit activates, we only fetch headers from non-segwit peers, not blocks. Secondly, you’ll note how the compact block support is disabled by segwit; that’s from this commit, and I’m sure we’ll see that fixed soon.
Remember though that a peer can still give us a non-witness block, hopefully because it’s a block that just doesn’t have any witnesses; possibly maliciously by taking a witness block and stripping off the witnesses. We already covered what happens there when we discussed how the witness commitment hash was validated, in particular the corruption possible validation state; totally missing witness data is just a special case of corrupted witness data, and results in a commitment hash mismatch (except of course if the block genuinely doesn’t have any witness data!).
However, while the mempool isn’t consensus critical, I did notice that it looks like we don’t always handle missing/corrupted witness data correctly in transactions. If I’m correct, the problem here is ironically the fact that txids aren’t malleable: the P2P protocol and implementation is all based on txids, which don’t change if the witness data is changed. This is a problem, because in many cases if we request a transaction from a peer, and we don’t accept it for some reason, we won’t re-request that txid again. Not a problem if the mapping of transaction data to txid is one-to-one, but segwit makes that a many-to-one relationship…
Some of the time we handle this correctly, like when
AcceptToMemoryPoolWorker()
checks input signatures,
correctly setting the corruption possible state:
But we get it wrong in a number of other places elsewhere:
Remember that a transaction can be non-standard because it’s too big,
which an attacker can do by adding junk to the transaction’s witnesses. The
consequence is the transaction gets added to recentRejects
:
This isn’t a major problem - recentRejects
is cleared every time the chain tip changes - but an attacker
could still screw with transaction propagation temporarily. I’m not sure what
the right solution is though, as just changing everything to set corruption
possible on error isn’t ideal; possibly the P2P protocol needs to be changed to be
based on wtxids instead. Edit: Confirmed by Suhas Daftuar, who is working on a
fix.
Summary
- Segwit has a number of non-ideal warts at the consensus protocol level, mostly stuff another few months of development time and review could have ironed out (or resulted in a lot of arguments about bikesheds).
- There’s a few “close calls” in the consensus-critical implementation that could lead to bugs later, but don’t appear to be actual issues right now. Put another way, in a number of places we either have a belt, or suspenders, when given the importance of this code we’d rather have both a belt and suspenders.
- Having said that, there doesn’t seem to be anything seriously wrong with the current consensus implementation or protocol design; at worst we’ve added a bit of technical debt, in exchange for getting critically needed functionality out the door.
- There is a nuisance problem with the non-consensus transaction propagation part of the P2P protocol, though fixing it may require a rethink of how transaction invs work. Edit: Confirmed by Suhas Daftuar, who is working on a fix.
- We haven’t actually fixed the O(n²) signature hashing problem yet, although we’re fairly confident that we can, and there’s an open pull-req implementing the cache that we need. Edit: btcd has also implemented the signature cache as part of their segwit implementation.
Footnotes
1. From the point of view of transaction hashing, the so-called flag field is more of a version number that can create new definitions of what a wtxid commits to. ↩
2. In a clean-sheet redesign of the Bitcoin protocol you’d want to use tagged hashing, where the hash functions are “tweaked” with a unique tag that’s different for every type of data structure that might be hashed. ↩
3. Incidentally, I think it’s generally a good idea to do this: you really don’t want to accidentally leave something important out of a hash calculation, as anything not hashed is uncommitted data that an attacker can forge at will. ↩
4. Interestingly, you could imagine a design where blocks didn’t commit to the witnesses: witnesses would still be required to prove that a block was in fact valid, but exactly which witnesses were needed wouldn’t be set in stone. But there’s no real advantage to such a design, and it’d add a whole new level of complexity to debugging when things went wrong, as it’d be possible that different parts of the network were looking at different witnesses, potentially triggering different bugs. ↩
5. For the record, the double-hashing is one of a number of flaws I raised a few months ago with segwit, which others thought was too late to be worth fixing. ↩
6. ContextualCheckBlock() is also called by the mining code via the TestBlockValidity() function, but that’s not relevant to consensus. ↩
7. The duplicate txid issue with how the merkle tree is calculated that we discussed earlier is another example of this problem. ↩
8. I was sure I remembered seeing a patch from Pieter Wuille fixing this problem a few months ago; not sure what happened to it. ↩
9. Of course, a coinbase transaction doesn’t spend any inputs, and thus doesn’t evaluate any scripts… So why would a coinbase transaction have sigops? Unfortunately, Satoshi’s original sigops design is broken and completely backwards: rather than counting the sigops that are actually evaluated in the inputs, it’s the unevaluated outputs that count towards the sigops limit! ↩