CIP Proposal: P2SH data encoding

+1 on this. Great job. I have some minor nits but will save those for once it’s up on github.

Looks great. You can label this CIP 6 and start a pull request with status Draft when you are ready.

Am I correct in reading Peter Todd’s python-bitcoinlib repo that the max data chunk size (MAX_SCRIPT_ELEMENT_SIZE) in a single redeemScript is 520 bytes? That’s quite the increase over OP_CHECKMULTISIG encoding.

but let’s keep the discussion here unless it’s nits about the text :wink:

the consensus max on 1 script is 10k, but the isstandard check is a bit more restrictive:

    // Biggest 'standard' txin is a 15-of-15 P2SH multisig with compressed
    // keys. (remember the 520 byte limit on redeemScript size) That works
    // out to a (15*(33+1))+3=513 byte redeemScript, 513+1+15*(73+1)+3=1627
    // bytes of scriptSig, which we round off to 1650 bytes for some minor
    // future-proofing. That's also enough to spend a 20-of-20
    // CHECKMULTISIG scriptPubKey, though such a scriptPubKey is not
    // considered standard)

“remember the 520 byte limit on redeemScript size” means the max size of 1 element in a script.
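
for reference, I believe both limits are exposed as constants in python-bitcoinlib; a minimal check (assuming the package is installed):

    # sanity check against python-bitcoinlib's script constants
    from bitcoin.core.script import MAX_SCRIPT_ELEMENT_SIZE, MAX_SCRIPT_SIZE

    print(MAX_SCRIPT_ELEMENT_SIZE)  # 520   -- largest single pushed element
    print(MAX_SCRIPT_SIZE)          # 10000 -- max size of a whole script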

peter’s PoC actually splits the data into chunks within the same input, like this: OP_HASH160 hash160(datachunk1) OP_EQUAL OP_HASH160 hash160(datachunk2) OP_EQUAL OP_HASH160 hash160(datachunk3) OP_EQUAL.

but I think we should just do 1 chunk per output/input for simplicity and sanity.
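
just to make the chunking concrete, a minimal sketch (plain Python; split_into_chunks is an illustrative name, not counterparty-lib API):

    # illustrative only: split a payload into chunks that each fit in a single
    # script element (520 bytes), one chunk per P2SH data output as proposed
    CHUNK_SIZE = 520  # MAX_SCRIPT_ELEMENT_SIZE

    def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    chunks = split_into_chunks(b'\x00' * 1200)
    print([len(c) for c in chunks])  # [520, 520, 160]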

also the max isstandard size for 1 tx is 100kb

there are 2 things that, at the very least, need to be discussed.

1. arc4
I’d like to make it so that the first transaction of the pair can be regarded as a plain btc-only transaction and doesn’t have to be parsed at all.
that means only the second transaction really gets parsed.

we normally arc4 the data with the txId of the first input of the transaction,
so that would be the first input of the second transaction (because I don’t want to parse the first transaction).
however, the txId referenced by that input is the txId of the first transaction itself … which we can never know before encrypting the data …

so if we stick to my plan of ignoring the first transaction, we have 3 options:

  1. take something else that is present in the second transaction and already known when the first transaction is created. I think the only option on that front would be the source.
  2. simply arc4 with a fixed string ("COUNTERPARTY" or something)
  3. stop doing arc4 entirely (only for p2sh encoding for now)

Afaik the purpose of arc4 is to obfuscate that it’s a Counterparty transaction, to the point where checking for one at the very least requires arc4 decryption instead of a simple pattern match.
arc4 encryption is a lot like a simple XOR; with a given key it just XORs the data against a fixed keystream. for example, these 2 plaintexts differ in only a single byte:
arc4(b'434e5452505254590000000000000000000039380000000002faf080', b'COUNTERPARTY')
arc4(b'434e5452505254590000000000000000000039390000000002faf080', b'COUNTERPARTY') == b'e937d2bb75cde1cc3d86191ae853aa117f60436fda7f4184f89b428e'

if the same seed is used then any bytes that are the same between 2 input strings are also the same in the ciphertexts (so here the 2 outputs differ in only that 1 byte as well)!
and since all our data starts with a CNTRPRTY prefix you could easily just filter on that; that’s why the txId is used as the seed, to force anyone who wants to blacklist Counterparty transactions to expend CPU on decrypting.
so option 2 isn’t really a useful option.
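
to illustrate, a small sketch using pycryptodome’s ARC4 (any RC4 implementation behaves the same; the payloads are made up):

    # why a fixed ARC4 key (option 2) still leaks the prefix: RC4 is plaintext
    # XOR keystream, and the keystream depends only on the key
    from Crypto.Cipher import ARC4

    key = b'COUNTERPARTY'                    # fixed key, as in option 2
    msg1 = b'CNTRPRTY' + b'some payload....'
    msg2 = b'CNTRPRTY' + b'other payload...'

    ct1 = ARC4.new(key).encrypt(msg1)
    ct2 = ARC4.new(key).encrypt(msg2)

    print(ct1[:8] == ct2[:8])  # True  -> the encrypted CNTRPRTY prefix is constant
    print(ct1[8:] == ct2[8:])  # False -> only the differing payload bytes differ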

I think the only available data for option 1 would be the {{ pubkey }} used in the data P2SH script, but that would also often be the same (again resulting in very similar, easy-to-filter data).

writing all this down, maybe we should just fetch the prev TX and use the txId of its first input…

2. P2SH source
the P2SH data outputs still contain a {{ pubkey }} OP_CHECKSIGVERIFY to secure the data output from being spent by others. when the source is a P2SH address we have 2 options for this part:

option 1: specify 1 pubkey to be used for this (e.g. in a multisig, the person who constructs the initial TX chooses one, most likely his own)
option 2: allow the ‘user’ to specify this part of the redeemScript; he needs to make sure that part leaves the stack empty though (so using OP_CHECKSIGVERIFY, not OP_CHECKSIG).

for now I’ll leave the implementation with only option 1; it’s only securing DUST in value, and the complexity of adding option 2 is quite high, so it can be added at a later stage if necessary.
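
for reference, the option 1 shape would be roughly along these lines (a simplified sketch with python-bitcoinlib; the exact redeemScript layout, including how it terminates so the final stack validates, is whatever the CIP text specifies, not this):

    # simplified sketch of an option-1 data redeemScript: push the data chunk,
    # drop it, then require a signature from {{ pubkey }} so only the source
    # can spend the (DUST) output. illustrative only.
    from bitcoin.core.script import CScript, OP_DROP, OP_CHECKSIGVERIFY

    def data_redeem_script(data_chunk: bytes, pubkey: bytes) -> CScript:
        assert len(data_chunk) <= 520  # one chunk per output, within the element limit
        return CScript([data_chunk, OP_DROP, pubkey, OP_CHECKSIGVERIFY])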

Looking over this CIP made me think about IPFS again. I don’t want to hijack this thread, so I started another one here:

Merged this in draft status.

I wish there was a way to do this in 1 transaction, but I can’t think of a way to do it and I know you (Ruben) put a lot of thought into it as well.

Not sure this method is necessary anymore… https://github.com/bitcoin/bitcoin/pull/8438

hmm, following that PR a normal counterparty send with multisig encoding will need to pay about +300% extra fee to get mined.

I have to go over the calculations again to make sure I didn’t make any mistakes, but here’s an attempt at showing the cost increase for different sizes: https://docs.google.com/spreadsheets/d/1WJARJBzIiP_46-KEKskCPEpExz9_l_2LBE8ErbemdqE/edit?usp=sharing

it seems that above roughly 500 bytes of data, P2SH encoding (even with the overhead of needing 2 TXs) becomes cheaper than paying for 20 sigops worth of size with multisig encoding.

We should also consider the “cost” of needing two txs vs one tx. This was a sticking point when I proposed a simple method for integrating subassets via asset descriptions without any changes to consensus related code. In that case, the general feeling was a change in consensus code was preferred to using two txs to accomplish a subasset issuance.

You’re right, but it’s not that complicated looking at the code required in counterparty-lib.
And if we want to eventually unleash the EVM on mainnet I think this is a necessity.

Though considering we can continue using multisig for now (with some proper fee estimation code added), we should probably delay P2SH encoding until the EVM and until segwit has activated.
With segwit it at least won’t require 2 API calls; it will just be 1 API call that returns 2 TXs to sign and broadcast.


Also keep in mind the Bitcoin Core devs really dislike bare multisig!
Even though there are absolutely no negative effects for bitcoin from letting it live on, I wouldn’t be surprised if they at some point make it non-standard under the “we should encourage best-practice P2SH” banner.

And most likely there won’t be many people opposed, because everyone will think it’s a good move against data embedding, even though the above CIP clearly proves there are alternatives and that killing bare multisig won’t do anything in their fight against data embedding.

I will update the CIP and implementation later to restrict it to segwit only

Does counterparty-lib force the second transaction to spend the output of the first transaction so they are confirmed in order? If so, that isn’t too bad. We can just submit them both (in order) and watch for the second one to confirm.

I think that’s the plan. The problem is, prior to segwit, you can’t be sure of the output of the first transaction until it is confirmed.

yea, the counterparty-lib API provides you with an unsigned TX; prior to segwit its txid will change when you sign it, so it’s impossible to construct both transactions at the same time.

with segwit we can construct them both, because the txid of a segwit TX won’t change after signing; we can construct and return both at the same time and the ‘user’ can sign and broadcast both at the same time.
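
for anyone wondering why the txid is stable: a segwit txid is the double-SHA256 of the transaction serialized without its witness data, so the signatures (which live in the witness) can’t change it. rough sketch of the txid computation (the caller is assumed to provide the serialization without witness data):

    # a segwit txid hashes the serialization *without* witness data, so adding
    # signatures (witnesses) later leaves the txid unchanged
    import hashlib

    def txid_from_serialization(tx_bytes_without_witness: bytes) -> str:
        h = hashlib.sha256(hashlib.sha256(tx_bytes_without_witness).digest()).digest()
        return h[::-1].hex()  # txids are conventionally displayed byte-reversed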

Can we use multiple OP_RETURNs to encode more than 80 bytes of data?

I believe it is allowed by the bitcoin protocol but it is considered a non-standard transaction. Wouldn’t this be the most efficient way of encoding data?

Perhaps we can lobby bitcoin core to allow multiple OP_RETURNS as standard. If it is more efficient, then why wouldn’t they do it?

because they’re opposed to embedding data in the blockchain; the 80-byte OP_RETURN is already a compromise from their perspective, since they feel all you need is hashes (and even that is a compromise).

@rubensayshi - Can we lose the extra output on the setup transaction? That would save space.

In other words, change:

Transaction 1 Outputs
1 output to source with enough value to pay the fee for the second transaction.
1 + n P2SH outputs following the above method with DUST value.
1 change output to send any excess BTC back to source (optional)

To this instead:

Transaction 1 Outputs
1 + n P2SH outputs following the above method with DUST + enough to pay fees for the second transaction.
1 change output to send any excess BTC back to source (optional)

I don’t see the reason for the extra output since the pubkey is already required in the redeem script. Am I missing something?

I think the advantage of the source output/input was that the 1st TX does not have to be parsed by CP; all required data is in the 2nd TX, and the 1st TX is purely just a setup for the 2nd TX. That makes for much simpler code, since when the 2nd TX is being processed it does not need to somehow get any info/data about the 1st TX.

I think this is still true even if you leave out the source output/input.

The pubkey is in the redeem script. This gives us a verified source of the transaction.

Can we save space and drop the output and input?