New Michelson Opcodes

Note that a summary of this article has been published on Metastate's Medium blog.

This document gives a succinct introduction to the Michelson language, and proceeds to describe the semantics of the new types and instructions in the following features:

  • Support for the elliptic curve construction BLS12-381: new types bls12_381_g1, bls12_381_g2, bls12_381_fr, bls12_381_g1_compressed and bls12_381_g2_compressed, and new instructions COMPRESS, DECOMPRESS and PAIRING_CHECK.
  • More cryptographic hash algorithms: new instructions SHA3 and KECCAK.
  • Support for forbidden branches: new type never and new instruction NEVER.
  • More comparable types: types key, signature and chain_id made comparable.
  • Voting power data: new instructions VOTING_POWER and TOTAL_VOTING_POWER.
  • General improvements: new instructions SELF_ADDRESS and LEVEL, and macro UNPAIR promoted to instruction.

Introduction to the Michelson language

Michelson is a stack-based language with high-level data types and primitives, which shares traits with languages from different families like Forth, ML, Scheme and Cat. The language is strongly typed (interpreting a well-typed program never fails, and all programs are type-checked before interpreting them), and garbage-collected (interpreting a well-typed program never results in memory leaking). The semantics of the language is specified formally by a set of big-step rules, each of them describing how to transform a syntax pattern of the language and a stack into a result stack.

(See Michelson: the language of Smart Contracts in Tezos for the full specification of the Michelson language.)

Support for the elliptic curve construction BLS12-381

BLS12-381 is an elliptic curve construction introduced by Sean Bowe in 2017 for an upgrade of the Zcash protocol (see BLS12-381: New zk-SNARK Elliptic Curve Construction). BLS12-381 is a pairing-equipped curve construction that targets the 128-bit security level, and which is useful for implementing signature aggregation schemes and zk-SNARK verifiers. These and other schemes for zero-knowledge proofs of knowledge (zkPoK) have already been discussed in connection to BLS12-381 in Adding Support for the Pairing-Equipped Elliptic Curve BLS12–381 to Tezos.

Leaving technicalities that exceed the scope of this post aside, the BLS12-318 construction considers two groups $G_1$ and $G_2$ on related, but different elliptic curves. The two groups have a binary group operation (addition), an inverse operation (negation), and a scalar multiplication over the same finite field $F_r$. The construction also induces a bilinear pairing operation, which takes one element from each of $G_1$ and $G_2$, and maps this pair to an element of another related finite field $F_{q^{12}}$, thus efficiently embedding operations in the elliptic curve group into a finite field. The groups $G_1$ and $G_2$ are chosen specifically for fast computation of the pairing operation. For a more detailed account on elliptic curves and pairings see Explore Elliptic Curve Pairings.

Checking a BLS12-381 signature amounts to checking that the pairing operation maps two different pairs to the same element of the scalar field $F_{q^{12}}$. This check can done by dividing one of the pairings by the other---in the scalar field, division is defined as "multiplication by the inverse"---, and by checking that the result coincides with the identity element of $F_{q^{12}}$. By bilinearity of the pairing operation, and by the laws of the binomial product, this check can be implemented in terms of negation of group elements and product of scalars. More interestingly, checking the equality of several couples of pairings can be reduced to checking that the product of the pairings on a series of pairs coincides with the identity element. This is particularly useful when implementing signature aggregation schemes, which require to incrementally check several signatures in a time-effective manner.

The new types bls12_381_g1 and bls12_381_g2 represent elements of the $G_1$ and $G_2$ groups respectively, and the new type bls12_381_fr represents elements of the scalar field $F_r$. Elements of bls12_381_g1 and bls12_381_g2 are compatible with the existing instructions ADD (the binary group operation), NEG (the inverse operation) and MUL (the scalar multiplication with signature $G_x\times F_r\to G_x$, where $x$ is either 1 or 2). Elements of bls12_381_g1 and bls12_381_g2 can be compressed into elements of the new types bls12_381_g1_compressed and bls12_381_g2_compressed respectively. The new instructions COMPRESS and DECOMPRESS correspondingly mediate between the non-compressed and compressed elements of each group.

The new instruction PAIRING_CHECK takes a list with pairs of bls12_381_g1 and bls12_381_g2 as elements, and checks that the product of their pairings is equal to the identity element of the field $F_{q^{12}}$, indicating the success of the pairing check with a bool.

More cryptographic hash algorithms

Two new cryptographic instructions have been added, SHA3 and KECCAK, which implement the hash algorithms in the Keccak family that are detailed below.

Keccak is a family of algorithms designed for the NIST Cryptographic Hash Algorithm Competition from 2008-2010, which sought to create a new standardized hash algorithm that was significantly dissimilar in construction from earlier standards MD5, SHA-0, SHA-1 and SHA-2. The selected algorithm was a particular member of the Keccak family, later known as SHA3.

SHA3 now is a relatively ubiquitous cryptographic hash algorithm. SHA3 is not intended to replace SHA-2, and no significant attack on SHA-2 has been demonstrated to date; However, it is advantageous to have several algorithms available in order to be resistant to compromise of a single hash algorithm.

Keccak256 is another member of the Keccak family, remarkable for its pervasiveness in the Ethereum ecosystem. Keccak256 is a slightly earlier version of SHA3, and the algorithms differ only by choice of parameters.

The instructionsSHA3 and KECCAK take a string of bytes and retrieve the SHA3 (respectively the Keccak256) of those bytes.

It is noteworthy that Keccak256 is often erroneously referred to as "SHA3" in the Ethereum ecosystem; a vestigial inaccuracy, which has become unfortunately commonplace. However, sometimes "SHA3" does indeed refer to the SHA3 function proper. Diligence should be exercised as a result of this widespread misconception.

Support for forbidden branches

The concept of contract template has been proposed as the means to implement financial interfaces for smart contracts, which define a set of basic entrypoints that are common to several contracts (see the FA1.2 of this Tezos improvement proposal for an example of such an interface). A contract template collects these common entrypoints by using a parameter of union type, and leaves place for additional entrypoints via a type variable in the parameter's type, and a branch in the code of the contract that processes the part of the parameter that corresponds to this type variable. This concept is illustrated by the following example taken from Tezos issue #662, where p is the type variable for additional entrypoints and doOther is the branch that implements them:

parameter or (int %callA) (or (nat %callB) p)
storage ...
code { UNPAIR; IF_LEFT {doA} { IF_LEFT {doB} {doOther} } }

Sometimes a contract provides no additional entrypoints, but this fact could not be reflected in the contract's type. The new type never serves this purpose: using it in place of the type variable in the parameter's type forbids the use of any entrypoint different from the common ones, thus indicating that the contract interface has not been extended. The accompanying instruction NEVER is used to fill the forbidden branch implementing such non-existing additional entrypoints. The instruction NEVER receives a stack with a never type on its top and returns a stack of arbitrary type. To wit, the code below forbids the additional entrypoints represented by p in the aforementioned example:

parameter or (int %callA) (or (nat %callB) never)
storage ...
code { UNPAIR; IF_LEFT {doA} { IF_LEFT {doB} {NEVER} } }

The type never is equivalent to the Empty type present in many programming languages and proof assistants (a.k.a. Zero, or False type), which is a type with no inhabitants. Since never has no inhabitants, no value of this type is ever permitted to occur in a well-typed program. Notice that the code snippet above does not create any value of type never, but merely abstracts a value of such type via the parameter declaration.

The type never is comparable, with an empty COMPARE instruction.

More comparable types

The types key, signature and chain_id have been made comparable (see Michelson: the language of Smart Contracts in Tezos for a full description of those types). A comparable type has a compatible instruction COMPARE that induces a total order on the values of the type. This is particularly useful since the values of comparable type are the only ones that can take part in collections where those values have to be unique and sorted, as is the case with the elements of a set, and with the keys of a map or of a big_map.

Voting Power Data

Two instructions have been added that allow Michelson contracts to operate with the voting power of bakers. The instruction VOTING_POWER takes the key hash of a delegate and retrieves a natural that is equal to the number of rolls in the delegate's staking balance, as it appears in the delegate's entry in the voting listings. The voting listings, which are calculated at the beginning of each voting period, reflect each delegate's vote weight when emitting a Ballot operation. The instruction TOTAL_VOTING_POWER retrieves a natural that is equal to the total number of rolls of the delegates in the voting listings, such that the relative vote weight of a delegate can be calculated by comparing the delegate's voting power with the total voting power.

Knowing a delegate's relative vote weight programmatically enables to perform early polling for preferences on protocol features, by deploying contracts that use the instructions VOTING_POWER and TOTAL_VOTING_POWER.

General improvements

The new instruction SELF_ADDRESS retrieves the address of the current contract, in a way similar to the sequence of instructions SELF; ADDRESS. However, SELF_ADDRESS is permitted to occur within lambdas, whereas SELF; ADDRESS is not. This ability can be used to implement advanced mechanisms in which the address of the current contract is bound dynamically.

The new instruction LEVEL retrieves the current block level.

Besides adding the instructions SELF_ADDRESS and LEVEL, the macro UNPAIR has been promoted to an instruction with equivalent semantics.

Written by Álvaro García Pérez, core developer and researcher at Metastate.

Metastate has ceased its activities on the 31st of January 2021. The team members have joined a new company and working on a new project. If you're interested in distributed systems research and engineering and systems engineering in Rust, checkout the open positions at Heliax.

If you have any feedback or questions on this article, please do not hesitate to contact us: hello@heliax.dev.