A Gentle Introduction to Dependent Types

Dependent type (or dependently typed programming) languages allow to enhance the expressiveness of your code, thereby increasing the errors the compiler can catch (at compile time). This allows you to correct them and improve the chances that your program behaves as expected at runtime. It makes them particularly well suited to write smart contracts. For feedback or questions, please do not hesitate to contact us: team@metastate.dev

Introduction

Computers and software are part of our daily life. They are everywhere and we rely on them daily, for instance for financial transactions, communication or even home appliances [1]. Bugs and errors seem commonly accepted [1], although they can sometimes have catastrophic consequences. For that reason, several solutions exist in order to detect and reduce them, the most prominent one being type systems [2]. The primary goal of a type system is to prevent the manifestation of execution errors by executing a type checking algorithm in order to catch type errors [3] [4]. In computer science, type systems have been introduced in the 1950’s (in programming languages like Fortran) [2].
Nowadays, not all languages use a type system. As a result, programming languages can be classified into two categories: statically or dynamically typed1 (also known as untyped [4]). Statically typed languages (like Java, C, C++, Fortran, etc.) perform a type checking operation during the compilation process, and stop compilation if the type checker encounters an error [4]. For instance, adding a string to a variable defined as an integer will cause an error. In dynamically typed languages (like Python, JavaScript, etc. [5]), however, this error may only reveal itself when the program is being executed. This can be perceived as an additional difficulty to programmers, as they will not receive any assistance "to understand what the code actually does" [4].

It follows that statically typed languages add a safeguard against potential execution errors compared to dynamically type languages [3]. This extra safeguard will increase compilation time ("as compilation becomes more expensive" [4]) as well as potentially the complexity of the code [4]. Thus, dynamically typed languages can be perceived as more permissible and grant more power to the programmer [4]. It is a double-edged sword, as bugs reveal themselves at runtime and are generally fatal for the execution of the software [4]. In order to circumvent some of them, programmers using dynamically typed languages can employ a linter (somehow comparable to a type system, as a linter will analyse the code to detect programming and stylistic errors [6]).
Type systems undeniably confirm the "absence of some bad program behaviour", however, they cannot "prove their presence" [2]. In other words, they might reject some programs that would work properly at runtime [2]. For instance, consider the following line of code:

if(true)
return 5;
else
return "Hello";

will be rejected as ill-typed2 even though the else clause will never be reached [2]. Type checking could be compared to dimensional analysis used in physics3, where the analysis of base units of an equation provides a "partial correctness check" [7]. For instance, if we express the distance as the multiplication of speed and weight we end up with an incoherent result (distance expressed as m ⋅ kg/s)4. $$distance [m] = speed [m/s] \cdot weight [kg] \rightarrow [m] \neq \left[\frac{m \cdot kg}{s}\right]$$ In contrast, Einstein’s famous formula is correct at least from a dimension analysis point of view: $$E = mc^2 \rightarrow \left[\frac{kg \cdot m^2}{s^2}\right] = [kg] \cdot [m/s]^2$$ An equation can pass dimensional analysis but will still be meaningless. In the same way, type systems can only ensure that a well-typed software is free from certain errors. Nevertheless, they can reveal a wide range of errors [2] to such an extent that once the code of a richly typed languages compiles, it usually "just works" [2]. Thus, it appears that the more richly typed your programming language is, the more errors the compiler will catch [4], increasing the chance that your program behaves as expected. Unfortunately, mainstream statically typed systems only offer a few basic types (e.g. integers, string, boolean, etc.), which limits the expressiveness of the code. Although depending on the use case those basic types offered are more than enough, there are certain use cases where it is insufficient. Therefore, the best solution would be to offer a programming language that allows the programmer to extend the expressiveness of types according to their needs, and capture more advanced properties if needed. This could be done by using dependently typed languages which we will cover in the following section.

Dependent types

Overview

Usually, types and terms sit on a different level, the term used defining which type can be assigned to it [8]. For instance, the value 1 will have to be stored in a variable of type int5. Now, if we put quotes around it, i.e. ’1’, it immediately becomes a type char or string (you cannot assign it a type int anymore). With dependent types, the separation between types and terms is blurred [9], they now sit on the same level. Thus, dependent types allow types to depend on a term of another type [10] [11]. Hence, any part of a program that can be evaluated to a value can also be evaluated to types6 [4]. A typical example is an array of a fixed length. For instance, an array of length three is of type array and its size depends on the value passed (in this case three). By using different values you define distinctive types. This allows in turn to specify a function that can only receive a certain type as input. For instance, if you specify as input of your function an array of size three and provide an array of size four, the compiler will catch this error. Other languages also enable you to create an array of a fixed size, however, with a true dependent type, the size of your array can be an arbitrary term that does not need "to be known at compile time" [9].

Thus, with a dependently typed language the developer has the option to add more expressiveness to their code by extending the expressiveness of types [12] and consequently the compiler can catch more errors [4]. It is up to the programmer to decide to which level of detail they want to go (or even to not use them). The more complex the dependent types used, the more time the programmer will have to spend to properly describe them and the more computational expensive it will be [4], but in return they will receive more support from the compiler [8]. Hence, by empowering the developer to be more explicit in their code, it increases the amount of potential errors that the compiler can catch [13][14][15]. Depending on the problem at hand, it is a relatively modest cost to pay [16]. Moreover, as it is up to the programmer to define the level of complexity of their code, they can choose the appropriate level of expressiveness to maximize the net benefit [17].

Programs are proofs

The development of type-dependent programming languages originated from the Curry-Howard correspondence [11]. Essentially, it outlines a correspondence between mathematics (intuitionistic logic) and programming languages (type theory) [18]. It derives its name from the fact that it was first observed by Haskell Curry (in 1934 [19]) and later refined by William Alvin Howard (in 1969 [20])[21]. At the beginning of the 1930s, Curry realized the similarity between axioms of implicational propositional calculus P ⊃ Q ⊃ P   (P ⊃ Q ⊃ R) ⊃ (P ⊃ Q) ⊃ P ⊃ R and types of SKI combinator calculus (in particular combinators S and K) [11] A → B → A   (A → B → C) → (A → B) → A → C Other similarities started to be discovered which resulted in the Curry-Howard correspondence. The Curry-Howard correspondence states on a high level that each proposition of intuitionistic logic can be converted into a type in the programming language and vice versa [21]. It results that:

types are propositions

It goes further, as for each provable proposition there exists a program with a corresponding type and vice versa [21]. Thus:

programs are proofs

Hence, it is possible to prove a mathematical theorem using a computer (which is referred to as computer-assisted proof). One notorious example is the proof of the four color theorem [22]. Thus, by using dependent types to encode propositions as types, the resulting program can be used as a proof. As the compiler will type check the program, it only needs to compile to be proven true [18]. In a way, by type checking your program the compiler will prove that your program verifies certain theorems [22]. Thus, dependently typed programs are in essence "proof carrying code" [17].

Dependently typed programming languages

Unfortunately, dependent types are not well known by programmers [23]. Mainstream statically typed languages do not support them [14]. Moreover, a large part of the documentation on dependent types are written by academics for other academics to read [23]. Consequently, they are often abstruse. Worse, programmers knowing about them are often reluctant to use dependently typed programming languages as it is believed that type checking becomes undecidable (i.e. by allowing arbitrary expression in types, the type checker will loop endlessly) [23][24]. It seems that the principal hurdle slowing down the propagation of dependent types is related to misunderstanding [23]. This coupled with the fact that there are not a lot high-performance compilers available makes them not very well known to the general public.

As it is often the case, the best way to learn something is to start playing with it. Although recent, there are various different dependently type languages, among which the most prominent are: Coq (2004 [25]), Agda (2007 [8]), Idris (2011 [26]), F* (2011 [27]). There are several very good resources to start learning them (for Idris see f.i. [1] or [28]). Haskell is another programming language that can be used to experiment with dependent types. Moreover, some people argue that by learning Haskell it helps you grow as a programmer and in a sense makes you a better programmer [29][30]. Thus, the time you spend learning it (even if you do not use it afterwards) will be valuable. Moreover, beginner Haskell programmers experience a curious phenomenon: Once your code compiles, it usually just works. It is such a recurring event that the Haskell wiki has a page for it: Why Haskell just works.

If there are so many advantages, why are dependently typed languages not more popular? In fact, there are not many projects and companies that use dependently typed languages. This could be due to several reasons:

  • The commercial adoption of those languages could be weakened because they are quite recent, misunderstood and are perceived as hard to learn. Thus, this reduces the experience levels of the programmers and the potential available skill set of workers. This in turn reduces the potential amount of projects that could benefit from it. Moreoever, the resources that can be used by a general audience are quite scarce at the moment.

  • Agda and Coq are proof assistants (i.e. to write and check proofs) and are not used for general purpose programming [31][32]. The development of Idris is led by Edwin Brady (the creator of Idris), currently a lecturer at the university of St Andrews [33]. Nonetheless, the limited support and contributors outside of academia might bring enough uncertainties, such as usability, or language long-term support, which could limit industrial adoption.

  • They might not be a good fit with industries with the motto "move fast and break things". A better fit could be industries where security is paramount (military, aerospace, financial institutions). A secrecy veil could surround those projects reducing the known use cases.

This current status might change as it is a constant evolving field of study (see f.i. the recent work on quantitative type theory that combines dependency with linearity [34][35]). This will in turn broaden the use cases for dependently typed languages. One of those potentially promising fields could be blockchain (specifically to write safer smart contracts). In fact, once a smart contract is deployed on the blockchain, its code is immutable [36]. Hence, it is crucial to ensure the correctness of a smart contract before deploying it on the blockchain [37]. Thus, a longer time to compile a smart contract seems a small price to pay if that can increase its chances to behave as expected. Therefore, using a dependently typed programming language to code smart contracts seems to fit perfectly. See for instance Juvix, a dependently typed programming language specially developed to program safer smart contract.

Conclusion

Dependently typed programming languages reduce the separation between types and terms. They allow types to depend on a term of another type, thereby increasing the expressiveness of a programmer’s code and the computational cost but reducing the potential errors. Moreover, the programmer can adapt their level of expressiveness to maximize the cost-benefit ratio.
Unfortunately, at the moment of writing, the commercial adoption of dependent types seems hindered. This could be due to the fact that they are quite recent, misunderstood and are perceived as hard to learn, momentarily reducing the experience levels and available skill set of programmers and thus the potential amount of projects that could benefit from them.
It is, however, an evolving field of study with new potential applications. One promising field could be to use a dependently typed language to write safer smart contracts. This is the case for instance of Juvix, which is a new dependently typed programming language specially developed to program safer smart contract.

Written by A. Samartino, blockchain researcher at Metastate. For feedback or questions, please do not hesitate to contact us : team@metastate.dev

Image credits: Lemon shark and remoras from Wikimedia