Setting up a statically-typed Python programming environment with mypy and Poetry

30.12.22    Python    tools and tricks

Recent developments in the Python ecosystem facilitate the creation and enforcement of static typing information in Python programs. That is to say, you can now add a type hint to, for example, a function's parameter, and a tool can then automatically check that every argument passed to this function satisfies this type requirement. Let's take a look:

# File main.py
# Parameter 'p' has a type hint of 'str'
# Therefore foo only accepts arguments of type 'str'
def foo(p: str) -> str:
    return "hello {}".format(p)

print(foo("test")) # <-- ok
print(foo(3))      # <-- ERROR cannot pass int to foo where str is expected

It is important to note that this is standard Python syntax and not some third-party language extension. Nevertheless, the Python interpreter does not enforce type hints: it runs the program above normally and without any error. To check them, you need a separate type checker, like mypy, that reads the type hints and analyzes the code to make sure the typing rules are respected. In this short guide I show how mypy can be integrated into the Python programming workflow using the Python dependency management tool Poetry.
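A quick experiment shows that the interpreter really does not check the hints (a sketch, assuming a plain CPython 3 interpreter without any runtime type-checking library): the mismatched call simply succeeds, and the hints are only stored as metadata in the function's __annotations__ attribute.

```python
def foo(p: str) -> str:
    return "hello {}".format(p)

# No runtime check: passing an int works and returns "hello 3".
print(foo(3))

# The hints are merely recorded on the function object.
print(foo.__annotations__)  # {'p': <class 'str'>, 'return': <class 'str'>}
```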

Note that these type hints and this workflow do not force you to define types rigorously everywhere. Instead, they enable what is known as gradual typing: you leave out type hints where they cannot (yet) express a complex type or where you simply do not care about the type. The type checker only reports an error where two known types conflict, that is, where there is a definite error. This way, you can also gradually convert an existing code base (or parts of it) to typed Python.
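A minimal sketch of gradual typing (the function describe is hypothetical, made up for illustration): one parameter is annotated and the other is not. mypy treats the unannotated parameter as Any and accepts any argument for it, while it still checks the annotated one.

```python
def describe(name: str, extra) -> str:
    # 'name' is checked by mypy; 'extra' has no hint and is implicitly Any.
    return f"{name}: {extra}"

print(describe("answer", 42))     # accepted by mypy
print(describe("point", (1, 2)))  # also accepted: 'extra' is unchecked
# describe(3, 42)                 # mypy would flag this call: int is not str
```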

Of course, mixing dynamic and static typing comes at a cost. You are not checking correctness in the places where you left out type hints. Moreover, a value with the wrong runtime type can end up in a variable with a different static type, because static types are not enforced at runtime at all. Such type errors can then propagate through your data structures until some completely unrelated part of your program crashes. But this is no different from classic, completely untyped Python. Overall, type hints are therefore a big boon.
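A small, hypothetical illustration of how a wrong runtime type can slip into a typed variable: json.loads is declared to return Any, so with mypy's default settings (that is, without --warn-return-any) nothing is reported, yet count ends up holding a str. The failure only surfaces later, at an unrelated-looking addition.

```python
import json

def load_count(raw: str) -> int:
    # json.loads returns Any, so mypy (default settings) accepts this return.
    return json.loads(raw)

count: int = load_count('"7"')  # really stores the str "7", not an int
print(type(count))              # <class 'str'>: the static type was never enforced

try:
    print(count + 1)            # the error surfaces here, far from its cause
except TypeError as exc:
    print("crash:", exc)
```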

Poetry project setup

Poetry is a tool for managing Python dependencies (and the corresponding virtual environments). It uses custom sections in pyproject.toml, the new standard configuration file for Python projects, to keep track of the user-defined direct dependencies. The dependencies are divided into those needed to run the software (in the section tool.poetry.dependencies) and those needed to develop it (in the section tool.poetry.dev-dependencies; Poetry 1.2 and newer use dependency groups such as tool.poetry.group.dev.dependencies instead). Based on these, Poetry resolves all transitive dependencies and their versions and saves them in a file named poetry.lock to ensure a reproducible virtual environment.

After we set up a new Poetry project named blob and copy the Python code from the top into the file blob/main.py [1], we can run it via poetry run python -m blob.main. This runs the command python -m blob.main inside the project's virtual environment. We see no error; instead, the program runs perfectly fine and gives the same output we would expect from the program without type hints:

$ poetry run python -m blob.main
hello test
hello 3

Integrating mypy

So, in order to actually catch the mistake, we have to employ the type checker mypy. We add it as a dev dependency like this:

$ poetry add --dev mypy

It is a dev dependency because, while we rely on mypy during development to ensure correct type usage, the type checker is not needed in a production environment; there we just run the program and ignore the types, as we did earlier. (Poetry 1.2 and newer deprecate --dev; the equivalent command there is poetry add --group dev mypy.) Still, we can now make sure to deliver only code that satisfies the type constraints, by checking that mypy reports no errors. We run mypy like this to find the type-related error in our example:

$ poetry run python -m mypy blob/main.py
blob/main.py:8: error: Argument 1 to "foo" has incompatible type "int"; expected "str"  [arg-type]
Found 1 error in 1 file (checked 1 source file)

Once we remove the offending line from main.py, we get a nice clean output:

$ poetry run python -m mypy blob/main.py
Success: no issues found in 1 source file

This command should be added to the project's documentation so that all developers know they have to check the types. Ideally, it is also integrated into the CI/CD pipeline and/or run as a git commit hook. This way, checking the types cannot be forgotten and no code with statically known errors is pushed into production.
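As one possible way to run the check as a git commit hook, here is a sketch of a Python pre-commit script. The file location .git/hooks/pre-commit and the names MYPY_CMD and check_types are assumptions for illustration, not part of Poetry or mypy. Git aborts the commit when the hook exits with a non-zero status, and mypy exits with a non-zero status when it finds errors.

```python
#!/usr/bin/env python3
"""Hypothetical git pre-commit hook: abort the commit if mypy reports errors."""
import subprocess
import sys

# The same command as in the text, run inside Poetry's virtual environment.
MYPY_CMD = ("poetry", "run", "python", "-m", "mypy", "blob/main.py")


def check_types(cmd: tuple[str, ...] = MYPY_CMD) -> int:
    """Run the type checker and return its exit status (0 means no errors)."""
    try:
        result = subprocess.run(cmd)
    except FileNotFoundError:
        # The executable (e.g. poetry) is not installed or not on PATH.
        print(f"could not run {cmd[0]!r}; commit aborted.", file=sys.stderr)
        return 1
    if result.returncode != 0:
        print("type check failed; commit aborted.", file=sys.stderr)
    return result.returncode
```

Saved as .git/hooks/pre-commit and made executable, the script would end with sys.exit(check_types()), so that the hook's exit status is the type checker's.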

Later, if we add more files to our package blob, we can check them all at once by telling mypy to check the whole package with the -p option. For this, we need an __init__.py file [2] in our blob directory so that mypy can find the package. It can be empty, so the simplest way to create it is touch blob/__init__.py. Then we can check our package like this:

$ poetry run python -m mypy -p blob
Success: no issues found in 2 source files

Conclusion

Type hints are a great addition to Python. They are a standard way to express structures, constraints and contracts that previously existed only implicitly in Python code. Now, with the help of tools like mypy, these constraints, once written down in the code, can be checked and verified. Using Poetry, mypy can easily be integrated into a project. Because mypy is managed by Poetry, it can access the project's dependencies and use their type definitions to check the project's code for correctness. And because mypy is added as a dev dependency, it and its dependencies are easy to keep out of production (for example with poetry install --without dev).

Extra: the power of the type system

Of course, the type system can express simple, elementary types such as str, int and so on. You can also use the names of classes [3] as types, which restricts a variable to instances of that class (or of its subclasses).
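A short sketch of classes used as types (Shape, Square and total_area are hypothetical names). A function annotated with list[Shape] also accepts Square instances, because Square is a subclass. The method returning its own class uses the string-literal workaround from note 3.

```python
class Shape:
    def area(self) -> float:
        return 0.0


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

    # With eagerly evaluated annotations, the name Square is not bound yet
    # inside its own class body, so the hint is written as a string literal.
    def scaled(self, factor: float) -> "Square":
        return Square(self.side * factor)


def total_area(shapes: list[Shape]) -> float:
    # Accepts Shape instances and instances of any subclass, such as Square.
    return sum(s.area() for s in shapes)


print(total_area([Shape(), Square(2.0).scaled(1.5)]))  # 9.0
```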

But this is not the end: you can combine these simple types into more complex ones! A list of strings? Easy: list[str]. A callable (e.g. a function) that accepts two strings and returns a list of strings? That's Callable[[str, str], list[str]]. A dict that maps strings to tuples of two ints and a string? No problem: dict[str, Tuple[int, int, str]]. This is the point where the type system becomes really useful. Such nested structures are very easy to create and manipulate in Python, but that same ease also makes them easy to mess up. It is just too easy to put a string of digits into a list of ints because you forgot to convert it. The type system helps you eliminate this whole class of trivial mistakes and lets you express (and therefore document!) the structure of your data.
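A short sketch of the composite types above in use (split_both, splitter, scores and ids are made-up names for illustration). The append call in the comment is exactly the string-of-digits mistake that mypy would report.

```python
from typing import Callable, Tuple


def split_both(a: str, b: str) -> list[str]:
    # Split both strings on whitespace and concatenate the pieces.
    return a.split() + b.split()


splitter: Callable[[str, str], list[str]] = split_both
scores: dict[str, Tuple[int, int, str]] = {"alice": (3, 5, "gold")}

ids: list[int] = []
raw = "42"             # a string of digits, e.g. read from a file
ids.append(int(raw))   # fine; ids.append(raw) would be rejected by mypy

print(splitter("a b", "c"), scores["alice"], ids)
```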

"But what about the valid use cases of dynamic typing?" you might ask, dear reader. While I am no proponent of completely dynamic typing, as can be easily guessed by the topic of this post, I concede that there are use cases for not being completely strict. For example, a function to set a colour might accept a tuple of RGB values Tuple[int, int, int] or a named colour str. This is this where the best feature of the typing system comes into play: union types4. Union types allow you to express that a value is either a string or tuple of ints with Union[str, Tuple[int, int, int]]. Or if a value is a string or None, then its type is Union[str, None]. Such a union of a type, str in this example, with None is so common that has a special alias: Optional[str].

Newer Python versions (from Python 3.10 onward, see PEP 604) make union types even easier to write. Any type Union[A, B, C, ...] can be written as A | B | C | ... instead. An optional string can thus be expressed even more tersely as str | None.


  1. The directory for the Python source files should be named after the package name defined in tool.poetry.name. If the package name contains dashes (-), they are replaced with underscores (_).

  2. This file marks the directory as a regular Python package. Without it, the directory is implicitly a namespace package, and mypy does not work with this kind of package.

  3. The class has to be defined already at the time the type hint is evaluated. This can cause errors if you reference a class before its definition, which includes the rather common case of methods referring to their own class. A current workaround is to use a string literal as the type instead of the class name itself. (Partial) solutions to this problem are proposed in various PEPs.

  4. These are true union types, as opposed to the discriminated (tagged) unions of Haskell and many other languages. The variants of Python's union type are not named, and the union satisfies the laws of commutativity (Union[X, Y] = Union[Y, X]) and idempotency (Union[X, X] = X). Union[X, Y] is the true supremum (least upper bound) of X and Y in the type lattice.