parakramo

Python Data Model and Protocols

March 14, 2025

TOC

  1. Python Data Model
  2. Protocols
    1. Iterators

Python Data Model

You can think of the data model as a description of Python as a framework. It formalizes the interfaces of the building blocks of the language itself, such as sequences, iterators, functions, classes, context managers, and so on.

When coding with any framework, you spend a lot of time implementing methods that the framework calls. The same happens when you leverage the Python data model: the Python interpreter invokes special methods to perform basic object operations, often triggered by special syntax. Special method names are always written with leading and trailing double underscores (e.g., __getitem__). For example, the syntax my_collection[key] is supported by the __getitem__ special method: to evaluate my_collection[key], the interpreter calls my_collection.__getitem__(key).
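A minimal sketch of the my_collection[key] example, using a hypothetical Inventory class that wraps a dict. The subscript syntax and the explicit dunder call return the same value.

```python
class Inventory:
    """Hypothetical collection that supports subscript syntax via __getitem__."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def __getitem__(self, key):
        # Invoked by the interpreter to evaluate inventory[key]
        return self._stock[key]


inventory = Inventory({"apples": 3, "pears": 5})
print(inventory["apples"])              # 3
print(inventory.__getitem__("apples"))  # 3, the call the interpreter makes for us
```

Strictly speaking, the interpreter looks special methods up on the type rather than on the instance, so inventory[key] behaves like type(inventory).__getitem__(inventory, key).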

The special method names allow your objects to implement, support, and interact with basic language constructs such as:

  • To initialize a newly created instance of a class, you have __init__
  • To define the string representation of an object, you have __repr__ (with __str__ and __format__ for display and formatting)
  • To support operator overloading, such as adding two objects, you have __add__
  • To customize attribute access, you have __getattr__ (called only when normal lookup fails) and __getattribute__
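Three of the hooks from the list above can be seen together in one small class. This is a sketch only; Config and its settings are hypothetical names chosen for illustration.

```python
class Config:
    """Hypothetical settings object illustrating __init__, __repr__ and __getattr__."""

    def __init__(self, **settings):
        # Called when Config(...) creates a new instance
        self._settings = settings

    def __repr__(self):
        # Used by repr() and by the interactive prompt
        return f"Config({self._settings!r})"

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self._settings[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = Config(debug=True, level=3)
print(repr(cfg))  # Config({'debug': True, 'level': 3})
print(cfg.level)  # 3, resolved through __getattr__
```

Raising AttributeError (rather than letting KeyError escape) keeps hasattr() and getattr() with a default working as expected.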

In Python, many built-in functions and syntactic constructs have a corresponding dunder (double underscore) method: len() calls __len__, iter() calls __iter__, and the + operator calls __add__.
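A quick way to see this correspondence is to call a built-in and its dunder side by side on ordinary built-in objects. The explicit calls are a simplification: the interpreter looks special methods up on the type, and binary operators such as + can also fall back to the right operand's __radd__.

```python
nums = [3, 1, 2]

# Each built-in or operator on the left delegates to the dunder on the right
print(len(nums), nums.__len__())        # 3 3
print(repr(nums), nums.__repr__())      # [3, 1, 2] [3, 1, 2]
print(abs(-7), (-7).__abs__())          # 7 7
print(2 + 5, (2).__add__(5))            # 7 7
print(3 in nums, nums.__contains__(3))  # True True
```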

The Python data model is the means by which you implement protocols.


Protocols

In Python, protocols are a way to define implicit contracts that classes can implement without requiring explicit inheritance from a base class.
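Since Python 3.8, the typing module also lets you spell such an implicit contract out explicitly with typing.Protocol. A minimal sketch, where HasArea, Square and describe are hypothetical names: Square never inherits from HasArea, yet it satisfies the protocol because it has the right structure.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasArea(Protocol):
    """Hypothetical protocol: anything with an area() method returning a float."""

    def area(self) -> float: ...


class Square:
    # Note: Square does NOT inherit from HasArea
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


def describe(shape: HasArea) -> str:
    # A static type checker accepts any object whose structure matches HasArea
    return f"area = {shape.area()}"


sq = Square(3)
print(isinstance(sq, HasArea))  # True (runtime check only verifies 'area' exists)
print(describe(sq))             # area = 9
```

The runtime isinstance check only confirms that the required members are present; it does not check signatures or return types, which is the job of a static type checker.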

For example, for an object to pass as a Callable, all it needs is to implement the __call__ method (with the relevant arguments):

from collections.abc import Callable  # typing.Callable is a deprecated alias since 3.9
 
class Foo:
    def __call__(self, x):
        return x * 2
 
f = Foo()
print(isinstance(f, Callable))  # True
print(f(2))  # 4
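The contrast case is an object without __call__. A short sketch with hypothetical Doubler and Plain classes, comparing the built-in callable() with the Callable check:

```python
from collections.abc import Callable


class Doubler:
    def __call__(self, x):
        return x * 2


class Plain:
    pass  # no __call__ defined


print(callable(Doubler()), isinstance(Doubler(), Callable))  # True True
print(callable(Plain()), isinstance(Plain(), Callable))      # False False
print(callable(Plain))  # True: classes are callable because type defines __call__
```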

Similarly, for an object to pass as a Sized, all it needs is to implement the __len__ method. These protocols also let us hook into Python's built-in functions and syntax: for example, we can call len(...) on any object that implements __len__.

from collections.abc import Sized  # typing.Sized is a deprecated alias since 3.9
 
class Foo:
    def __len__(self):
        return 1
 
f = Foo()
print(isinstance(f, Sized))  # True
print(len(f))  # 1
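Implementing __len__ also hooks into truthiness: when a class defines no __bool__, bool() and if statements fall back to __len__, treating a length of 0 as False. A sketch with a hypothetical Basket class:

```python
class Basket:
    """Hypothetical container that only implements __len__ (no __bool__)."""

    def __init__(self, *things):
        self.things = list(things)

    def __len__(self):
        return len(self.things)


empty, full = Basket(), Basket("apple", "pear")
print(len(empty), len(full))    # 0 2
print(bool(empty), bool(full))  # False True, because bool() falls back to __len__
if not empty:
    print("basket is empty")    # printed
```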

These protocols also allow us to implement custom behaviour on Python objects. For example, below we define what should happen when two Coordinate objects are added:

class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y
 
    def __add__(self, other):
        return Coordinate(self.x + other.x, self.y + other.y)
 
c1 = Coordinate(1, 2)
c2 = Coordinate(4, 5)
 
c3 = c1 + c2
print(c3.x, c3.y)  # 5 7
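The __add__ above assumes other is a Coordinate, so c1 + 5 fails inside the method with an AttributeError. One common refinement, sketched below, is to return NotImplemented for unsupported operands so Python can try the other operand and then raise a clear TypeError; __eq__ and __repr__ are added here only to make results easy to compare and print.

```python
class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # NotImplemented lets Python try other.__radd__ and,
        # if that also fails, raise a TypeError
        if not isinstance(other, Coordinate):
            return NotImplemented
        return Coordinate(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        if not isinstance(other, Coordinate):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Coordinate({self.x}, {self.y})"


print(Coordinate(1, 2) + Coordinate(4, 5))                      # Coordinate(5, 7)
print(Coordinate(1, 2) + Coordinate(4, 5) == Coordinate(5, 7))  # True
try:
    Coordinate(1, 2) + 3
except TypeError as exc:
    print("TypeError:", exc)
```

Note that defining __eq__ without __hash__ makes instances unhashable, so they can no longer be used as dict keys or set members unless you also define __hash__.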

Iterators

Let’s see the protocol for Iterators.

items = [1, 2, 3]
for i in items:
    print(i, end=" ")  # 1 2 3

Under the covers, the for i in items loop works roughly like this:

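it = iter(items)  # calls items.__iter__()
while True:
    try:
        i = next(it)  # calls it.__next__(); the loop body then runs with i
    except StopIteration:  # raised once the iterator is exhausted
        break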
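To watch the protocol step by step, you can drive a list's iterator by hand. next() also accepts an optional default, which it returns instead of raising StopIteration once the iterator is exhausted:

```python
items = [1, 2, 3]
it = iter(items)         # list.__iter__ returns a fresh list_iterator

print(next(it))          # 1
print(next(it))          # 2
print(next(it))          # 3
print(next(it, "done"))  # done, the default is returned instead of raising StopIteration
print(iter(it) is it)    # True, an iterator's __iter__ returns itself
```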

The built-in functions iter() and next() delegate to the dunder methods __iter__ and __next__, respectively, so those are the two methods we have to implement. An object that implements both satisfies the Iterator protocol:

from collections.abc import Iterator
 
class Items:
    def __init__(self):
        self.items = [1, 2, 3]
        self.index = 0
 
    def __iter__(self):
        self.index = 0
        return self
 
    def __next__(self):
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item
 
items = Items()
 
print(isinstance(items, Iterator))  # True
 
for i in items:
    print(i, end=" ")  # 1 2 3

Therefore, any object that has both __iter__ and __next__ can be used as an Iterator. Similarly, protocols are defined for generators, collections, and more (see the collections.abc module).
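One design point worth noting: because Items is its own iterator, every loop over the same Items object shares one index, and Items also resets that index in __iter__, which built-in iterators do not do. A nested loop over the same object therefore stops the outer loop after its first item. A common alternative is to make the container only Iterable and have __iter__ return a fresh iterator each time, for example by writing __iter__ as a generator function. A minimal sketch with a hypothetical Bag class:

```python
from collections.abc import Generator, Iterable, Iterator


class Bag:
    """Hypothetical container: iterable, but not itself an iterator."""

    def __init__(self, *items):
        self.items = list(items)

    def __iter__(self):
        # A generator function: each call returns a new, independent generator
        for item in self.items:
            yield item


bag = Bag(1, 2, 3)
print(isinstance(bag, Iterable), isinstance(bag, Iterator))  # True False
print(isinstance(iter(bag), Generator))                      # True

# Nested loops work because each loop gets its own iterator
pairs = [(a, b) for a in bag for b in bag]
print(len(pairs))  # 9
```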