Python Data Model and Protocols
March 14, 2025
Python Data Model
You can think of the data model as a description of Python as a framework. It formalizes the interfaces of the building blocks of the language itself, such as sequences, iterators, functions, classes, context managers, and so on.
While coding with any framework, you spend a lot of time implementing methods that are called by the framework. The same happens when you leverage the Python data model. The Python interpreter invokes special methods to perform basic object operations, often triggered by special syntax. Special method names are always written with leading and trailing double underscores (e.g., __getitem__). For example, the syntax my_collection[key] is supported by the __getitem__ special method: to evaluate my_collection[key], the interpreter calls my_collection.__getitem__(key).
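As a minimal sketch of this dispatch, the hypothetical Playlist class below (a name invented for illustration, not part of the original text) implements __getitem__ by delegating to an internal list, so both indexing and slicing reach the method:

```python
class Playlist:
    """Hypothetical container used only to illustrate __getitem__."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __getitem__(self, key):
        # Invoked for playlist[key]; key may be an int or a slice object.
        return self._songs[key]


playlist = Playlist(["intro", "verse", "chorus"])
print(playlist[0])              # intro
print(playlist.__getitem__(0))  # intro (roughly what the subscript resolves to)
print(playlist[1:])             # ['verse', 'chorus']
```

Delegating to the list keeps the sketch short; a real container might validate the key or return a new Playlist for slices.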
The special method names allow your objects to implement, support, and interact with basic language constructs such as:
- To initialize a new instance of a class, you have __init__
- To define the string representation and formatting of an object, you have __repr__
- To support operator overloading, such as the addition of two objects, you have __add__
- To handle access to an attribute that normal lookup does not find, you have __getattr__
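The four special methods in the list above can be sketched together in one small class. Temperature and its fahrenheit attribute are hypothetical names chosen for illustration only:

```python
class Temperature:
    """Hypothetical class illustrating the four special methods listed above."""

    def __init__(self, celsius):
        # Runs when Temperature(...) is called, to set up the new instance.
        self.celsius = celsius

    def __repr__(self):
        # Used by repr() and by the interactive prompt.
        return f"Temperature({self.celsius!r})"

    def __add__(self, other):
        # Invoked for `self + other`.
        return Temperature(self.celsius + other.celsius)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "fahrenheit":
            return self.celsius * 9 / 5 + 32
        raise AttributeError(name)


t = Temperature(20) + Temperature(5)
print(repr(t))       # Temperature(25)
print(t.fahrenheit)  # 77.0
```

Note that __getattr__ is a fallback: t.celsius is found by normal lookup and never reaches it.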
In Python, many built-in functions and syntactic constructs have a corresponding dunder (double underscore) method.
The Python data model is the means by which you implement protocols.
Protocols
In Python, protocols are a way to define implicit contracts that classes can implement without requiring explicit inheritance from a base class.
For example, for an object to pass as a Callable, all it needs is to implement the __call__ method (with the relevant arguments):
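from collections.abc import Callable  # typing.Callable is a deprecated alias of this ABC

class Foo:
    def __call__(self, x):
        # Invoked when an instance is called like a function: f(x)
        return x * 2

f = Foo()
print(isinstance(f, Callable))  # True
print(f(2))  # 4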
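You can also declare such an implicit contract yourself with typing.Protocol. The sketch below assumes a hypothetical Greeter protocol and two illustrative classes; none of these names come from the original text:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    """Hypothetical contract: anything with a greet() method qualifies."""

    def greet(self) -> str: ...


class English:  # note: does not inherit from Greeter
    def greet(self) -> str:
        return "hello"


class Silent:
    pass


print(isinstance(English(), Greeter))  # True
print(isinstance(Silent(), Greeter))   # False
```

With @runtime_checkable, isinstance only checks that the required methods are present, not their signatures or return types, so it is a loose check rather than full validation.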
Similarly, for an object to pass as a Sized, all it needs is to implement the __len__ method. Additionally, these protocols allow us to hook into Python’s built-in functions and syntax. For example, we can call len(...) on an object that implements __len__.
from collections.abc import Sized  # typing.Sized is a deprecated alias of this ABC

class Foo:
    def __len__(self):
        # Invoked by the built-in len(f)
        return 1

f = Foo()
print(isinstance(f, Sized))  # True
print(len(f))  # 1
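For contrast, here is a small sketch of what happens when the method is missing. Empty is a hypothetical class with no __len__, so it neither passes as a Sized nor works with len(...):

```python
from collections.abc import Sized


class Empty:
    """Hypothetical class with no __len__ method."""


e = Empty()
print(isinstance(e, Sized))  # False

try:
    len(e)
except TypeError as exc:
    # len() cannot find __len__ on the type, so it raises TypeError.
    print(type(exc).__name__)  # TypeError
```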
These protocols also allow us to implement custom behaviour on Python objects. For example, below we define what should happen when two Coordinate objects are added:
class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Invoked for c1 + c2; returns a new Coordinate
        return Coordinate(self.x + other.x, self.y + other.y)

c1 = Coordinate(1, 2)
c2 = Coordinate(4, 5)
c3 = c1 + c2
print(c3.x, c3.y)  # 5 7
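One possible refinement of the Coordinate example, offered as a sketch rather than as part of the original design, is to return NotImplemented for operands the class does not understand and to add __repr__ and __eq__ so results are easier to inspect and compare:

```python
class Coordinate:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Coordinate({self.x}, {self.y})"

    def __eq__(self, other):
        if not isinstance(other, Coordinate):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        if not isinstance(other, Coordinate):
            # Lets Python try other.__radd__ and, failing that, raise TypeError.
            return NotImplemented
        return Coordinate(self.x + other.x, self.y + other.y)


print(Coordinate(1, 2) + Coordinate(4, 5))                      # Coordinate(5, 7)
print(Coordinate(1, 2) + Coordinate(4, 5) == Coordinate(5, 7))  # True
```

Returning NotImplemented (instead of raising directly) means that an expression such as Coordinate(1, 2) + 3 ends in a TypeError with a standard message. Keep in mind that defining __eq__ without __hash__ makes instances unhashable.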
Iterators
Let’s see the protocol for Iterators.
items = [1, 2, 3]
for i in items:
    print(i, end=" ")  # 1 2 3
Under the covers, the for i in items syntax looks roughly like this:
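iterator = iter(items)  # calls items.__iter__()
while True:
    try:
        i = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break  # raised when the iterator is exhausted, which ends the loop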
The dunder methods that we’ll have to implement for the built-in functions iter and next are __iter__ and __next__ respectively. Therefore, the protocol for Iterator looks like the following:
from collections.abc import Iterator  # typing.Iterator is a deprecated alias of this ABC

class Items:
    def __init__(self):
        self.items = [1, 2, 3]
        self.index = 0

    def __iter__(self):
        # Called by iter(); resets the position and returns the iterator (itself)
        self.index = 0
        return self

    def __next__(self):
        # Called by next(); signals exhaustion by raising StopIteration
        if self.index >= len(self.items):
            raise StopIteration
        item = self.items[self.index]
        self.index += 1
        return item

items = Items()
print(isinstance(items, Iterator))  # True
for i in items:
    print(i, end=" ")  # 1 2 3
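Because the class above is its own iterator, two loops over the same object share one position. A common alternative, sketched here as a variation and not as the original author's approach, is to make the class an Iterable only and have __iter__ hand out a fresh iterator each time, for example by writing it as a generator function:

```python
from collections.abc import Iterable, Iterator


class Items:
    def __init__(self):
        self.items = [1, 2, 3]

    def __iter__(self):
        # A generator function: each call returns a fresh, independent iterator.
        for item in self.items:
            yield item


items = Items()
print(isinstance(items, Iterable))        # True
print(isinstance(items, Iterator))        # False: Items has no __next__
print(isinstance(iter(items), Iterator))  # True: generators implement __next__

# Nested iteration over the same object works because the iterators are independent.
pairs = [(a, b) for a in items for b in items]
print(len(pairs))  # 9
```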
Therefore, any object that has both an __iter__ and a __next__ method can be used as an Iterator. Similarly, protocols are defined for Generators, Collections, etc.
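As a brief illustration of the Generator case, the sketch below uses a hypothetical countdown generator function. The object it returns satisfies both the Iterator and the Generator ABCs from collections.abc without any explicit inheritance:

```python
from collections.abc import Generator, Iterator


def countdown(start):
    """Hypothetical generator function counting down from start to 1."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
print(isinstance(gen, Iterator))   # True
print(isinstance(gen, Generator))  # True
print(next(gen))                   # 3
print(list(gen))                   # [2, 1]
```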