Python’s Walrus Operator and Dict Comprehensions

Submitted by Sean on Fri, 03/03/2023 - 03:12

Python 3.8 added an assignment operator :=, also known as the “walrus operator” (rotate it 90 degrees clockwise and you’ll see why). A few use cases are described in this page on What’s New in Python 3.8, but despite the fact that it’s not so new anymore (it came out more than three years ago), I hadn’t found much use for it until recently. 

Why was the walrus operator created? In Python, ordinary assignment with = is a statement, not an expression, so it doesn’t produce a value. For example, print(x=23) raises a TypeError: Python reads x=23 as a keyword argument that print() doesn’t accept, not as an assignment whose value should be printed. print(x:=23) succeeds, on the other hand, and prints 23. But of course you can just assign a value to x on another line and then print it, so this isn’t a compelling case.
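To make the statement-versus-expression difference concrete, here is a minimal sketch. The variable names are just for illustration, and the exact wording of the error message varies between Python versions, so the code only captures it instead of relying on it.

```python
# x=23 inside a call is parsed as a keyword argument, not an assignment,
# and print() has no parameter named x, so this raises TypeError.
try:
    print(x=23)
except TypeError as exc:
    error_message = str(exc)

# The walrus form is an expression: it binds x and evaluates to the value.
print(x := 23)

# Because it is an expression, it can sit inside a larger expression.
y = (z := 10) + 1
```

After this runs, x is 23, z is 10, and y is 11.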

I do a lot of data wrangling, and when dataclasses were first introduced (Python 3.7) they quickly became my go-to approach for loading structured data. They make it easy to 

  • model external record structures in Python
  • define static methods for converting data from JSON, CSV, or other formats
  • define class-specific methods

and more. Going further, when those record structures have a unique identifier (which is often the case), it can be convenient to load the whole dataset into a reader or manager class that subclasses UserDict, with the identifiers as keys and the records (as dataclass instances) as values. That way you have all the data, nicely organized, but also accessible by looking up a record’s identifier in the dictionary.
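As a rough sketch of what such a manager class might look like: Record, RecordManager, and the names() helper are hypothetical, invented here only to show the shape of the idea, and the records are filled in with a plain loop.

```python
from collections import UserDict
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical record structure
    record_id: str
    name: str


class RecordManager(UserDict):
    """Dict-like container keyed by record_id (hypothetical)."""

    def names(self):
        # A class-specific convenience method on top of normal dict behavior.
        return sorted(rec.name for rec in self.data.values())


manager = RecordManager()
for rec in (Record("r1", "alpha"), Record("r2", "beta")):
    manager[rec.record_id] = rec
```

The manager then behaves like a dict (manager["r2"] returns the matching record) while still leaving room for dataset-level methods.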

Dictionary comprehensions are an efficient and natural way to structure this, but there’s a problem: if you’re reading some data to create a dataclass instance inside the comprehension, how do you get the identifier out of the dataclass at the same time so you can use it for the key?

The walrus operator makes this easy. Add an if clause inside the comprehension that uses the walrus operator to bind the newly loaded dataclass instance to a name (this crucially assumes your data loader always returns a truthy value; anything falsy, such as None, silently drops that record). The if clause is evaluated before the key and value expressions, so by then you’ve got the instance and can use its identifier field as the key and the whole instance as the value in the comprehension.
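A minimal sketch of that pattern, assuming a hypothetical Record dataclass and a hypothetical load_record() function standing in for whatever loader you actually use:

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical record structure
    identifier: str
    value: int


def load_record(raw):
    # Hypothetical loader: always returns a Record, which is truthy by default.
    return Record(identifier=raw["id"], value=int(raw["value"]))


raw_rows = [{"id": "a1", "value": "10"}, {"id": "b2", "value": "20"}]

# The if clause runs first and binds rec; the key and value then reuse it.
records = {rec.identifier: rec for raw in raw_rows if (rec := load_record(raw))}
```

Each record is built exactly once, and its identifier becomes the key.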

Example: suppose you have a CSV with 50 records, one for each US state, with the state abbreviation in one column and various other statistics in additional columns, and a dataclass like this to represent a single row:

from dataclasses import dataclass

@dataclass
class StateData:
    abbreviation: str
    stat1: int
    stat2: int

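etc. You can initialize a dict for this data with something like the following (assuming you're using `DictReader` or something similar). Here the walrus operator captures the key straight from the row, and the if clause also skips any row with an empty abbreviation: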

datadict = {abbreviation: StateData(**row) for row in rows if (abbreviation := row["abbreviation"])}
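For a version you can run end to end, here is a sketch under a few assumptions: the CSV text is made-up sample data, and from_row() is a hypothetical helper added because `DictReader` yields every value as a string, so the numeric columns need converting explicitly.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class StateData:
    abbreviation: str
    stat1: int
    stat2: int

    @classmethod
    def from_row(cls, row):
        # Hypothetical helper: convert the string values from the CSV to ints.
        return cls(row["abbreviation"], int(row["stat1"]), int(row["stat2"]))


# Made-up sample data; the last row has no abbreviation.
sample_csv = "abbreviation,stat1,stat2\nCA,1,2\nNY,3,4\n,5,6\n"
rows = csv.DictReader(io.StringIO(sample_csv))

datadict = {
    abbreviation: StateData.from_row(row)
    for row in rows
    if (abbreviation := row["abbreviation"])
}
```

Because an empty string is falsy, the row without an abbreviation is left out of datadict, which is worth keeping in mind if missing keys should be an error instead.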

I do this kind of thing all the time now that I've figured out the pattern.