Bug Bytes Web

Pydantic - Validators

In this post, we dive further into Pydantic validators, including how to use the root_validator decorator, how to define validators that run before others, and how to run validators on each item in a sequence of data.

This is a continuation of the previous post which can be found here.

For this post, we'll continue with the source data from the previous post, here. In the second half of the post, we'll introduce new data with extra fields.


Objectives

In this post, we will:

Understand the order in which validation of fields occurs in Pydantic
Learn what a root-validator function does, and how to use it
Learn how to apply functions that run before Pydantic's default validation
Learn how to apply a validator function to each item in a list/set/dictionary of data

Validator Recap

In the first post, we saw how to use the @validator decorator to define custom validation on fields in our model. As a reminder of the code from previous posts, the Pydantic model for Student data is shown below (imports, enums and the Module model are omitted):

class Student(BaseModel):
    id: uuid.UUID
    student_name: str = Field(alias="name")
    date_of_birth: date = Field(default_factory=lambda: datetime.today().date())
    GPA: confloat(ge=0, le=4)
    course: Optional[str]
    department: DepartmentEnum
    fees_paid: bool
    modules: list[Module] = Field(default=[])

    class Config:
        use_enum_values = True
        extra = 'ignore'

    @validator('modules')
    def validate_module_length(cls, value):
        if len(value) and len(value) != 3:
            raise ValueError('List of modules should have length 3')
        return value

    @validator('date_of_birth')
    def ensure_16_or_over(cls, value):
        sixteen_years_ago = datetime.now() - timedelta(days=365*16)

        # convert datetime object -> date
        sixteen_years_ago = sixteen_years_ago.date()
        
        # raise error if DOB is more recent than 16 years past.
        if value > sixteen_years_ago:
            raise ValueError("Too young to enrol, sorry!")
        return value

At the bottom of the class (lines 15-31 of the listing), we define two validator functions that use the @validator decorator. The first is applied to the modules field, and the second to the date_of_birth field.

Let's expand on these, in this post.

In each of these functions, the value for the field is passed as the second argument, which we call value in our validator definition. Pydantic then executes the function and the logic within will determine whether or not the value is acceptable; if not, it should raise an error.

Validator functions can take on extra arguments, too. The most important of these is the values argument, which is a dictionary containing all previously validated fields in the model.

It turns out that in Pydantic, validation is done in the order that fields are defined on the model. So in the above example, any validators for the id field run first, followed by the student_name field, and so on. We can see this below, if we define a dummy validator for our 4th field, GPA.

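class Student(BaseModel):
    # fields, Config and other validators omitted...

    @validator('GPA')
    def validate_gpa(cls, value, values):
        # values contains only the fields validated before GPA
        print(values)
        return value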

We can fetch the raw data from our GitHub repo, convert it to Pydantic models, and see the output of this with the following Python code:

import requests

# fetch the raw JSON data from GitHub and convert each record to a Pydantic model
url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v2.json'
data = requests.get(url).json()
for student in data:
    model = Student(**student)

The GPA validation function's print() statement prints out the previously validated fields to the terminal, as below:

{'id': UUID('d15782d9-3d8f-4624-a88b-c8e836569df8'), 'student_name': 'Eric Travis', 'date_of_birth': datetime.date(1995, 5, 25)}
{'id': UUID('4c7b4c43-c863-4855-abc0-3657c078ce23'), 'student_name': 'Mark Smith', 'date_of_birth': datetime.date(1996, 2, 10)}
{'id': UUID('5cd9ad59-fcf1-462c-8863-282a9fb693e4'), 'student_name': 'Marissa Barker', 'date_of_birth': datetime.date(1996, 10, 1)}
...

We can see the fields defined before GPA on the model are present in the values dictionary. But those defined after are not yet validated, and thus are not in the dictionary.
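To see the field-order behaviour without fetching any data, here is a minimal, self-contained sketch. The Point model and the seen_by_validator dictionary are hypothetical names invented for this illustration. The try/except import is also our own assumption, so that the sketch runs on Pydantic 1.x as well as on 2.x (which bundles the 1.x API used in this post as pydantic.v1).

```python
try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, validator

seen_by_validator = {}  # hypothetical helper: records what the validator could see


class Point(BaseModel):  # hypothetical model, not part of the student data
    x: int
    y: int
    z: int

    @validator('y')
    def record_values(cls, value, values):
        # values holds only the fields declared (and validated) before y
        seen_by_validator.update(values)
        return value


point = Point(x='1', y=2, z=3)
print(seen_by_validator)  # {'x': 1}
```

Only x appears, already converted to an int, because it is the only field declared before y. The z field has not been validated yet when the y validator runs.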

This presents a problem if we have some validation that is dependent on the values of other fields. We need to be sure that we have access to the other fields, but that may not be possible, depending on the field order.

Root validators are a solution to this. These run validation on the entire model's data, after the validator functions have run for each individual field.

Let's create a contrived example: say we only want to accept students in the Science and Engineering department whose GPA is greater than or equal to 3.0.

We can use a root validator for this purpose, as below:

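from pydantic import BaseModel, confloat, Field, validator, root_validator, ValidationError

class Student(BaseModel):
    ...
    
    @root_validator
    def validate_gpa_and_department(cls, values):
        # a field that failed its own validation is missing from values
        gpa = values.get('GPA')
        department = values.get('department')
        # use_enum_values = True stores the enum's value (a plain string)
        is_science = department == 'Science and Engineering'
        if is_science and gpa is not None and gpa < 3.0:
            raise ValueError("Invalid GPA for Science & Engineering courses!")
        return values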

The function is decorated with @root_validator, which means it runs after all the fields have been validated and converted to the correct types by Pydantic.

From there, we extract the values from the two fields - these are stored (along with all other field values) in the values argument, which is a dictionary. We can perform our logic checks from there!

We can test this out on the source data with the following code:

url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v2.json'
data = requests.get(url).json()
for student in data:
    try:
        model = Student(**student)
        print(f"GPA: {model.GPA}, Department: {model.department}")
    except ValidationError as e:
        print(e)

This will print out the GPA and department for students who meet the criteria, but will reject Science and Engineering students with a GPA of less than 3. Output is shown below:

GPA: 3.0, Department: Science and Engineering
1 validation error for Student
__root__
  Invalid GPA for Science & Engineering courses! (type=value_error)
GPA: 3.5, Department: Life Sciences
GPA: 3.23, Department: Arts and Humanities
GPA: 3.9, Department: Arts and Humanities

The second record fails our validation, but the rest pass. This matches our data source here (check the second record's GPA and department!).
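One detail worth a quick sketch: by default, a root validator still runs when an individual field has failed validation, and in that case the failed field is missing from values. Pydantic 1.x provides a skip_on_failure argument for this situation. The Term model and the term_errors helper below are hypothetical, and the try/except import is our own assumption so that the sketch runs on either Pydantic 1.x or 2.x.

```python
from datetime import date

try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, root_validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, ValidationError, root_validator


class Term(BaseModel):  # hypothetical model
    start: date
    end: date

    @root_validator(skip_on_failure=True)
    def end_after_start(cls, values):
        # skip_on_failure=True: this only runs when every field validated,
        # so both keys are guaranteed to be present in values
        if values['end'] <= values['start']:
            raise ValueError('end must be after start')
        return values


def term_errors(**data):
    """Return the list of error dicts for the given input ([] when valid)."""
    try:
        Term(**data)
    except ValidationError as e:
        return e.errors()
    return []


print(term_errors(start='2024-09-01', end='2024-12-20'))
print(term_errors(start='2024-12-20', end='2024-09-01'))
print(term_errors(start='not a date', end='2024-09-01'))
```

The second call reports an error located at __root__, as in the output above. The third call reports only the bad start field: the root validator is skipped, so it never hits a KeyError on the missing key.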

For a comparison with Django Form classes, a Pydantic validator on an individual field is similar to the clean_<fieldname>() method that's used to validate a particular field on a Django form. On the other hand, the root-validator is similar to the form's clean() method.

So that covers root validators. Let's move on.

Pre-Validators

Sometimes, you want a function to run before Pydantic performs validations on a field. For example, you may want to join up the elements of a list into a string, or you may want to split a string into a list or another sequence. This must be done before Pydantic tries to validate the data type, otherwise you'll get a ValidationError.

Defining validation functions that run before normal validation can be done using the familiar @validator() decorator, by passing a pre=True keyword argument to the decorator.

Let's see an example now. We're going to use the data defined here.

We can see that we've added a new field to each student record - the tags field. This is a comma-separated string of tags that have been added to each student.

On our Pydantic model, we want the tags to be in a List, not a comma-separated string. So let's add the field to our model:

class Student(BaseModel):
    id: uuid.UUID
    student_name: str = Field(alias="name")
    date_of_birth: date = Field(default_factory=lambda: datetime.today().date())
    GPA: confloat(ge=0, le=4)
    course: Optional[str]
    department: DepartmentEnum
    fees_paid: bool
    modules: list[Module] = Field(default=[])
    tags: list[str]

On the final line, we've added the tags field.

Now, for each Student, we need to convert the tag data coming from our external source from a comma-separated string to a list of values. We can try writing a normal validation function like so:

class Student(BaseModel):
    # other fields omitted...
    tags: list[str]

    @validator('tags')
    def split_tags(cls, value):
        return value.split(",")

This looks to do what we need - we split the string of tags by the comma, giving us back a list of the tags associated with the student record.

However, if we execute this, we get a ValidationError:

1 validation error for Student
tags
  value is not a valid list (type=type_error.list)

It turns out that Pydantic's default field validation occurs before custom @validator functions are called. This means that Pydantic will check the type of the raw data before running the validator, and will throw the error when it's unable to convert the string to a list.

Our split_tags() validator function, therefore, is never called!

This isn't convenient, as sometimes we want to mutate the source data before we perform type conversions and other basic validation.
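We can confirm that the validator is skipped with a small sketch that records every call. The RawTags model and the validator_calls list are hypothetical names used only for this check, and the try/except import is our own assumption so that the sketch runs on either Pydantic 1.x or 2.x.

```python
from typing import List

try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, ValidationError, validator

validator_calls = []  # hypothetical helper: records every call to the validator


class RawTags(BaseModel):  # hypothetical stand-in for Student
    tags: List[str]

    @validator('tags')
    def record_call(cls, value):
        # a normal (non-pre) validator: only reached if the type check passed
        validator_calls.append(value)
        return value


try:
    RawTags(tags='motivated, skilled')
except ValidationError as e:
    error_types = [err['type'] for err in e.errors()]
else:
    error_types = []

print(error_types)       # ['type_error.list']
print(validator_calls)   # []
```

The list of calls stays empty: the built-in list check rejected the string before our function had a chance to run.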

Pydantic provides a workaround, though! We can pass the pre=True argument to our @validator() decorator. The decorated function will then be called before the default field validation, giving us the chance to convert the comma-separated string to a list of strings, as expected by the tags field definition.

Let's redefine this function below:

class Student(BaseModel):
    # other fields here...
    tags: list[str]

    @validator('tags', pre=True)
    def split_tags(cls, value):
        return value.split(",")

Now, if we run the code, we should see that the model output contains a list of strings. Run the following code to verify.

url = "https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v3.json"
data = requests.get(url).json()
for student in data:
    try:
        model = Student(**student)
        print(model.tags)   # print out the list of tags to the terminal
    except ValidationError as e:
        print(e)

We should see the following output, based on our source data.

['motivated', ' skilled', ' hard-working']
1 validation error for Student
__root__
  Invalid GPA for Science & Engineering courses! (type=value_error)
['motivated', ' skilled', ' hard-working']
['hipster', ' lazy', ' slacker']
['erudite', ' clever', ' motivated']

Note that the validation error occurs due to our GPA/Department validation function from earlier. Also notice that the strings within the lists often contain leading whitespace. Let's fix that now.

There's a simple solution for this: we can add the anystr_strip_whitespace option to our model Config class, as below (bottom line of code):

class Student(BaseModel):
    # other fields omitted...
    tags: list[str]

    @validator('tags', pre=True)
    def split_tags(cls, value):
        return value.split(",")

    class Config:
        use_enum_values = True
        extra = "ignore"
        anystr_strip_whitespace = True

By making this true, the strings in each student's list of tags will have any leading/trailing whitespace trimmed off.

So that covers pre-validators, which can be used to perform manipulations that should run before the default Pydantic validation occurs for the given field, and also before your own custom validator functions.
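Because a pre-validator receives the raw input, it can be worth guarding against input that is already in the right shape. The sketch below is one possible variation, not the version used in the post: it only splits when the value is a string, and strips the whitespace inside the validator instead of via Config. The TaggedRecord model is a hypothetical stand-in for Student, and the try/except import is our own assumption so that the sketch runs on either Pydantic 1.x or 2.x.

```python
from typing import List

try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, ValidationError, validator


class TaggedRecord(BaseModel):  # hypothetical stand-in for Student
    tags: List[str]

    @validator('tags', pre=True)
    def split_tags(cls, value):
        # Runs on the raw input, so only split when it really is a string;
        # anything else is passed through to Pydantic's own list validation.
        if isinstance(value, str):
            return [tag.strip() for tag in value.split(',')]
        return value


print(TaggedRecord(tags='hipster, lazy, slacker').tags)  # ['hipster', 'lazy', 'slacker']
print(TaggedRecord(tags=['erudite', 'clever']).tags)     # ['erudite', 'clever']
```

With the guard in place, a record whose tags are already a list passes through untouched, and a value of the wrong type (such as an integer) still produces a normal ValidationError from the default list check.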

Per-Item Validators

Let's see a final validation example. What happens if we need to apply a validation to each item in a sequence such as a list?

For this, we can make use of per-item validators - these will look at each item in a sequence of values, and run the logic of the function.

For example, let's say we want to reject students who have a tag of "slacker" in their list of tags. We can write a validator here that is applied to each element in the list of tags, and checks to see if the tag matches. Code for this is below:

class Student(BaseModel):
    # other fields omitted...
    tags: list[str]

    @validator('tags', pre=True)
    def split_tags(cls, value):
        return value.split(",")

    @validator('tags', each_item=True)
    def remove_slackers(cls, value):
        if value == 'slacker':
            raise ValueError("Student is a slacker and cannot be enrolled!")
        return value

The new validator is defined on lines 9-13. Notice the each_item=True argument to the decorator; this results in the body of the function being applied to each element of the tags list, for the given Student.

This will now throw a ValidationError if the student has this tag in their list.
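A handy side effect of per-item validators is that the error points at the offending element. The following sketch shows this with a hypothetical TagList model and tag_error_locations helper. The try/except import is our own assumption so that the sketch runs on either Pydantic 1.x or 2.x.

```python
from typing import List

try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, ValidationError, validator


class TagList(BaseModel):  # hypothetical stand-in for Student
    tags: List[str]

    @validator('tags', each_item=True)
    def reject_slackers(cls, tag):
        # called once per list element, so tag is a single string
        if tag == 'slacker':
            raise ValueError('Student is a slacker and cannot be enrolled!')
        return tag


def tag_error_locations(tags):
    """Return the loc tuple of each validation error ([] when valid)."""
    try:
        TagList(tags=tags)
    except ValidationError as e:
        return [err['loc'] for err in e.errors()]
    return []


print(tag_error_locations(['motivated', 'skilled']))        # []
print(tag_error_locations(['hipster', 'lazy', 'slacker']))  # [('tags', 2)]
```

The location ('tags', 2) tells us that the element at index 2 of the tags list was the one rejected.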

Often, you will be dealing with lists, sets, dictionaries, and other data types that contain multiple values. It's very useful to know about this method of applying a function to the individual elements, rather than the whole object!
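As a rough sketch of the dictionary case: in Pydantic 1.x, a per-item validator on a dictionary field is applied to each value, not to the keys. The ModuleScores model, its 0-100 range rule and the score_error_locations helper are all hypothetical, and the try/except import is our own assumption so that the sketch runs on either Pydantic 1.x or 2.x.

```python
from typing import Dict

try:
    # Pydantic 2.x ships the 1.x API used in this post under pydantic.v1
    from pydantic.v1 import BaseModel, ValidationError, validator
except ImportError:
    # plain Pydantic 1.x install
    from pydantic import BaseModel, ValidationError, validator


class ModuleScores(BaseModel):  # hypothetical model
    scores: Dict[str, int]

    @validator('scores', each_item=True)
    def score_in_range(cls, score):
        # called once per dictionary value; the keys are not passed here
        if not 0 <= score <= 100:
            raise ValueError('score must be between 0 and 100')
        return score


def score_error_locations(scores):
    """Return the loc tuple of each validation error ([] when valid)."""
    try:
        ModuleScores(scores=scores)
    except ValidationError as e:
        return [err['loc'] for err in e.errors()]
    return []


print(score_error_locations({'physics': 71, 'maths': 140}))
```

Here the reported location should name the key whose value failed, which makes it easy to trace the problem back to the source data.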

Summary

In this post, we've covered in more depth the validation options available in Pydantic.

We've seen how to use the @validator decorator, and how it can accept a dictionary of previously validated fields (the values argument). We have also seen how to use root validators to apply validation logic to multiple fields, without relying on the field order.

In addition, we saw how to define pre-validators which run before Pydantic's default validator, and how to define validators that run on each item in a sequence such as a list.
