Bug Bytes Web

Pydantic - Field function and Model Config

In this post, we'll dive deeper into Pydantic's features and learn how to customize fields using the Field() function. We can use this to set default values, to include/exclude fields from exported model outputs, to set aliases, and to customize the model's JSON schema output.

We'll also learn about model Config classes, which can be used to customize model-wide behavior.

This post follows on from the previous one. We will use truncated versions of the Pydantic models defined there, focusing only on the fields relevant to each section.

The source data for this post can be found here.

Objectives

In this post, we will learn:

How to set default values with the Field() function
How to use aliases to allow model fields to have different names than the fields in the source data
How to include/exclude fields when exporting models, using both the Field() function and model export functions.
How to add titles and descriptions for fields in JSON Schema outputs, using the Field() function.
How to define model Config classes to set model-wide configuration.

Exploring the Field() function

We've seen how to define Pydantic fields using types such as int, float, and date, as well as how to define optional/nullable fields and how to define union fields (where the type may be one of multiple values). We've also seen how to define constrained fields.

Pydantic offers an additional mechanism that can be used to define field information and validations - the Field() function.

In this section, we'll explore some of the things that can be done with this function.

Let's start with an example of how to use the Field() function to define a default value for a field.

We've seen how to do this before - with our Student model, we had a field called modules which was a list of Module objects, as below:

class Student(BaseModel):
    modules: list[Module] = []

The default value here is an empty list.

However, once a field is assigned a Field() call, that call occupies the position where the plain default (the empty list above) used to go, so the default must be passed to Field() itself. We can use the default keyword argument to the Field() function to do this:

class Student(BaseModel):
    modules: list[Module] = Field(default=[])

This allows us to set the default value. On its own, this offers no benefit over the previous assignment of an empty list, but it is important to know how to set defaults with the Field() function if you're using it for other purposes, as we'll see in the remainder of this post.
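
A mutable default such as [] is a classic pitfall in plain Python classes, so it is worth checking that Pydantic copies the default for each instance rather than sharing one list. The sketch below is a minimal check, not a definitive reference. It assumes the Pydantic 1.x API used in this post (the try/except import lets it run on a 2.x install through its bundled pydantic.v1 module), and the one-field Module is a hypothetical stand-in for the full model.

```python
try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


class Module(BaseModel):
    name: str  # cut-down, hypothetical stand-in for the full Module model


class Student(BaseModel):
    modules: list[Module] = Field(default=[])


first = Student()
second = Student()
first.modules.append(Module(name="Algorithms"))

# The append only affects the first student's list.
print(len(first.modules), len(second.modules))  # 1 0
```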

One benefit is that we can use a similar keyword-argument called default_factory to set a field's default value to a dynamic value. For example, setting a date field to the current date, or setting a UUID field to a dynamically created UUID.

We can define a default factory for our student's date_of_birth field, and set the date of birth to the current date if none is provided in the source data. Note: logically this does not make sense, but let's roll with it and add it below using the default_factory keyword-argument:

class Student(BaseModel):
    date_of_birth: date = Field(default_factory=lambda: datetime.today().date())

The factory function we define here is a lambda function that takes no arguments, and returns the current date. You can define your own function to implement whatever logic you'd like for this default factory - it should return a suitable value that will serve as the default for the field!
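
As a rough illustration of both use cases mentioned above, the sketch below uses a named function for the date and uuid.uuid4 for the identifier. Giving id a default factory is an assumption made for this example only (the post's Student model takes the id from the source data), and the try/except import is there so the Pydantic 1.x API is used even on a 2.x install.

```python
import uuid
from datetime import date

try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


def today() -> date:
    """Default factory: called with no arguments whenever the value is missing."""
    return date.today()


class Student(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)  # assumption: illustrative only
    date_of_birth: date = Field(default_factory=today)


a = Student()
b = Student()
print(a.id != b.id)       # True: the factory runs once per instance
print(a.date_of_birth)    # the current date

# A value supplied in the data takes precedence over the factory.
explicit = Student(date_of_birth="1995-05-25")
print(explicit.date_of_birth)  # 1995-05-25
```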

Let's move on. So far, we've had Pydantic models whose field names matched the field names in the source data. For example, our class has a date_of_birth field, and that field (with the same name) also exists in the source data here.

But what happens if the name we want to give our field does not match the API/external data? Often, we do not want to use the same name.

Let's look at the student's name. In our class, we've created a name field of type string, as below:

class Student(BaseModel):
    name: str

This matches the key name in the source data here.

Suppose, though, that we want to call this field student_name on our Pydantic model. By default, Pydantic matches on the field name, so it expects the incoming data to contain a key with that same name. You can override this by providing an alias, using the Field() function, as below:

class Student(BaseModel):
    student_name: str = Field(alias="name")

So here, our field name is student_name on the model, and we use Field(alias="name") to inform Pydantic that the name of the field in the data source is name.

So this will take the value of name in the data, and store it in the model's student_name field, whilst also performing any validations and data conversions that you define.

This is very handy if you need to map fields in the data you're working with to fields with different names in your Pydantic model.
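
The following sketch shows the alias round trip on a one-field model. It assumes the Pydantic 1.x API (imported through pydantic.v1 when running on a 2.x install). The by_alias argument to .dict() is not covered in the text above; it is included here only to show how to get the original key name back on export.

```python
try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


class Student(BaseModel):
    student_name: str = Field(alias="name")


raw = {"name": "Eric Travis"}  # key as it appears in the source data
student = Student(**raw)

print(student.student_name)         # Eric Travis
print(student.dict())               # {'student_name': 'Eric Travis'}
print(student.dict(by_alias=True))  # {'name': 'Eric Travis'}
```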

The Field() function can also be used to define certain validation constraints, such as enforcing a number to be greater than a specific value. We can see a full list of options here in the Pydantic documentation.
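
As a small, hedged example of such a constraint, the sketch below bounds GPA with the ge and le arguments to Field(). This is one possible way to express the 0 to 4 range, shown here as an alternative to the confloat type used elsewhere in this series. It assumes the Pydantic 1.x API.

```python
try:
    from pydantic.v1 import BaseModel, Field, ValidationError  # 1.x API inside Pydantic 2.x
except ImportError:
    from pydantic import BaseModel, Field, ValidationError  # plain Pydantic 1.x install


class Student(BaseModel):
    GPA: float = Field(ge=0, le=4)  # must satisfy 0 <= GPA <= 4


ok = Student(GPA=3.0)
print(ok.GPA)  # 3.0

try:
    Student(GPA=4.5)
except ValidationError as exc:
    rejected = True
    print(exc)  # explains that the value exceeds the upper bound
else:
    rejected = False
```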

Exporting Models - Advanced Usage

We saw in the first post how we can use the .dict() and .json() functions to export a model to a Python dict or a JSON string. By default, this will dump the entire object, i.e. all of its fields and values.

We can control the behavior of this functionality using the Field() function.

There are two additional keyword arguments to Field(), both of which are unset by default:

include - when set to True, indicates that *only* this field should be included when calling model.dict() or model.json()
exclude - when set to True, indicates that the field should be excluded when calling model.dict() or model.json()

Let's say we want to exclude the list of modules from the resulting dictionary or JSON. We can add the exclude=True keyword argument to the field, as below:

class Student(BaseModel):
    modules: list[Module] = Field(default=[], exclude=True)

We can then convert the first model from our retrieved data to a dictionary, as below:

model = Student(**data[0])
print(model.dict())

This gives the following output - note that the module-list has been excluded!

{'GPA': 3.0,
 'course': 'Computer Science',
 'date_of_birth': datetime.date(1995, 5, 25),
 'department': 'Science and Engineering',
 'fees_paid': False,
 'id': UUID('d15782d9-3d8f-4624-a88b-c8e836569df8'),
 'student_name': 'Eric Travis'}

We can add as many exclusions as we want by defining the Field() function on the relevant fields. For example, we might want to exclude the UUID, as it may be for internal use only.

class Student(BaseModel):
    id: uuid.UUID = Field(exclude=True)
    modules: list[Module] = Field(default=[], exclude=True)

The UUID would now also be excluded from the dictionary.
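
A self-contained version of this idea is sketched below. It assumes a recent Pydantic 1.x release (or the pydantic.v1 module bundled with 2.x), since the exclude argument to Field() is not available in early 1.x versions. Note that the excluded value is still validated and still accessible on the model; it is only left out of the exported output.

```python
import uuid

try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


class Student(BaseModel):
    id: uuid.UUID = Field(exclude=True)  # internal use only, never exported
    student_name: str = Field(alias="name")


student = Student(id="d15782d9-3d8f-4624-a88b-c8e836569df8", name="Eric Travis")

as_dict = student.dict()
as_json = student.json()

print(as_dict)     # {'student_name': 'Eric Travis'}
print(as_json)     # the JSON string omits the id as well
print(student.id)  # the attribute itself is still available
```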

Excluding a field this way is useful for security-sensitive fields such as passwords, API keys, etc. However, when the set of fields to dump varies from call to call, you might not want to hard-code the exclusion in a Field() definition on each field.

Pydantic provides another way to exclude/include fields by passing the same keyword-arguments to the .dict() and .json() functions.

We could have achieved the above with the following code:

model = Student(**data[0])
print(model.dict(exclude={'id', 'modules'}))

We pass a set of the keys we want to exclude from the resulting dictionary. This also works with nested objects: for example, if we want to exclude the id, and also exclude the registration_code from every module in the list, we could write the following code:

model = Student(**data[0])
exclude = {
    'id': True,
    'modules': {'__all__': {'registration_code'}}
}
print(model.dict(exclude=exclude))

This time, we define a dictionary describing what to exclude. For a nested object, the value is a set of the nested fields to exclude. For a nested sequence (such as modules above), the value is a dictionary keyed by the index(es) of the elements from which the nested field should be excluded.

The special value __all__ allows us to exclude the nested field from all elements of the sequence.
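
To make the difference between __all__ and a specific index concrete, here is a minimal sketch with inline data. The module names and registration codes are made up for illustration, the models are cut down to a few fields, and the Pydantic 1.x API is assumed.

```python
import uuid

try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


class Module(BaseModel):
    name: str
    registration_code: str


class Student(BaseModel):
    id: uuid.UUID
    student_name: str = Field(alias="name")
    modules: list[Module] = []


student = Student(
    id="d15782d9-3d8f-4624-a88b-c8e836569df8",
    name="Eric Travis",
    modules=[  # hypothetical sample modules
        {"name": "Algorithms", "registration_code": "abc"},
        {"name": "Databases", "registration_code": "def"},
    ],
)

# Drop registration_code from every module in the list.
everywhere = student.dict(
    exclude={"id": True, "modules": {"__all__": {"registration_code"}}}
)

# Drop registration_code from the first module only (index 0).
first_only = student.dict(
    exclude={"id": True, "modules": {0: {"registration_code"}}}
)

print(everywhere["modules"])
print(first_only["modules"])
```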

These are very useful features in Pydantic. We can control which fields should be excluded and included when converting our models to other data structures, and it's very easy to do.
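
The examples so far use exclude; the include argument works the same way in the other direction, keeping only the named fields. The sketch below is a minimal illustration with a cut-down Student model and inline values, assuming the Pydantic 1.x API. Note that the keys refer to the model's field names, not to aliases.

```python
try:
    from pydantic.v1 import BaseModel, Field  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel, Field  # plain Pydantic 1.x install


class Student(BaseModel):
    student_name: str = Field(alias="name")
    GPA: float
    course: str
    fees_paid: bool


student = Student(
    name="Eric Travis", GPA=3.0, course="Computer Science", fees_paid=False
)

# Keep only the two named fields; everything else is left out.
summary = student.dict(include={"student_name", "GPA"})
print(summary)  # {'student_name': 'Eric Travis', 'GPA': 3.0}
```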

Model Config Classes

Let's now see what Config classes can do in Pydantic models.

So far, we've seen how to customize individual fields. However, there are settings that can be applied across the entire Pydantic model. These can be defined in a special inner-class within the model, called Config.

Let's start with a simple example. Our Student model has a department field, whose type is set to a Python enum.

class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'

class Student(BaseModel):
    department: DepartmentEnum

When we create a model from our data, the resulting value of the field is set to the raw enum. An example is shown below:

import requests

# fetch the raw JSON data from GitHub
url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v2.json'
data = requests.get(url).json()

# create model from the first element of the web-data
model = Student(**data[0])
print(model.department)

The output for the model's department field is set to DepartmentEnum.SCIENCE_AND_ENGINEERING. To get its raw string value, we would need to access the .value attribute of this enum field.

Rather than having to do this explicitly, you can use a model Config class to tell Pydantic that it should always output enum values, rather than the raw enum itself, as in the inner Config class at the end of the Student model below:

class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'


# Pydantic model to outline structure/types of Students (including nested model)
class Student(BaseModel):
    department: DepartmentEnum

    class Config:
        use_enum_values = True

The use_enum_values field of the Config class performs this function. Now, if we run the same code as before, the output of model.department will be the raw string for the enum field: "Science and Engineering".

With this configuration, all enum fields in the class will output the raw value when accessing the field or dumping it to a dictionary with the model.dict() method.
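
A side-by-side comparison may help here. The sketch below defines the same model with and without use_enum_values and feeds both the same inline value, so it runs without fetching the remote data. The StudentWithValues name is introduced only for this illustration, and the Pydantic 1.x API is assumed.

```python
from enum import Enum

try:
    from pydantic.v1 import BaseModel  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel  # plain Pydantic 1.x install


class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'


class Student(BaseModel):
    department: DepartmentEnum


class StudentWithValues(BaseModel):  # hypothetical name, for comparison only
    department: DepartmentEnum

    class Config:
        use_enum_values = True


raw = {"department": "Science and Engineering"}

plain = Student(**raw)
with_values = StudentWithValues(**raw)

print(repr(plain.department))        # the enum member
print(repr(with_values.department))  # 'Science and Engineering'
print(with_values.dict())            # {'department': 'Science and Engineering'}
```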

Let's move on. Another useful field in the Config class is the extra field, which tells Pydantic how to behave when instantiating a model with extra fields that are not defined on the class.

The extra field can take on three values:

ignore - silently discard any extra attributes (the default).
allow - assign the extra attributes to the model.
forbid - cause validation to fail with a ValidationError if extra attributes are passed to the model.

To demonstrate this, let's remove a field from our Pydantic model. Let's say we remove the fees_paid boolean field (see the previous post), and have the following models in our application:

# define an Enum of acceptable Department values
class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'


# Pydantic model to outline structure/types of Modules
class Module(BaseModel):
    id: Union[uuid.UUID, int]
    name: str
    professor: str
    credits: Literal[10,20]
    registration_code: str


# Pydantic model to outline structure/types of Students (including nested model)
class Student(BaseModel):
    id: uuid.UUID
    student_name: str = Field(alias="name")
    date_of_birth: date = Field(default_factory=lambda: datetime.today().date())
    GPA: confloat(ge=0, le=4)
    course: Optional[str]
    department: DepartmentEnum
    modules: list[Module] = Field(default=[])

    class Config:
        use_enum_values = True

Now, our model is missing a field that's defined in our incoming data here. So the question is: what should the model do when we instantiate it and pass this field that it does not know about?

Let's start by adding the extra key to our Config class, and setting it to ignore (which is the default):

class Student(BaseModel):
    id: uuid.UUID
    student_name: str = Field(alias="name")
    date_of_birth: date = Field(default_factory=lambda: datetime.today().date())
    GPA: confloat(ge=0, le=4)
    course: Optional[str]
    department: DepartmentEnum
    modules: list[Module] = Field(default=[])

    class Config:
        use_enum_values = True
        extra = 'ignore'

Now, any extra attributes that are provided to the Pydantic model are silently ignored.

Often, we might want to explicitly forbid extra data being set on our model and potentially serialized later in the workflow. Imagine a rogue password or API key, for example. We can use extra='forbid' to achieve this.

On the other hand, we might want to be flexible with our data model, and may need to accept other attributes that are not explicitly defined in the Pydantic model. We can use extra='allow' for this.
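
The three settings can be compared directly with a small sketch. The three model names below are hypothetical and each model is cut down to a single name field. The inline data carries an additional fees_paid key that none of the models declare, and the Pydantic 1.x API is assumed.

```python
try:
    from pydantic.v1 import BaseModel, ValidationError  # 1.x API inside Pydantic 2.x
except ImportError:
    from pydantic import BaseModel, ValidationError  # plain Pydantic 1.x install

raw = {"name": "Eric Travis", "fees_paid": False}  # fees_paid is not a model field


class IgnoreExtra(BaseModel):
    name: str

    class Config:
        extra = 'ignore'


class AllowExtra(BaseModel):
    name: str

    class Config:
        extra = 'allow'


class ForbidExtra(BaseModel):
    name: str

    class Config:
        extra = 'forbid'


ignored = IgnoreExtra(**raw).dict()
allowed = AllowExtra(**raw).dict()
print(ignored)  # {'name': 'Eric Travis'}
print(allowed)  # {'name': 'Eric Travis', 'fees_paid': False}

try:
    ForbidExtra(**raw)
except ValidationError as exc:
    forbidden = True
    print(exc)  # reports that extra fields are not permitted
else:
    forbidden = False
```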

This attribute is useful, but there are many others that you can define within a Config class - for example, the anystr_strip_whitespace attribute that will handle stripping rogue whitespace from incoming string/byte data.
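
For instance, a minimal sketch of anystr_strip_whitespace on a one-field model might look like the following. The padded input string is made up for illustration, and the Pydantic 1.x API is assumed (imported through pydantic.v1 when running on a 2.x install).

```python
try:
    from pydantic.v1 import BaseModel  # Pydantic 2.x bundles the 1.x API here
except ImportError:
    from pydantic import BaseModel  # plain Pydantic 1.x install


class Student(BaseModel):
    name: str

    class Config:
        anystr_strip_whitespace = True


# Leading and trailing whitespace is removed during validation.
student = Student(name="  Eric Travis \n")
print(repr(student.name))  # 'Eric Travis'
```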

For a list of all available Config class fields, see the Pydantic documentation here.

Summary

In this post, we've covered some more useful Pydantic concepts. We've seen how to use the Field() function to set defaults, including using the default_factory keyword-argument to set a dynamic default value. We've also seen how to use the alias keyword-argument to handle the case where fields on your model have different names than the incoming source data.

We looked at more advanced uses of the model.dict() and model.json() functions, used to export Pydantic models to dictionaries and JSON strings, respectively. We saw how to exclude certain fields from the output, and saw that there's also a mechanism for specifying which fields to include (only).

Finally, we learned about the model Config inner-class, which can be used to set model-wide configuration. This can be used to perform cleanup of data - for example, stripping whitespace from all string/byte data - and also to transform values in the class, for example by setting enum types to their raw values. We also learned about the important extra attribute, which controls what Pydantic models do when they encounter attributes that are not explicitly defined in the model class.

In the next post, we'll dive deeper into model validation techniques.
