Kotlin and MongoDB, a Perfect Match

Posted on Sep 25, 2018

MongoDB’s dynamic schema is powerful and challenging at the same time. In Java, a common approach is to use an object-document mapper to make the schema explicit in the application layer. Kotlin takes this approach even further by providing additional safety and conciseness. This post shows how development with MongoDB can benefit from Kotlin and which patterns have turned out to be useful in practice. We’ll also cover best practices for coding and schema design.

TL;DR

  • MongoDB’s dynamic schema is powerful, but it can lead to more mistakes. In order to maintain safety, it’s even more important to make the schema explicit and enforced in the application layer by:
    • Using a statically typed language
    • Using object-document mapping (ODM)
  • Kotlin nicely completes this approach:
    • Null-aware types to model optional fields
    • Powerful means to easily handle null/optional fields
    • Immutable properties, so we can’t forget to provide required fields.
    • Increased awareness in the team for optional fields
    • Data classes to easily create immutable and maintainable data structures for the object-document-mapper
    • Coroutines enable using the non-blocking driver without fiddling with callbacks
    • The existing object-document mappers seamlessly support immutable data classes
  • Best Practices:
    • When adding or removing a field it’s usually better to update all documents in order to avoid nullable types.
    • Use tailored data classes for projections.
    • Wrap optional fields in an additional optional data class. This way, the fields within the data class can be non-nullable.
    • Nested objects with constants are useful to intuitively access the document’s fields.

The Status Quo of Dealing with MongoDB’s Dynamic Schema in Java

MongoDB is not schemaless. There is always a schema, but it’s dynamic and not enforced by the database. This makes MongoDB very flexible and adaptive. However, there is the danger of losing track of the schema and its variations. We may end up with “field undefined” errors, error-prone string concatenations, wrong types, and typos. What’s the solution?

A statically typed language and object mapping.

We map the documents to objects which are instances of a class with a statically defined structure. This way, the compiler guides us during reading and writing the values. This approach provides a (type-)safe way of using MongoDB and makes the schema explicit, enforced, (kind of) validated and documented in the application layer.

So that’s all great but nothing new. We’ve been using this approach in Java for a while now. However, Kotlin can extend and improve this approach even further!

Nullability

Null-Aware Types

Even when we use Java and an object mapper, we still face the challenge of keeping track of the optional fields, especially after the database model has grown over several years and some team members have left and new ones have joined.

// Java
public class Design {
    private String name; // can name be null? 
    private Statistics statistics; // can statistics be null?

    // getter & setter boilerplate
}

It’s likely to run into the infamous NullPointerExceptions:

int likes = design.getStatistics().getLikeCount(); // NPE because statistics is null in some cases!

Annotations like @Nullable help here, but they are not enforced by the compiler and maintaining them is easy to forget.

The point is that Java’s type system doesn’t distinguish between nullable and non-nullable types. Fortunately, Kotlin does. We just have to add a ? after the type.

// Kotlin
data class Design(
    val name: String, // name can never be null
    val statistics: Statistics? // statistics are optional/nullable
)

Hence, our Kotlin classes also document which fields can be null and which cannot. And even better: The compiler forces us to handle null values before we can access the value. This avoids the annoying NullPointerExceptions.

val likes = design.statistics.likeCount // Compile Error! 
// We are not allowed to access likeCount directly, because statistics can be null.

Another point: Let’s assume that the field name of a Design document is used everywhere in the code base. Let’s further assume that we decide to make this field optional. In Kotlin, we just add a ? and the compiler points us to every access to the name property that has to be adjusted in order to handle null values. This is so powerful.

For me, Kotlin’s nullability is the missing part for making MongoDB’s dynamic schema explicit and documented in the application code. Besides, it significantly improves the safety.

Powerful Means for Handling Optional Fields

Let’s assume we are aware that a field can be null. Then, it’s still cumbersome in Java to do the actual null checking.

// Java
int likes;
if (design != null && design.getStatistics() != null) {
    likes = design.getStatistics().getLikeCount();
} else {
    likes = 0;
}

These nested null-checks are easy to forget. Fortunately, Kotlin has powerful and concise means to handle nullable fields.

val likes = design?.statistics?.likeCount ?: 0
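To make the one-liner above concrete, here is a small self-contained sketch of the safe-call (?.) and Elvis (?:) operators applied to the Design class from this post. The helper name likesOf and the sample values are made up for illustration.

```kotlin
data class Statistics(val likeCount: Int, val dislikeCount: Int)

data class Design(val name: String, val statistics: Statistics?)

// Hypothetical helper: returns the like count, or 0 if the design
// itself or its statistics are missing.
fun likesOf(design: Design?): Int = design?.statistics?.likeCount ?: 0

// Sample values (assumptions for this sketch)
val withStats = Design("Cat", Statistics(likeCount = 12, dislikeCount = 3))
val withoutStats = Design("Dog", statistics = null)
```

Calling likesOf(withStats) yields the stored like count, while likesOf(withoutStats) and likesOf(null) both fall back to 0 without any explicit if/else.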

Enforce Required Fields with Immutable Properties

If we strictly use immutable properties in the data classes used by the object mapper, we get another benefit: We can’t forget to set a required (= non-nullable) field.

val newDesign = Design() // compile error! The non-nullable property `name` is missing.
mongoTemplate.insert(newDesign)

This removes a really big source of errors and keeps the schema consistent.
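As a minimal sketch of what a valid construction looks like, assuming the Design and Statistics classes from this post (the sample values are made up): every property without a default value has to be passed to the constructor, and later changes go through copy() because val properties can’t be reassigned.

```kotlin
data class Statistics(val likeCount: Int, val dislikeCount: Int)

data class Design(
    val name: String,            // required
    val statistics: Statistics?  // optional, but still has to be passed explicitly
)

// All properties are supplied at construction time, so no field can be forgotten.
val newDesign = Design(name = "Cat", statistics = null)

// Immutable properties cannot be reassigned; copy() creates a modified instance instead.
val liked = newDesign.copy(statistics = Statistics(likeCount = 1, dislikeCount = 0))
```

The original newDesign stays untouched, which makes it hard to end up with a half-initialized object being written to the database.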

Increased Awareness for Optional Fields

“Ok, we have to add a statistics field in the design document”

“Can the field be null?”

After using the combo MongoDB + Kotlin for a while, we discovered the following effect in our team: Every time we introduce a new field, we automatically start discussing its nullability. It’s impressive. The type system forces us to decide on the nullability at the moment we add the property to the corresponding data class. That shaped our awareness of required and optional fields, and nullability has become one of the first questions we ask when it comes to new fields.

Data Classes

Powerful Data Structure Definition

Strict usage of an object-document mapper requires you to write a corresponding class for each document. Fortunately, that’s where Kotlin’s data classes are a huge relief in contrast to Java.

data class Design(
    val name: String, 
    val dateCreated: Instant, 
    val statistics: Statistics?
)
data class Statistics(
    val likeCount: Int,
    val dislikeCount: Int
)

That’s it. I can’t imagine how the definition of data structures could be more concise. Additional benefits are:

  • High safety due to immutable properties.
  • hashCode(), equals() and toString() are generated and (more importantly) don’t have to be maintained.
  • Readability due to named arguments.
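The following sketch illustrates the generated members and named arguments, using the two data classes from above. The sample values are assumptions chosen for the example.

```kotlin
import java.time.Instant

data class Statistics(val likeCount: Int, val dislikeCount: Int)

data class Design(
    val name: String,
    val dateCreated: Instant,
    val statistics: Statistics?
)

val created: Instant = Instant.parse("2018-09-25T00:00:00Z")

// Named arguments make it obvious which value ends up in which field.
val first = Design(name = "Cat", dateCreated = created, statistics = Statistics(likeCount = 12, dislikeCount = 3))
val second = Design(name = "Cat", dateCreated = created, statistics = Statistics(likeCount = 12, dislikeCount = 3))

// Generated equals() and hashCode() compare the property values, not object identity.
val sameContent = first == second && first.hashCode() == second.hashCode()

// Generated toString() lists all properties: "Statistics(likeCount=12, dislikeCount=3)"
val text = first.statistics.toString()
```

None of this had to be written or kept in sync by hand when a property is added or renamed.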

Kotlin-Friendly Object-Document-Mapper

The existing object-document mappers for MongoDB work nicely with Kotlin’s immutable data classes. We should not take that for granted. Just take a look at Hibernate and the contortions that are required to make it work together with Kotlin. And even then, it’s not possible to benefit from data classes.

Here’s an overview of some ODMs:

  • We mainly use Spring Data MongoDB in our projects. We use the more low-level MongoTemplate most of the time.
  • The lightweight and Kotlin-native KMongo also looks pretty promising. But I haven’t used it in production yet. Check it out.

Side note: Since we have less impedance mismatch between the object-oriented and the document world, the object-document mapping is much simpler (compared to ORMs). Less complexity usually leads to less trouble. In fact, we never had any mapping issues or debugging sessions in our mapping framework (I’m looking at you, Hibernate!). It just works out of the box.

Tailored Data Classes for Projections

In many situations, we only need some fields and not the whole document. Fetching the whole document and mapping it to the complete data class leads to higher query times and a waste of memory. Fortunately, Kotlin makes it easy to define tailored data classes for that query:

import org.springframework.data.mongodb.core.mapping.Document
import org.springframework.data.mongodb.core.mapping.Field

@Document(collection = "designs")    
data class DesignWithLikeCount(
    val name: String,
    @Field("statistics.likeCount")
    val likeCount: Int
)

Now, we only have to add the required projection to the query and we are done.

// Usage: projection and mapping to a tailored class
query.fields()
    .include("name")
    .include("statistics.likeCount")
val entities = mongoTemplate.find(query, DesignWithLikeCount::class.java)

The above snippets are using plain strings to refer to the fields. We’ll talk about better approaches in a later section.

Schema Design

Avoid Nullability with Nesting

We should pay attention to nullability when it comes to the schema design. An example:

data class Design(
    val name: String, 
    val likeCount: Int?,
    val vectorDesign: Boolean?,
    val dislikeCount: Int?
)

There are two things wrong here:

  • First, there are many nullable fields. We should strive to reduce nullability in our schema in order to make it simpler and less error-prone.
  • Second, this schema doesn’t tell us whether some fields belong together semantically. Just look at the fields likeCount and dislikeCount. Let’s assume that these fields are either both set or both absent. This is not obvious from looking at the schema/data class.

A solution is to wrap that group of fields together into a new data class Statistics:

data class Design(
    val name: String, 
    val vectorDesign: Boolean?,
    val statistics: Statistics?
)
data class Statistics(
    val likeCount: Int, // non-nullable
    val dislikeCount: Int // non-nullable
)

This way, the schema states clearly: If a statistics field exists, all of its subfields are never null.
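A short sketch of how this pays off in the consuming code: with the nested Statistics class, a single null check is enough, and everything after it works with plain non-nullable Ints. The helper likeRatio is hypothetical and only serves as an illustration.

```kotlin
data class Statistics(
    val likeCount: Int,
    val dislikeCount: Int
)

data class Design(
    val name: String,
    val vectorDesign: Boolean?,
    val statistics: Statistics?
)

// Hypothetical helper: share of likes among all votes.
// Returns null if there are no statistics or no votes at all.
fun likeRatio(design: Design): Double? {
    val stats = design.statistics ?: return null // the only null check needed
    val votes = stats.likeCount + stats.dislikeCount
    return if (votes == 0) null else stats.likeCount.toDouble() / votes
}
```

With the flat variant (likeCount: Int? and dislikeCount: Int?), the same function would have to handle each field’s nullability separately, including the case where only one of them is set.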

Side note: By the way, I like the impact that object mapping has on the schema design. It impedes a highly variable schema (many nullable fields or completely arbitrary field names), which in turn is harder to understand, more error-prone and harder to process. My rule of thumb: If we can’t easily map a schema to classes, it’s usually not a good one.

New Fields: Migration instead of Nullable Types

Let’s assume we have to add the new field dateCreated to the Design document. Is this field nullable or not? At first glance, yes, because at the time of releasing the new application version (that starts writing this field) none of the existing documents have this field.

data class Design(
    val name: String, 
    // is the new field `dateCreated` nullable? 
    val dateCreated: Instant? 
    // Well, there are existing documents without this field...
)

In MongoDB, we often change the schema because we have to align it to the changing access pattern of the application. However, this may lead to a schema consisting of many nullable fields because some documents fulfill the new schema and some the old.

We don’t know if dateCreated is nullable/optional by design or because it was added later (being a side-effect of a changed schema). That makes the schema ambiguous and harder to grasp. That’s why I strongly recommend performing a real schema migration by setting this field in every document via a dedicated migration script.

db.designs.updateMany({}, { $set: { "dateCreated": ISODate() } })

Afterwards, we can safely mark the new field dateCreated as non-nullable. The bottom line is: Clean up your schema consistently and continuously. Make it as consistent as possible by doing complete schema migrations. Use nullable fields by design, not as a by-product of schema changes.
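The effect of such a migration on the application code can be simulated in plain Kotlin. This is only an in-memory sketch: documents are modeled as maps, toDesign is a hand-written stand-in for an object-document mapper, and backfillDateCreated is a hypothetical stand-in for the migration that, unlike the script above, only touches documents lacking the field.

```kotlin
import java.time.Instant

// After the migration, dateCreated can be declared non-nullable.
data class Design(val name: String, val dateCreated: Instant)

// Hand-written stand-in for an ODM: fails if the required field is missing.
fun toDesign(doc: Map<String, Any?>): Design = Design(
    name = doc["name"] as String,
    dateCreated = requireNotNull(doc["dateCreated"] as Instant?) { "dateCreated is missing" }
)

// In-memory stand-in for the migration: sets the field on every document that lacks it.
fun backfillDateCreated(docs: List<Map<String, Any?>>, value: Instant): List<Map<String, Any?>> =
    docs.map { if (it["dateCreated"] == null) it + ("dateCreated" to value) else it }

// A legacy document written before the field existed (sample data)
val legacyDocs: List<Map<String, Any?>> = listOf(mapOf("name" to "Cat"))
val migratedDocs = backfillDateCreated(legacyDocs, Instant.EPOCH)
```

Mapping a legacy document with toDesign throws, whereas every document in migratedDocs maps cleanly to the non-nullable class.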

Misc

Utilize Nested Objects with Constants for Field Names

We usually use Spring’s MongoTemplate for the queries. In this case, we have to refer to the field name.

val query = Query().addCriteria(Criteria.where("statistics.likeCount").gt(10))
val designs = mongoTemplate.find(query, Design::class.java)

Fiddling around with strings like "statistics.likeCount" is error-prone. Using constants is an obvious solution. But how can we arrange the constants so they reflect the nesting of our document? With nested objects.

object DesignFields {
    const val NAME = "name"
    object Statistics {
        const val SELF = "statistics"
        const val LIKE_COUNT = "$SELF.likeCount"
    }
}

Usage:

// typesafe and auto-completion-friendly 
Query().addCriteria(Criteria.where(DesignFields.Statistics.LIKE_COUNT).gt(10))

That works quite well. However, this approach has one drawback: If a field changes, you have to maintain two points: The data classes and the field constants. It’s easy to forget to update the string constants after you change a property of the data class. Fortunately, we can utilize Kotlin’s reflection API and refer to the properties of the data classes.

object DesignFields {
    val name = Design::name.name // "name"
    // Named StatisticsFields so that it doesn't shadow the data class Statistics referenced below
    object StatisticsFields {
        val self = Design::statistics.name // "statistics"
        val likeCount = "$self.${Statistics::likeCount.name}" // "statistics.likeCount"
    }
}

Mind that we can’t use the const keyword here because the properties are not compile-time constants. Hence, we can’t use these “constants” for annotation values like Spring’s @Field, which are used for tailored projections.
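If the number of nested fields grows, the string templates can be replaced by a small helper. The following is a sketch, not part of any library: fieldPath is a hypothetical function that joins the names of the given property references with dots.

```kotlin
import kotlin.reflect.KProperty

data class Statistics(val likeCount: Int, val dislikeCount: Int)

data class Design(val name: String, val statistics: Statistics?)

// Hypothetical helper: joins property names into a dotted MongoDB field path.
// It does not verify that each property actually belongs to the type of the previous one.
fun fieldPath(vararg properties: KProperty<*>): String =
    properties.joinToString(".") { it.name }

val likeCountPath = fieldPath(Design::statistics, Statistics::likeCount) // "statistics.likeCount"
```

Renaming a property in the data class now either updates the path automatically (via IDE refactoring) or breaks the build, instead of leaving a stale string behind. The result is still computed at runtime, so the limitation regarding annotation values applies here as well.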

Another alternative can be KMongo, which supports typed queries by directly using the properties in the query:

col.findOne(Design::name eq "Cat")
col.findOne(Design::statistics / Statistics::likeCount gt 10)
// or using the experimental annotation processing
import Design_.Statistics
col.findOne(Statistics.likeCount gt 10)

Coroutines: Asynchronous Driver Without Callbacks

The asynchronous MongoDB driver allows the efficient usage of threads. However, this usually leads to more complex code because we have to deal with callbacks or stream-based APIs. Fortunately, KMongo provides a coroutine-based wrapper around MongoDB’s asynchronous driver. So we can write code that looks synchronous but is asynchronous and non-blocking under the hood.
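To illustrate the general idea (this is not KMongo’s actual implementation), here is a stdlib-only sketch that wraps a callback-style function in a suspend function. findLikeCountAsync is a made-up stand-in for an asynchronous driver call, and runNow is a minimal runner that only works because the fake API calls back immediately.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical callback-style API standing in for an asynchronous driver call.
// It invokes the callback immediately to keep the sketch self-contained.
fun findLikeCountAsync(name: String, callback: (Int?, Throwable?) -> Unit) {
    val known = mapOf("Cat" to 12)
    callback(known[name], null)
}

// The wrapper: suspends until the callback fires, then resumes with its result.
suspend fun findLikeCount(name: String): Int? = suspendCoroutine { continuation ->
    findLikeCountAsync(name) { result, error ->
        if (error != null) continuation.resumeWithException(error)
        else continuation.resume(result)
    }
}

// Callers read like sequential code, without nested callbacks.
suspend fun totalLikes(names: List<String>): Int {
    var total = 0
    for (name in names) total += findLikeCount(name) ?: 0
    return total
}

// Minimal runner for this sketch. Real code would use a coroutine builder
// such as runBlocking from kotlinx.coroutines instead.
fun <T> runNow(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "coroutine did not complete synchronously" }.getOrThrow()
}

val likes = runNow { totalLikes(listOf("Cat", "Dog")) }
```

The loop in totalLikes is the point of the exercise: each call may suspend while waiting for the database, yet the code has the same shape as its blocking counterpart.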

Don’t Use Fongo for Testing

This point is not Kotlin related, but a matter close to my heart. I highly recommend using a real MongoDB for your tests instead of the in-memory database Fongo. That’s a breeze with Testcontainers.