MongoDB Data Modeling Patterns — Subset Pattern

Shashi Prakash Gautam
2 min read · Apr 8, 2020

The Subset pattern addresses the problem of large documents that contain a lot of data, not all of which is used by the application. If we pull such large documents into memory, the working set may exceed the available RAM, which causes information to be evicted from memory.

Let's take the example of an articles collection where each article has lots of comments:

{
  "title": "Title",
  "body": "Body",
  "comments": [
    {
      "id": 1,
      "author": "Alex",
      "comment": "Nice"
    },
    {
      "id": 2,
      "author": "Bob",
      "comment": "Nice"
    },
    ...
  ]
}

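If we pull many documents like this into memory, most of the space is taken up by comments the application rarely shows, and the working set can exceed the available RAM.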

We can use the Subset pattern and split our articles collection into two collections: articles and comments. In the articles collection we keep only a subset of the comments (the most recent ones), while the comments collection holds the full list.

// Articles
{
  "title": "Title",
  "body": "Body",
  "comments": [
    {
      "id": 1,
      "author": "Alex",
      "comment": "Nice"
    },
    {
      "id": 2,
      "author": "Bob",
      "comment": "Nice"
    }
  ]
}

and the comments collection:

// comments
[
  {
    "id": 1,
    "articleId": 123,
    "author": "Alex",
    "comment": "Nice"
  },
  {
    "id": 2,
    "articleId": 123,
    "author": "Bob",
    "comment": "Nice"
  }
]
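
One way to keep the two collections in sync on write is sketched below. This is a minimal in-memory Python illustration, not MongoDB driver code: plain lists and dicts stand in for the collections, and the names `add_comment` and `MAX_EMBEDDED` (with a limit of 2), the `_id` field, and the third commenter "Carol" are assumptions made for the example.

```python
MAX_EMBEDDED = 2  # assumed size of the embedded subset; tune per application


def add_comment(article, comments_collection, comment):
    """Store the full comment, then keep only the newest few on the article."""
    # The comments collection always receives the complete record,
    # tagged with the article it belongs to.
    comments_collection.append({**comment, "articleId": article["_id"]})
    # The article keeps a bounded list of the most recent comments.
    article["comments"].append(comment)
    article["comments"] = article["comments"][-MAX_EMBEDDED:]


# Plain Python structures standing in for the two collections.
article = {"_id": 123, "title": "Title", "body": "Body", "comments": []}
comments = []

for i, author in enumerate(["Alex", "Bob", "Carol"], start=1):
    add_comment(article, comments, {"id": i, "author": author, "comment": "Nice"})

print([c["id"] for c in article["comments"]])  # ids of the embedded subset
print(len(comments))                            # size of the full history
```

Against a real database, the trimming step could be done in the update itself, for example with the `$push` operator combined with its `$each` and `$slice` modifiers, so the embedded array never grows past the chosen limit.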

Now we can fetch an article together with its recent comments in a single read, and most of the time that subset is all we need. When we do want the full list of comments, we can query it from the comments collection.
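
The two read paths described above can be sketched as follows. This is an in-memory Python illustration with plain lists standing in for the collections; the function names `get_article` and `get_all_comments`, the `_id` field, and the third comment by "Carol" are hypothetical and only serve the example.

```python
articles = [
    {
        "_id": 123,
        "title": "Title",
        "body": "Body",
        "comments": [  # embedded subset: recent comments only
            {"id": 2, "author": "Bob", "comment": "Nice"},
            {"id": 3, "author": "Carol", "comment": "Nice"},
        ],
    },
]

comments = [  # complete list, one document per comment
    {"id": 1, "articleId": 123, "author": "Alex", "comment": "Nice"},
    {"id": 2, "articleId": 123, "author": "Bob", "comment": "Nice"},
    {"id": 3, "articleId": 123, "author": "Carol", "comment": "Nice"},
]


def get_article(article_id):
    """Common case: one lookup returns the article and its recent comments."""
    return next((a for a in articles if a["_id"] == article_id), None)


def get_all_comments(article_id):
    """Rare case: a second query against the comments collection."""
    return [c for c in comments if c["articleId"] == article_id]


page = get_article(123)
print(len(page["comments"]))       # size of the embedded subset
print(len(get_all_comments(123)))  # size of the full list
```

With a driver, the first function would roughly correspond to a single find on the articles collection, and the second to a find on the comments collection filtered by `articleId`.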

The rule of thumb for splitting the data is that the most frequently used part of the document should go into the “main” collection and the less frequently used data into another. In our example, the frequently used part is the recent comments.

In subsequent articles, we will visit some more MongoDB data modelling patterns.

Thanks for reading. If you have any feedback, please leave a response or reach out to me on Twitter or GitHub.

Happy Coding!!!
