$group in MongoDB Aggregation with examples

In this post I will be discussing about $group in MongoDB Aggregation framework. I would take on some examples on $group and various accumulators (operators used along with $group) like $push, $avg, $sum, $min, $max and $addToSet

$group

$group simply groups the input documents based on specified fields and for each distinct group, it outputs the result document. Response will always have unique _id.

Here is an illustration to better understand the concept.

MongoDB $group concept

You can also have computed fields in $group output. These values will be computed inside the group. If you want to compute a stat across the collection, then you can group by _id : null. We would take an example later.

Let’s first take a simple example of grouping.

Suppose we have a Game database, where we have a collection for storing user info called as users and there is another collection where we store game scores called as game_scores. We are having the following sample document

Sample Document form game_scores collection

Now if we wanted to find out all the unique users who played the game. How would we do code it?

Solution – $group.

Simple $group stage in MongoDB Aggregation Pipeline

You see that in _id param, we pass $user, because we only want unique users( grouping by user field only), $ denotes that user field is from from input documents.

On the output result, you can see there are 4 unique users who played the game.

Now that you understood the basic concept, let’s move on to accumulators in $group stage.

Accumulator operators in $group

Accumulator operators are essentially the functions which are applied to fields of input documents other than fields which are “grouped” under “_id”.

In other words, only fields which you can output in $group other than _id, should be accumulated using accumulator operator.

All the following sections of this post will give you a fair idea of accumulator operators in $group.

$push

$push operator pushes the fields in the expression into an array. You will get more clarity with the example.

Let’s continue with the example above. What if we wanted to get all the scores of each unique gamer in the array.

$push in $group stage example
$push in $group stage example

You see that scores is the new field which is array with all the $score fields of grouped documents.

$avg

$avg calculates the mathematical average of the field mentioned in accumulator expression.

It ignores the non-numerical values.

A Simple Example

What is the average score of each user in the game.

example of $avg accumulator in $group aggregation stage
A simple example of $avg accumulator in $group aggregation stage

A little complex example πŸ™‚

Suppose, we wanted to get average game score of each user in each of the gameMode.

What do we do?

Just add gameMode in $group.

It’s not that tough πŸ˜‰

Example 2 for $avg in $group
Example 2 for $avg in $group

Notice _id has now changed to an object (this is because now we are grouping with 2 fields not 1). Here is how the output looks like.

example output for $group by 2 fields
example output for $group by 2 fields

Just observe the first 3 documents in this image. You see the user field is repeated with different gameMode value. So now you have average score in each difficulty level for each user.

PS: I have truncated the output in this example for simplicity πŸ™‚

$sum

$sum is very similar to $avg. Only difference being $sum calculates the sum (as the name suggests already)

Example for $sum

Suppose, we wanted to calculate number of times the user has played in each gameMode.

$sum in $group example
$sum in $group example

Pretty easy right? Same is possible using $count. But I wanted to demonstrate $sum.

$min

$min calculates the minimum value of non-grouped fields in a grouped set of documents.

Example

How to calculate minimum score of each gamer?

$min in $group example code and output
$min in $group example code and output

$max

$max calculates the maximum value of non-grouped fields in a grouped set of documents.

Example

Very similar to previous one, how do we calculate highest score for each gamer in each gameMode.

$max operator in $group example
$max operator in $group example

$first

$first returns the first expression result in the group of documents that share same $group by key.

This would not be a very meaningful operator if there is no $sort stage before the $group stage. I think you need an example to get more clarification.

Example

Suppose, we want to find out first score of each gamer in our database.

We will sort the game_scores in ascending order of playedOn field. Then we will group by user and use $first operator. Here’s how.

$first accumulator operator in $group example
$first accumulator operator in $group example

If you don’t use $sort as the first stage of the pipeline, then $first will not return any meaningful result

$last

This is very similar to $first and does the exact opposite.

As you might have already guessed, $last returns the last expression result in the group of documents that share the same $group by key.

Example, How to find the latest score of each gamer in our database.

$last accumulator function example in $group

Alternate solution could have been sorting records based on playedOn date in decreasing order and then using $first. You have the choice, but I wanted to demonstrate $last.

$addToSet

$addToSet = $push + only have unique values in the resulting array.

Basically what $addToSet does is, it takes the group of documents with same $group by _id. And then applies the expression given as an argument (to $addToSet) on each of the document in group, and the returns the array with unique results.

Let’s take a quick example

From our game_score collection, what if we needed to find, which game modes did each user play.

There can be some users who only play in difficult mode, others might play only in easy and medium mode. Let’s find out with code.

$addToSet example

Let me know if you still have any doubts in any of these.

Conclusion

In this post, I discussed some basic concepts around $group which is used in the MongoDB Aggregation Pipeline. I hope the examples were simple enough. Let me know if you have any doubts in comments or Tweet me at @MohitSehgl

Be in touch, will see you in next post. Happy Coding πŸ™‚

Leave a Comment

$group in MongoDB Aggregation with examples

by Mohit Sehgal time to read: 4 min
0