The Hidden Fundamental of MongoDB: The Core of High-Performance Data Retrieval

Gotcha! Yes, this is another clickbait article, damn it. Hold it right there, though. Before you put any more time into reading it, let’s make sure we are on the same page. Read the following questions…

Did you know that MongoDB provides a way to process query results without having to load the entire dataset into memory at once?
Have you ever wondered how MongoDB’s cursor timeout feature helps ensure resource efficiency and prevents server overload?
Have you ever wondered how real-time data streaming platforms like Debezium achieve seamless integration with MongoDB?
Hint: MongoDB’s Oplog cursor plays a key role in capturing real-time changes made to the database.

If these questions make your brain go…

[GIF: Demon Slayer: Kimetsu no Yaiba]

Then I’m here with an article about what a cursor is in MongoDB. So grab a coffee, put Demon Slayer on the TV, and see how cute Nezuko is…

Cough Cough back to the topic.

The crucial basics of MongoDB that you may not have bothered to learn

Note: Pay close attention here as I explain the MongoDB cursor.

The MongoDB documentation says:

A cursor is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results. By default, cursors timeout after 10 minutes of inactivity.

There you go, folks. It’s a pointer to the result set of a query, and that answers your questions. Let’s move on…

[GIF: JJK slap]

Okay! Okay! HOLD UP!! I apologize!

Let me explain the cursor and its use in MongoDB more clearly. I get it: you are hungry for some real answers.

Seriously, what is a cursor in MongoDB?

In MongoDB, when you perform a query, the results are returned as a cursor. A cursor is a pointer that points to the result set of the query on the server-side. It doesn’t contain the actual data but acts as a reference to the documents that match your query.

The cursor object provides methods to iterate over the documents, fetch batches of documents, or just fetch the next document. When you start iterating over the cursor, documents are fetched from the server in batches as needed, which lets MongoDB handle large result sets efficiently without loading all the data into memory at once.
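To make the “pointer plus batches” idea concrete, here is a toy, in-memory model. It is a sketch under stated assumptions, not how the driver is implemented: SimulatedCursor, its batchSize parameter and its roundTrips counter are hypothetical names, and a plain array stands in for the documents sitting on the server. Only the method names hasNext() and next() mirror the real cursor.

```javascript
// Toy, in-memory model of a cursor (hypothetical; NOT the MongoDB driver).
// The "server" holds the full result set; the client holds one batch at most.
class SimulatedCursor {
  constructor(serverResults, batchSize) {
    this.serverResults = serverResults; // stands in for documents on the server
    this.batchSize = batchSize;         // documents per round trip (assumed)
    this.position = 0;                  // where the cursor "points" on the server
    this.buffer = [];                   // the batch currently in client memory
    this.roundTrips = 0;                // how many times we asked the server
  }

  // Lazy loading: fetch a new batch only when the local buffer is empty.
  _fetchBatchIfNeeded() {
    if (this.buffer.length === 0 && this.position < this.serverResults.length) {
      this.buffer = this.serverResults.slice(this.position, this.position + this.batchSize);
      this.position += this.buffer.length;
      this.roundTrips += 1;
    }
  }

  hasNext() {
    this._fetchBatchIfNeeded();
    return this.buffer.length > 0;
  }

  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.buffer.shift();
  }
}

// 10 documents on the "server", fetched 4 at a time.
const serverDocs = Array.from({ length: 10 }, (_, i) => ({ index: i + 1 }));
const cursor = new SimulatedCursor(serverDocs, 4);

const first = cursor.next(); // triggers the first round trip
let count = 1;
while (cursor.hasNext()) {
  cursor.next();
  count += 1;
}
console.log(first, count, cursor.roundTrips); // { index: 1 } 10 3
```

The client never holds more than 4 documents, and it talks to the “server” three times (4 + 4 + 2).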

I hope that gave you a little context about the MongoDB cursor. But I know your brain is going…

[GIF: anime faint]

Then let me save you with a step-by-step explanation of how a cursor works:

Querying: When you execute a query using collection.find() in a MongoDB driver, you get back a cursor object representing the result set that matches the query.
Lazy Loading: The actual documents are not returned immediately. Instead, they are fetched from the server in batches as you iterate over the cursor.
Iteration: You can use cursor methods such as cursor.forEach(), cursor.next() or cursor.toArray() to process the documents one by one or all at once.
Closing: Once you have iterated through all the documents, the cursor is closed automatically; you can also close it early with cursor.close(). Either way, the resources associated with it are released.
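The four steps above can be traced with a second toy model. ToyServer and ToyCursor are hypothetical classes, and the openCursors set is an assumed stand-in for the resources a real server keeps per open cursor. The real drivers expose similarly named methods (next(), hasNext(), forEach(), close()), but those are asynchronous and go over the network; this sketch only illustrates the lifecycle.

```javascript
// Hypothetical toy server that tracks open cursors (not the MongoDB API).
class ToyServer {
  constructor(docs) {
    this.docs = docs;
    this.openCursors = new Set(); // stand-in for server-side cursor resources
    this.nextId = 1;
  }
  find(predicate) {
    return new ToyCursor(this, predicate); // Querying: no server work happens yet
  }
}

class ToyCursor {
  constructor(server, predicate) {
    this.server = server;
    this.predicate = predicate;
    this.id = null; // assigned when the query actually runs
    this.results = null;
    this.pos = 0;
    this.closed = false;
  }
  // Lazy loading: the query runs on the first iteration call.
  _open() {
    if (this.id === null && !this.closed) {
      this.id = this.server.nextId++;
      this.results = this.server.docs.filter(this.predicate);
      this.server.openCursors.add(this.id);
    }
  }
  hasNext() {
    this._open();
    if (this.closed) return false;
    if (this.pos >= this.results.length) {
      this.close(); // Closing: an exhausted cursor closes itself
      return false;
    }
    return true;
  }
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    return this.results[this.pos++];
  }
  // Iteration: call fn once per remaining document.
  forEach(fn) {
    while (this.hasNext()) fn(this.next());
  }
  close() {
    this.closed = true;
    if (this.id !== null) this.server.openCursors.delete(this.id);
  }
}

const server = new ToyServer(Array.from({ length: 5 }, (_, i) => ({ index: i + 1 })));

const cursor = server.find((d) => d.index > 2);
const openAfterFind = server.openCursors.size; // 0: nothing has run yet

const seen = [];
cursor.forEach((d) => seen.push(d.index));
const openAfterExhaust = server.openCursors.size; // 0: closed automatically

const early = server.find(() => true);
early.next();
const openWhileIdle = server.openCursors.size; // 1: still holding resources
early.close();
const openAfterClose = server.openCursors.size; // 0: released explicitly

console.log(seen, openAfterFind, openAfterExhaust, openWhileIdle, openAfterClose);
// [ 3, 4, 5 ] 0 0 1 0
```

The early-close case is the one that matters in practice: a cursor you abandon halfway keeps its server resources until it is closed or times out.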

I know this explanation raises a question, so fire away.

[GIF: Naruto, Boruto]

I don’t get it. When I run a find query in MongoDB, it returns all the data to me in an array. So I already have the data in an array and can easily loop through it, right?

Not quite. With the official MongoDB drivers, find() returns a cursor, not an array. You only get an array when you call cursor.toArray(), as in collection.find().toArray(), or when a library such as Mongoose does that for you.
The cursor.toArray() method returns an array containing all the documents from the cursor. Internally, it iterates the whole cursor (think of a loop of cursor.hasNext() and cursor.next()), so the entire result set ends up in memory at once.
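Conceptually, toArray() just drains the cursor into one array. The sketch below shows that loop next to a streaming alternative that touches the same documents but holds only one at a time. makeCursor, toArray and sumIndexes are hypothetical helpers for illustration; the claim that the legacy mongo shell’s toArray() is essentially this same hasNext()/next() loop should be read as an approximation.

```javascript
// Hypothetical stand-in for a cursor over n documents ({ index: 1 } .. { index: n }),
// producing them one at a time.
function makeCursor(n) {
  let i = 0;
  return {
    hasNext: () => i < n,
    next: () => {
      if (i >= n) throw new Error("cursor exhausted");
      i += 1;
      return { index: i };
    },
  };
}

// Roughly what toArray() does: drain the cursor into a single array.
function toArray(cursor) {
  const out = [];
  while (cursor.hasNext()) out.push(cursor.next());
  return out; // every document is now in memory at the same time
}

// Streaming alternative: use each document, then let it go.
function sumIndexes(cursor) {
  let sum = 0;
  while (cursor.hasNext()) sum += cursor.next().index;
  return sum;
}

const all = toArray(makeCursor(100));
const total = sumIndexes(makeCursor(100));
console.log(all.length, total); // 100 5050
```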

After firing a find query, do I need to wait for the whole scan to complete before a cursor is created and I can iterate over the results?

Not necessarily. You only need to wait until the server has found the first batch of documents. If your query has to sort the results (and no index supports that sort), the server must examine every matching document before it can return even the first one.
If your query doesn’t require a sort, the server may return the first batch before it has visited all the matching documents.
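A toy calculation shows why the sort makes such a difference to the time to the first batch. docsExaminedForFirstBatch is a hypothetical helper that assumes a plain collection scan with no index: an unsorted query can stop as soon as its batch is full, while a sorted one must see every document first (a blocking sort). Real query plans are more nuanced; an index on the sort field, for example, avoids the blocking sort entirely.

```javascript
// Hypothetical helper: how many documents must a plain collection scan
// examine before the first batch can be sent? Assumes no index is used.
function docsExaminedForFirstBatch(collection, filter, batchSize, sortDescBy) {
  if (sortDescBy) {
    // Blocking sort: every document is examined and all matches are sorted
    // before the first (largest) value is known.
    const matches = collection.filter(filter);
    matches.sort((a, b) => b[sortDescBy] - a[sortDescBy]);
    return { examined: collection.length, firstBatch: matches.slice(0, batchSize) };
  }
  // No sort: stop scanning as soon as the first batch is full.
  const firstBatch = [];
  let examined = 0;
  for (const doc of collection) {
    examined += 1;
    if (filter(doc)) firstBatch.push(doc);
    if (firstBatch.length === batchSize) break;
  }
  return { examined, firstBatch };
}

const coll = Array.from({ length: 1000 }, (_, i) => ({ index: i + 1 }));
const isEven = (doc) => doc.index % 2 === 0;

const unsorted = docsExaminedForFirstBatch(coll, isEven, 10);
const sorted = docsExaminedForFirstBatch(coll, isEven, 10, "index");
console.log(unsorted.examined, unsorted.firstBatch[0].index); // 20 2
console.log(sorted.examined, sorted.firstBatch[0].index); // 1000 1000
```

Both queries return 10 even numbers, but the sorted one (descending by index) cannot answer until it has seen the very last document.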

Example of MongoDB Cursor in Action: How Data Retrieval Works

Let’s create sample data. The dataset contains 100 documents, each with an “index” field representing a number from 1 to 100.

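// Build 100 sample documents whose "index" field runs from 1 to 100
var docs = [];

for (let i = 1; i <= 100; i++) {
	// NumberInt stores the value as a 32-bit integer instead of a double
	docs.push({ index: NumberInt(i) });
}
db.myCollection.insertMany(docs);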

/** 
{
	"_id" : ObjectId("5ad24fe286ac9fc7b5c4bbd8"),
	"index": 1
},
...
...
**/

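When you call find(), the shell or driver only creates a cursor object on the client; no documents have been fetched at this point. The query is sent when you start iterating, and the server’s reply carries the first batch of documents together with a cursor ID that the client uses to ask for more.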


As you iterate, the client keeps asking the server for the next batch of documents (using the getMore command) until the cursor is exhausted, that is, until no documents are left.
Some points to remember:
1. The MongoDB shell displays 20 documents at a time; type it to see the next 20.
2. The initial batch size is 101 documents or 1 MB.
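Putting those two numbers together, here is a small simulation of paging through a collection in the shell. shellSession and its laterBatchSize parameter are hypothetical: real follow-up (getMore) batches are bounded by size limits rather than a fixed document count, and document sizes (so the 1 MB limit) are ignored here.

```javascript
// Numbers from the list above; everything else is a simplifying assumption.
const SHELL_PAGE_SIZE = 20; // documents printed per screen ("it" shows the next 20)
const FIRST_BATCH_SIZE = 101; // documents in the server's first reply

// Count the screens a user pages through and the server round trips needed.
// laterBatchSize is hypothetical: real getMore batches are limited by size.
function shellSession(totalDocs, laterBatchSize) {
  let fetched = 0; // documents received from the server so far
  let shown = 0; // documents printed so far
  let roundTrips = 0;
  let screens = 0;
  while (shown < totalDocs) {
    const target = Math.min(shown + SHELL_PAGE_SIZE, totalDocs);
    // Ask the server for more only when the next screen needs documents we lack.
    while (fetched < target) {
      const size = roundTrips === 0 ? FIRST_BATCH_SIZE : laterBatchSize;
      fetched = Math.min(fetched + size, totalDocs);
      roundTrips += 1;
    }
    shown = target;
    screens += 1;
  }
  return { screens, roundTrips };
}

const small = shellSession(100, 100); // our 100 sample documents
const large = shellSession(250, 100);
console.log(small); // { screens: 5, roundTrips: 1 }
console.log(large); // { screens: 13, roundTrips: 3 }
```

For the sample collection, all 100 documents fit in the first batch, so the five screens (find() plus four its) cost a single round trip.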

Scenarios Where Cursor Usage Becomes Useful

Large Result Sets: When the query result contains a huge number of documents, loading them all into memory at once might lead to performance issues or even out-of-memory errors. Using a cursor, you can process documents one at a time or in smaller batches, keeping memory consumption under control.
Streaming Data: If you need to keep receiving data as it arrives (e.g., real-time data streams), a tailable cursor on a capped collection, optionally with the awaitData option, stays open after reaching the end and returns new documents as they are inserted. Tailable cursors do not report updates or deletes; for those, MongoDB offers change streams, which read from the oplog.
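The key difference of a tailable cursor is that hitting the end does not close it: it waits and picks up documents inserted later. The toy below models that on a capped collection. CappedCollection, tail() and poll() are hypothetical names, not MongoDB APIs, and the model ignores the conditions under which a real tailable cursor can die.

```javascript
// Hypothetical capped collection with a tailable "cursor" (not the MongoDB API).
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
    this.nextSeq = 1; // insertion order, which capped collections preserve
  }
  insert(doc) {
    this.docs.push({ seq: this.nextSeq++, ...doc });
    if (this.docs.length > this.maxDocs) this.docs.shift(); // evict the oldest
  }
  tail() {
    let lastSeen = 0; // remembers its position instead of closing at the end
    return {
      // Returns documents inserted since the last call. An empty array means
      // "nothing new yet", not "cursor exhausted".
      poll: () => {
        const fresh = this.docs.filter((d) => d.seq > lastSeen);
        if (fresh.length > 0) lastSeen = fresh[fresh.length - 1].seq;
        return fresh;
      },
    };
  }
}

const log = new CappedCollection(3);
log.insert({ msg: "a" });
const cursor = log.tail();

const first = cursor.poll().map((d) => d.msg); // [ 'a' ]
const idle = cursor.poll().map((d) => d.msg); // [] : idle, but still open
log.insert({ msg: "b" });
log.insert({ msg: "c" });
const later = cursor.poll().map((d) => d.msg); // [ 'b', 'c' ]
console.log(first, idle, later);
```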

So, to summarise: for most common use cases with relatively small result sets, loading everything into an array (for example with toArray()) is perfectly fine. However, if you expect large datasets, or have specific needs such as real-time streaming or complex aggregations, iterating the cursor directly lets you process the data more efficiently.
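For the large-dataset case, a common middle ground is to iterate the cursor but handle documents in fixed-size chunks (for example, to feed bulk writes). In the Node.js driver a cursor can be consumed with for await...of, so the same pattern carries over to an async generator; the synchronous sketch below keeps it runnable without a server. fakeCursor and inChunks are hypothetical helpers.

```javascript
// Hypothetical stand-in for iterating a cursor: yields documents one by one.
function* fakeCursor(n) {
  for (let i = 1; i <= n; i++) yield { index: i };
}

// Group a document stream into fixed-size chunks, so at most `size`
// documents are held in memory at once.
function* inChunks(docs, size) {
  let chunk = [];
  for (const doc of docs) {
    chunk.push(doc);
    if (chunk.length === size) {
      yield chunk;
      chunk = [];
    }
  }
  if (chunk.length > 0) yield chunk; // the final, partial chunk
}

let chunks = 0;
let total = 0;
let peak = 0; // largest number of documents held at once
for (const chunk of inChunks(fakeCursor(1050), 100)) {
  chunks += 1;
  total += chunk.length;
  peak = Math.max(peak, chunk.length);
}
console.log(chunks, total, peak); // 11 1050 100
```

All 1050 documents are processed, yet memory never holds more than 100 of them, which is exactly what toArray() cannot offer.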