MongoDB Indexing – Testclue

Indexes

Fundamentally, indexes in MongoDB are similar to indexes in other database systems. Consequently, much of the material available on indexing for MySQL, Oracle, SQL Server, et al. will apply in pretty much the same way to MongoDB.

MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of documents in a MongoDB collection.

As with RDBMSs, indexes support the efficient execution of queries. Without index support, MongoDB would have to perform a collection scan – i.e. scan every document in the collection – to select those documents that match the query. However, if a suitable index exists for a query, MongoDB can use it to limit the number of documents it has to inspect.

MongoDB indexes use a B-tree structure to store a small portion of the collection’s data: the values of the indexed field or fields, ordered by value. This structure is easily traversed, meaning that matches are found quickly. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering defined in the index. Note the use of the word ‘can’; indexes need to be properly designed to support your application’s queries.
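To make the ordering argument concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB's implementation: it models the index as a sorted array rather than a real B-tree, and the helper names (buildIndex, lowerBound, rangeScan) and the sample documents are made up for illustration. The point is only that, once keys are kept in order, a range query becomes one seek followed by a sequential read.

```javascript
// Toy model of an ordered index: a sorted array of { key, id } entries.
// A real B-tree is more elaborate; only the ordering idea is shown here.
function buildIndex(docs, field) {
  return docs
    .map((doc) => ({ key: doc[field], id: doc._id }))
    .sort((a, b) => a.key - b.key);
}

// Binary search for the first position whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query min <= key <= max: seek once, then read entries in order.
function rangeScan(index, min, max) {
  const ids = [];
  for (let i = lowerBound(index, min); i < index.length && index[i].key <= max; i++) {
    ids.push(index[i].id);
  }
  return ids;
}

// Hypothetical sample documents.
const sampleDocs = [
  { _id: "a", score: 10 },
  { _id: "b", score: 42 },
  { _id: "c", score: 7 },
  { _id: "d", score: 25 },
];
const scoreIndex = buildIndex(sampleDocs, "score");
const inRange = rangeScan(scoreIndex, 10, 30);
console.log(inRange); // ids arrive already ordered by score
```

Because the entries come back in key order, the same walk also yields sorted results without a separate sort step.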

The Default _id Index

MongoDB automatically creates a unique index on the _id field when you create a collection. This index prevents clients from inserting two documents having the same _id value. MongoDB will not allow you to drop the index on the _id field.
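As a rough illustration of what a unique index buys you, the following sketch models a collection with a uniqueness check on _id. The TinyCollection class is invented for this example and is not part of any MongoDB driver; in MongoDB itself the second insert would be rejected with a duplicate key error.

```javascript
// Toy model of a unique index on _id: a Map from _id value to document.
class TinyCollection {
  constructor() {
    this.byId = new Map();
  }

  insertOne(doc) {
    if (this.byId.has(doc._id)) {
      // MongoDB reports a duplicate key error in this situation.
      throw new Error(`duplicate key: _id ${doc._id}`);
    }
    this.byId.set(doc._id, doc);
  }
}

const people = new TinyCollection();
people.insertOne({ _id: 1, name: "Ada" });

let rejected = false;
try {
  people.insertOne({ _id: 1, name: "Grace" }); // same _id as the first document
} catch (err) {
  rejected = true;
}
console.log(rejected, people.byId.size);
```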

Types of Index

MongoDB provides a number of different index types to support specific types of data and queries. As is often the case with MongoDB, there are no hard and fast rules here; you need to understand how your application is querying a collection and use the best-adapted index type to support it.

The Single Field Index

In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.

Here’s a simple example. Consider a collection named records that holds documents resembling the following sample document:

{
 "_id": ObjectId("570c04a4ad654577f97dc789"),
 "score": 10,
 "location": { state: "CA", city: "Los Angeles" }
}

The following command will create an ascending index – values sorted from low to high – on the score field:

db.records.createIndex( { score: 1 } )

The following command creates a descending index – values sorted from high to low – on the embedded city field. Note that a dotted field name has to be quoted:

db.records.createIndex( { "location.city": -1 } )

Please note, however, that for single-field indexes and query sort operations, it makes no difference whether the index is ascending (1) or descending (-1) because MongoDB can navigate the index in both directions – i.e. up or down.
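The reason direction is irrelevant for a single field can be sketched in a few lines of plain JavaScript. This is a simplified model, with the index reduced to a sorted array of hypothetical score values; walkForward and walkBackward are names made up for the example.

```javascript
// An ascending index on score, modelled as a sorted array of key values.
const ascendingScores = [7, 10, 25, 42];

// Corresponds to sort({ score: 1 }): walk the index from the front.
function walkForward(index) {
  const out = [];
  for (let i = 0; i < index.length; i++) out.push(index[i]);
  return out;
}

// Corresponds to sort({ score: -1 }): walk the same index from the back.
function walkBackward(index) {
  const out = [];
  for (let i = index.length - 1; i >= 0; i--) out.push(index[i]);
  return out;
}

console.log(walkForward(ascendingScores));
console.log(walkBackward(ascendingScores));
```

Either sort order is served by the one index, with no re-sorting of the data.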

The Compound Index

MongoDB supports indexes on multiple fields, known as compound indexes.

In a compound index, care has to be taken over the ordinal position of the fields and each one’s sort order. For instance, if a compound index consists of { username: 1, date: -1 }, the index sorts first by ascending username and then, within each username value, sorts by descending date.

The ordinal position and sort order of the fields in a compound index determine whether the index can support a sort operation. Consider the following index on the registrant collection:

db.registrant.createIndex( { "username" : 1, "date" : -1 } )

Believe it or not, it can support both of these sort operations:

db.registrant.find().sort( { username: 1, date: -1 } )

db.registrant.find().sort( { username: -1, date: 1 } )

Remember, MongoDB can navigate the index in both directions. The first sort matches the index exactly, and the second is its exact inverse, so MongoDB simply walks the index backwards. However, the index can’t support the following sort because it attempts to traverse date in the wrong direction relative to username:

db.registrant.find().sort( { username: 1, date: 1 } )
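The rule at work in these three sorts can be written down as a small check. The function below is a sketch in plain JavaScript, not anything MongoDB exposes; canSupportSort is a made-up name, and it deliberately covers only sorts that name every field of the index (sorts on an index prefix are left out to keep it short).

```javascript
// Returns true when sortSpec matches the index key pattern exactly,
// or is its exact inverse (every direction flipped).
// Simplification: only sorts over the full key pattern are considered.
function canSupportSort(indexSpec, sortSpec) {
  const indexFields = Object.entries(indexSpec);
  const sortFields = Object.entries(sortSpec);
  if (indexFields.length !== sortFields.length) return false;

  let sameDirection = true;
  let reversedDirection = true;
  for (let i = 0; i < indexFields.length; i++) {
    const [indexField, indexDir] = indexFields[i];
    const [sortField, sortDir] = sortFields[i];
    if (indexField !== sortField) return false; // field order must match
    if (sortDir !== indexDir) sameDirection = false;
    if (sortDir !== -indexDir) reversedDirection = false;
  }
  return sameDirection || reversedDirection;
}

const registrantIndex = { username: 1, date: -1 };
console.log(canSupportSort(registrantIndex, { username: 1, date: -1 }));
console.log(canSupportSort(registrantIndex, { username: -1, date: 1 }));
console.log(canSupportSort(registrantIndex, { username: 1, date: 1 }));
```

The first two calls correspond to the supported sorts and the third to the unsupported one.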

The Multikey Index

MongoDB uses what it calls multikey indexes to index the content stored in arrays. If you index a field that holds an array value – whether the elements are scalar values or embedded documents – MongoDB creates a separate index entry for each element of the array.

MongoDB itself determines when to create a multikey index; you don’t need to be explicit about this.
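Here is a minimal sketch of the idea in plain JavaScript. The indexEntriesFor helper, the tags field, and the sample documents are all hypothetical; the only thing being illustrated is that one document with an array value yields several index entries, while a scalar value yields one.

```javascript
// Produce the index entries a document contributes for one field.
// An array value contributes one entry per element; a scalar contributes one.
function indexEntriesFor(doc, field) {
  const value = doc[field];
  const keys = Array.isArray(value) ? value : [value];
  return keys.map((key) => ({ key, id: doc._id }));
}

// Hypothetical documents.
const arrayDoc = { _id: 1, tags: ["db", "nosql", "index"] };
const scalarDoc = { _id: 2, tags: "db" };

const arrayEntries = indexEntriesFor(arrayDoc, "tags");
const scalarEntries = indexEntriesFor(scalarDoc, "tags");
console.log(arrayEntries.length, scalarEntries.length);
```

A query for a single tag can then find document 1 through any of its three entries.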

Text Indexes

MongoDB provides a text index type that supports searching for string content in a collection. These text indexes do not store stop words (e.g. “the”, “a”, “or”) and stem the words in a collection to only store root words. Stop words and stemming rules are language-specific.

This allows you to execute queries against unstructured data. For example, you can search for articles that contain the word coffee but do not contain the term shop:

db.articles.find( { $text: { $search: "coffee -shop" } } )

It’s a large topic in itself so please head over to Text Indexes in the MongoDB documentation for more information on this.
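Note that a $text query only works once a text index exists on the collection. To give a feel for what the coffee -shop search does, here is a heavily simplified sketch in plain JavaScript. It is an assumption-laden model, not MongoDB's algorithm: the stop-word list contains only the three words mentioned above, stemming is left out entirely, and tokenize and matchesSearch are names invented for the example.

```javascript
// Only the three stop words named in the text; real lists are much longer.
const STOP_WORDS = new Set(["the", "a", "or"]);

// Lower-case the text, split on non-alphanumerics, drop stop words.
// Stemming is intentionally omitted in this sketch.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

// Plain terms are wanted (any one is enough); terms prefixed with "-" are excluded.
function matchesSearch(text, search) {
  const words = new Set(tokenize(text));
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  const wanted = terms.filter((t) => !t.startsWith("-"));
  const excluded = terms.filter((t) => t.startsWith("-")).map((t) => t.slice(1));
  return wanted.some((t) => words.has(t)) && !excluded.some((t) => words.has(t));
}

console.log(matchesSearch("The best coffee in town", "coffee -shop"));
console.log(matchesSearch("A coffee shop or a bakery", "coffee -shop"));
```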

The Hashed Index

A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

To support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of a field’s value rather than the value itself. Hashing spreads values pseudo-randomly across the hash range, so the order of the index entries says nothing about the order of the original values. Consequently, hashed indexes support only equality matches; range-based queries are not supported.
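The equality-only restriction is easy to see in a toy model. The sketch below uses plain JavaScript with a 32-bit FNV-1a hash as a stand-in; this is an assumption for illustration and not the hash function MongoDB uses, and addEntry, findEqual, and the sample values are likewise made up.

```javascript
// 32-bit FNV-1a, used here only as a stand-in hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Toy hashed index: hash -> entries. Each entry keeps the original value
// so that two values with the same hash can still be told apart.
const hashedIndex = new Map();

function addEntry(value, id) {
  const h = fnv1a(String(value));
  if (!hashedIndex.has(h)) hashedIndex.set(h, []);
  hashedIndex.get(h).push({ value, id });
}

// Equality lookup: hash the search value and go straight to its bucket.
function findEqual(value) {
  const bucket = hashedIndex.get(fnv1a(String(value))) || [];
  return bucket.filter((e) => e.value === value).map((e) => e.id);
}

addEntry("user100", "doc1");
addEntry("user101", "doc2");
addEntry("user102", "doc3");

console.log(findEqual("user101"));
// Neighbouring values get unrelated hashes, so the index order cannot serve a range.
console.log(["user100", "user101", "user102"].map((v) => fnv1a(v)));
```

An equality match needs only the hash of the value being searched for, whereas a range query would need neighbouring values to sit next to each other, which hashing deliberately destroys.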

To Index, or Not to Index

Now here’s a thing: there are situations when it can be better not to index. Remember, just as in an RDBMS, using an index requires two kinds of read: first the index itself, and then random disk reads to fetch the corresponding documents from the collection. So indexes lose their efficiency as you fetch larger percentages of a collection.

After a certain percentage threshold has been reached, it can be more efficient to ignore an index and simply do a full collection scan; more of a sequential disk read. Unfortunately, there isn’t a hard-and-fast rule to tell you when to stop using an index.

It depends on a number of factors: the size of your data, indexes, and documents, and the average size of your result sets. As a rough rule of thumb, though, if a query is returning more than about 30% of a collection, start testing whether index scans or collection scans perform better.
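The trade-off can be sketched with a deliberately crude cost model in plain JavaScript. Everything here is an assumption made for illustration: the estimateCosts function is invented, the cost units are arbitrary, reads of the index itself are ignored, and the 4:1 ratio between a random and a sequential read is a made-up figure, not a measurement.

```javascript
// Crude cost model in hypothetical units; none of these numbers are measured.
function estimateCosts({ totalDocs, matchingDocs, randomReadCost, sequentialReadCost }) {
  // Index plan: one random document fetch per match (index reads ignored).
  const indexPlan = matchingDocs * randomReadCost;
  // Collection scan: every document is read once, sequentially.
  const collectionScan = totalDocs * sequentialReadCost;
  return {
    indexPlan,
    collectionScan,
    cheaper: indexPlan < collectionScan ? "index" : "collection scan",
  };
}

// Assumed ratio: a random read costs four times a sequential one.
const assumedCosts = { totalDocs: 1000, randomReadCost: 4, sequentialReadCost: 1 };

const selectiveQuery = estimateCosts({ ...assumedCosts, matchingDocs: 10 });
const broadQuery = estimateCosts({ ...assumedCosts, matchingDocs: 500 });
console.log(selectiveQuery.cheaper, broadQuery.cheaper);
```

With these assumed numbers the crossover sits at 25% of the collection, but the real threshold depends on your hardware and data. In practice, the way to find out is to measure, for example by comparing the output of explain("executionStats") for the same query with and without the index.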

When thinking about indexes, your primary concern is no doubt about query performance. However, indexes have to be updated. Each time a document is inserted or updated, corresponding indexes also need to be updated. The same goes when a document is removed. This index overhead impacts overall update performance.

Conclusion

Like everything else in the NoSQL world, there is no silver bullet for achieving good performance. MongoDB is no exception. Start by knowing your use cases and focus your design to support them, including creating indexes for your most frequently used queries.

But it doesn’t stop there. Monitor performance and be prepared to change your indexing strategy as things change.
