MongoDB: Types of Indexes and Their Applications
By default, MongoDB creates an index on the _id field of every collection, and its indexes are stored in a B-tree structure.
Other index options are available to make an index more application specific.
1.) Covering Index: There is a subset of queries that do not need to access the documents at all.
This happens when all the information in the query result is contained in the index itself. When this magic alignment happens, the query can be magically fast.
A query that filters only on fields in the index and returns only fields in the index is covered by the index.
Example:
Consider the messages collection, which has an index on the keys "from.country", "from.number" and "time". This means the index already holds the values of these 3 fields for every document.
If a query only requires "time", "from.country", "from.number" or all 3, the number of documents scanned will be zero.
In the example below, let's exclude _id: it is returned by default, and since it is not part of this index, leaving it in would force mongo to fetch the documents.
Query example: db.messages.find({'from.country':'44'},{_id:0,time:1})
Explain example: db.messages.find({'from.country':'44'},{_id:0,time:1}).explain('executionStats')
In the executionStats output, totalDocsExamined is 0, which shows that mongo did not scan any documents to get the results.
If your use case has any query that can be covered, this can provide a huge performance benefit.
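To make the covering rule concrete, here is a minimal sketch in plain JavaScript. It is only a simplified model of the rule described above, not how mongo's query planner actually decides; the function name isCoveredBy is made up for illustration.

```javascript
// Simplified model of the "covered query" rule; NOT MongoDB's actual planner.
// A query is treated as covered when every filtered field and every returned
// field is a key of the index, and _id is explicitly excluded.
function isCoveredBy(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterFields = Object.keys(filter);
  const returnedFields = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default, so it must be switched off unless it is indexed.
  const idExcluded = projection._id === 0 || indexed.has("_id");
  return (
    idExcluded &&
    filterFields.every((f) => indexed.has(f)) &&
    returnedFields.every((f) => indexed.has(f))
  );
}

// Index and query taken from the messages example above.
const messagesIndex = { "from.country": 1, "from.number": 1, time: 1 };
const countryFilter = { "from.country": "44" };

const coveredExample = isCoveredBy(messagesIndex, countryFilter, { _id: 0, time: 1 });
const notCoveredExample = isCoveredBy(messagesIndex, countryFilter, { time: 1 });

console.log(coveredExample, notCoveredExample); // true false
```

The second call differs only in that it forgets to exclude _id, which is enough to lose the covering.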
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2.) Sparse Index: Indexes are best when they are compact. The smaller they are, the faster they can be loaded and processed. A regular mongo index holds an entry for every document, even if the document has a null value for the field or does not have the field at all; the index still includes those documents under the null key. This can be a waste if only a small portion of your documents have that field. Remember, mongo has no predefined schema, and it allows documents in the same collection to contain arbitrarily different fields.
Consider my case: I have a messages database, and only a few documents that are associated with some promotion have a promo field.
In my case only 188 of my 10000 documents have a promo field, so that is a good candidate for a sparse index. A sparse index only holds an entry for a document if the document actually contains the field.
Let's create a sparse index:
db.messages.createIndex({promo:1},{sparse:true})
(Older shells call this ensureIndex; createIndex is the current name.)
A sparse index is an optimization over a non-sparse index because the index can be dramatically smaller if only a small subset of documents contain that field.
Let's look at the collection stats (db.messages.stats()).
In the stats output, the promo index is much smaller than the other indexes. A sparse index can be very helpful when index sizes run into MBs or GBs.
Note: sparse indexes have some limitations, so decide carefully. For example, a sort operation can use an index, but a sparse index does not have an entry for every document, so a sort that runs on the sparse index (newer versions avoid this unless you force the index with hint()) returns a shorter result list than it should. A sort should only order documents, not drop the ones that have no value.
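The difference between a regular and a sparse index, and the sort pitfall from the note, can be sketched with a toy model in plain JavaScript. The sample data and the buildIndexEntries helper are made up for illustration; real mongo indexes are B-trees, not arrays.

```javascript
// Toy model of index entries; real MongoDB indexes are B-trees on disk.
function buildIndexEntries(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !hasField) continue; // sparse: skip documents without the field
    // A regular index files documents without the field under the null key.
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  return entries;
}

// Hypothetical sample data: 10 messages, only 2 of which carry a promo field.
const messages = Array.from({ length: 10 }, (_, i) =>
  i % 5 === 0 ? { _id: i, promo: "summer" } : { _id: i }
);

const regularEntries = buildIndexEntries(messages, "promo");
const sparseEntries = buildIndexEntries(messages, "promo", { sparse: true });

// Walking the sparse index in key order only ever reaches indexed documents,
// which is why a sort driven by it comes back shorter than the collection.
const idsSortedViaSparse = [...sparseEntries]
  .sort((a, b) => String(a.key).localeCompare(String(b.key)))
  .map((e) => e.id);

console.log(regularEntries.length, sparseEntries.length, idsSortedViaSparse.length); // 10 2 2
```

With the same ratio as my collection (188 of 10000), the sparse index would hold under 2% of the entries of a regular one.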
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3.) Unique Index: Ensures that no other document contains the same value for the field. It enforces uniqueness within a single collection, not across shards. Example: the customers of my telephone company each have a unique phone number.
Each customer has a phone number; let's ensure it is unique and remains unique by adding a unique index.
Prob 1: If you try to add a customer who has the same mobile number as an existing customer, you will get an error like
"errmsg":"E11000 duplicate key error index"
Prob 2: If you create a customer without any phone number, the first entry will succeed, but another customer without a phone number will give the same error, because both documents are indexed under the null key.
"errmsg":"E11000 duplicate key error index"
Solution: hmm, how about we create an index which is both unique and sparse? Why? Because the sparse index will not have entries for documents with no phone number. Let's try that.
Let's drop and recreate the index.
We already have a customer with no phone; let's try to add another without a phone number.
Wow, it worked. We have 2 documents with no number. That's what we wanted: if you have a number, it had better be unique. If you don't have a number, that's fine, and we can have multiple people with no number.
What if a customer has more than one phone number? The same index will work fine. Instead of a single value, the number field can hold an array, and mongo will still ensure that no number is duplicated across customers.
I will add a new customer with his number in an array, then try to push an existing number onto it. The push is rejected with the same duplicate key error.
And when I try to push a unique number, it is added successfully.
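The behaviour of a unique and sparse index, including array values, can be sketched with a small in-memory model in plain JavaScript. The class name, the "phone" field name and the sample numbers are assumptions for illustration, and the error text only mimics mongo's message.

```javascript
// Toy in-memory model of a unique, sparse index on a "phone" field.
class UniqueSparseIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }
  // Keys a document contributes: none if the field is missing (sparse),
  // one per element if it is an array, otherwise the single value.
  keysFor(doc) {
    if (!Object.prototype.hasOwnProperty.call(doc, this.field)) return [];
    const value = doc[this.field];
    return Array.isArray(value) ? value : [value];
  }
  insert(doc) {
    const newKeys = this.keysFor(doc);
    // Check every key first so a rejected document leaves the index untouched.
    for (const k of newKeys) {
      if (this.keys.has(k)) throw new Error(`E11000 duplicate key error: ${k}`);
    }
    newKeys.forEach((k) => this.keys.add(k));
  }
}

const phoneIndex = new UniqueSparseIndex("phone");
phoneIndex.insert({ name: "A", phone: "555-0100" });
phoneIndex.insert({ name: "B" }); // no phone: not indexed at all
phoneIndex.insert({ name: "C" }); // a second customer without a phone is fine
phoneIndex.insert({ name: "D", phone: ["555-0101", "555-0102"] });

let duplicateRejected = false;
try {
  // One new number plus one that customer A already owns.
  phoneIndex.insert({ name: "E", phone: ["555-0103", "555-0100"] });
} catch (err) {
  duplicateRejected = true;
}

console.log(duplicateRejected, phoneIndex.keys.size); // true 3
```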
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
4.) TTL Index: TTL means time to live. Data is great and we would like to keep data forever. Well, most data, and well, not forever ever. Just as long as necessary. So how do we get rid of old data? Traditionally we'd have a timestamp on the rows in the DB, then run a batch job on a timer outside of the DB to delete rows older than some threshold.
Mongo has a convenient way to do this without running an external job. It is called a TTL index.
TTL is a property on an index that defines how long a document is allowed to live. After that, the document is subject to automatic removal by mongo.
Example: let's create an index so that messages older than 90 days are not kept.
expireAfterSeconds is expressed in relative terms. It doesn't set a date on which the document expires; it specifies how old the document can be.
A TTL index must be defined on a single field within a document.
You can't have two TTL indexes on the same field.
That single field must contain a date datatype and must not be the _id field.
A TTL index can also be used by mongo to optimize a query plan.
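As a rough sketch of the 90-day example, the snippet below works out expireAfterSeconds and models the expiry check in plain JavaScript. Using the "time" field of the messages collection as the TTL field is my assumption, the isExpired helper is made up for illustration, and the shell call appears only as a comment because it needs a running mongo.

```javascript
const NINETY_DAYS_IN_SECONDS = 90 * 24 * 60 * 60; // 7776000

// Options object as it could be passed to createIndex in the mongo shell, e.g.
//   db.messages.createIndex({ time: 1 }, ttlOptions)
// (indexing "time" is an assumption; the shell call is not executed here.)
const ttlOptions = { expireAfterSeconds: NINETY_DAYS_IN_SECONDS };

// Simplified model of the expiry rule: a document is due for removal once
// its date field is at least expireAfterSeconds in the past.
// Arrays of dates are not handled in this sketch.
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // non-date values never expire
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

// Fixed "now" so the example is reproducible.
const now = new Date("2024-06-01T00:00:00Z");
const oldMessage = { time: new Date("2024-01-01T00:00:00Z") };    // 152 days old
const recentMessage = { time: new Date("2024-05-01T00:00:00Z") }; // 31 days old

console.log(
  isExpired(oldMessage, "time", ttlOptions.expireAfterSeconds, now),
  isExpired(recentMessage, "time", ttlOptions.expireAfterSeconds, now)
); // true false
```

Because the threshold is relative, changing the retention period only means changing one number; no document has to be rewritten.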