
Mongo Maths

The mathematics behind MongoDB's text search scoring

Consider a product collection whose documents look like this:
{
  "name" : "Programming Laptop",
  "description" : "8 GB/512 GB SSD/15 inch"
}
Insert a few sample products:
db.product.insertMany([{
  "name" : "Programming Laptops - Dell Laptop",
  "description" : "8 GB/512 GB SSD/intel Core i5 8th gen"
},{
  "name" : "Programming in C",
  "description" : "Programming in C | Third Edition | By Pearson"
},{
  "name" : "programmer Laptop",
  "description" : "Code|Half Sleeve T Shirt for Men"
},{
  "name" : "Laptop",
  "description" : "latest laptop"
},{
  "name" : "Titan",
  "description" : "premium watch"
}])
Create a text index on the name field and search for "Laptop", projecting and sorting by the text score:
db.product.createIndex({"name":"text"})
db.product.find({$text: {$search: "Laptop"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}})
Only the documents containing "Laptop" match, and they come back ranked by score:
{
"_id" : ObjectId("5f40f2aaa2a276d5a15fac3d"),
"name" : "Programming Laptops - Dell Laptop",
"description" : "8 GB/512 GB SSD/intel Core i5 8th gen",
"score" : 1.125
}
{
"_id" : ObjectId("5f40f2aaa2a276d5a15fac40"),
"name" : "Laptop",
"description" : "latest laptop",
"score" : 1.1
}
{
"_id" : ObjectId("5f40f2aaa2a276d5a15fac3f"),
"name" : "programmer Laptop",
"description" : "Code|Half Sleeve T Shirt for Men",
"score" : 0.75
}
A text index can also be created on multiple fields, with per-field weights, or on every string field via the $** wildcard (a collection can have at most one text index, so drop the existing one before trying an alternative):
db.product.createIndex({"name":"text","description":"text"})
db.product.createIndex({"name":"text","description":"text"}, {"weights": { name: 3, description: 1 }})
db.product.createIndex({"$**":"text"})
The $search string also supports exact phrase search (the phrase in escaped quotes) and term exclusion (a leading minus sign):
db.product.find({$text: {$search: "\"programmer Laptop\""}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}}).pretty()
db.product.find({$text: {$search: "Laptop -Lenovo"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}}).pretty()
MongoDB calculates the text score as follows:
Step 1: Let the search text = S
Step 2: Break S into tokens (unless you are doing a phrase search), say T1, T2, ..., Tn, and apply stemming to each token
Step 3: For every search token, calculate a score per indexed field of the text index as follows:

score = (weight * data.freq * coeff * adjustment);

where:
weight = user-defined weight of the field (default is 1 when no weight is specified)
data.freq = how frequently the search token appears in the text, counted with exponentially decreasing credit: 1/(2^0) for the first occurrence, 1/(2^1) for the second, and so on
coeff = (0.5 * data.count / numTokens) + 0.5
data.count = number of tokens in the text matching the search token
numTokens = total number of tokens in the text
adjustment = 1 by default; if the search token is exactly equal to the document field, adjustment = 1.1
Step 4: The final score of the document is the sum of the scores of all search tokens over all text index fields (a small JavaScript sketch of this follows)
Total Score = score(T1) + score(T2) + ... + score(Tn)
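
The formula is easy to reproduce outside the server. Below is a minimal JavaScript sketch of Steps 3 and 4 (it runs in the mongo/mongosh shell or Node); it is not MongoDB's actual implementation, and it assumes the tokens passed in have already been stop-word-filtered, lowercased and stemmed, which MongoDB does internally. The helper name fieldScore is just for illustration.

// Sketch of the per-field score: sum of weight * data.freq * coeff * adjustment
// over all search tokens. searchTokens/fieldTokens must already be stemmed.
function fieldScore(searchTokens, fieldTokens, weight) {
  weight = weight || 1;                    // default weight is 1
  var numTokens = fieldTokens.length;      // total tokens in the indexed text
  var totalScore = 0;
  searchTokens.forEach(function (token) {
    // data.count = number of field tokens matching this search token
    var count = fieldTokens.filter(function (t) { return t === token; }).length;
    if (count === 0) return;               // an absent token contributes nothing
    // data.freq = 1/(2^0) + 1/(2^1) + ... , one term per occurrence
    var freq = 0;
    for (var i = 0; i < count; i++) { freq += 1 / Math.pow(2, i); }
    var coeff = (0.5 * count / numTokens) + 0.5;
    // adjustment = 1.1 only when the whole field is exactly the search token
    var adjustment = (numTokens === 1 && fieldTokens[0] === token) ? 1.1 : 1;
    totalScore += weight * freq * coeff * adjustment;
  });
  return totalScore;
}

The worked examples that follow apply the same steps by hand.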
Search String: "program"
Sample Document: {"name" : "Programming with Program"}
Index Defined: $text index on "name" field
Score Calculation:
Step 1: Tokenize the search string and apply stemming.
Token 1: "program"
Step 2: For every search token obtained in Step 1, do steps 3-11:

Step 3: Take Sample Document and Remove Stop Words -> "Programming Program"
Step 4: Apply Stemming -> "program program"
Step 5: Calculate data.count per search token
data.count(program) = 2 (as program appears twice)
Step 6: Calculate the total number of tokens in the document
numTokens = 2
Step 7: Calculate coefficient per search token
coeff(program) = 0.5*(2/2) + 0.5 = 1.0
Step 8: Calculate adjustment per search token
adjustment(program) = 1 (as the search string is not exactly equal to the document field)
Step 9: weight = 1 (As no special weight is assigned while creating text index)
Step 10: Calculate the data frequency (data.freq) per search token:
a. Count the frequency of every token in the sample document
b. [program => 2]
c. data.freq(program) = 1/(2^0) + 1/(2^1) = 1.5
Step 11: Calculate score per search token:
score = (weight * data.freq * coeff * adjustment);
score(program) = (1 * 1.5 * 1.0 * 1.0) = 1.5
Step 12: Add the individual scores of every search token to get the total score
Total score = score(program) = 1.5
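
To double-check the result against a running server, the same document can be inserted into a throwaway collection (the collection name scoreDemo1 below is only for illustration) and searched with $text; per the walkthrough above, the reported textScore should be 1.5.

db.scoreDemo1.insertOne({"name" : "Programming with Program"})
db.scoreDemo1.createIndex({"name":"text"})
db.scoreDemo1.find({$text: {$search: "program"}}, {score: {$meta: "textScore"}})
// the returned score field should be 1.5, matching the hand calculation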
Search String: "Programming books"
Sample Document: {"name" : "Programming books: Programming with Java"}
Index Defined: $text index on "name" field
Score Calculation:
Step 1: Tokenize the search string and apply stemming.
Token 1: "program"
Token 2: "book"
Step 2: For every search token obtained in Step 1, do steps 3-11:

Step 3: Take Sample Document and Remove Stop Words -> "Programming books Programming Java"
Step 4: Apply Stemming -> "program book program java"
Step 5: Calculate data.count per search token
data.count(program) = 2 (as program appears twice)
data.count(book) = 1 (as book appears once)
Step 6: Calculate the total number of tokens in the document = numTokens = 4
Step 7: Calculate the coefficient per search token
coeff(program) = 0.5*(2/4) + 0.5 = 0.75
coeff(book) = 0.5*(1/4) + 0.5 = 0.625
Step 8: Calculate adjustment per search token
adjustment(program) = 1
adjustment(book) = 1
Step 9: weight = 1 (As no special weight is assigned while creating text index)
Step 10: Calculate the data frequency (data.freq) per search token:
a. Count the frequency of every token in the sample document
b. [program => 2] [book => 1] [java => 1]
c. data.freq(program) = 1/(2^0) + 1/(2^1) = 1.5
d. data.freq(book) = 1/(2^0) = 1
Step 11: Calculate the score per search token:
score = (weight * data.freq * coeff * adjustment);
score(program) = (1 * 1.5 * 0.75 * 1.0) = 1.125
score(book) = (1 * 1 * 0.625 * 1.0) = 0.625
Step 12: Add the individual scores of every search token to get the total score
Total score = score(program) + score(book) = 1.125 + 0.625 = 1.75
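
The same figure falls out of the fieldScore sketch from earlier, fed with the already-stemmed field tokens:

fieldScore(["program", "book"], ["program", "book", "program", "java"])
// (1 * 1.5 * 0.75 * 1) + (1 * 1 * 0.625 * 1) = 1.75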
Search String: "program"
Sample Document: {"name" : "Program: programming app", "description" : "program"}
Index Defined: $text index on the "name" and "description" fields, with weight 3 on name and weight 1 on description
Score Calculation:
Step 1: Tokenize the search string and apply stemming.
Token 1: "program"
Step 2: For every search token obtained in Step 1, do steps 3-11:

Step 3: Take Sample Document and Remove Stop Words ->
{"name" : "Program programming app", "description": "program"}
Step 4: Apply Stemming ->
{"name" : "program program app", "description": "program"}
Step 5: Calculate data.count per search token per index field
data.count(program)(name field) = 2
data.count(program)(description field) = 1
Step 6: Calculate the total number of tokens in the document per field
numTokens(name field) = 3
numTokens(description field) = 1
Step 7: Calculate coefficient per search token per field
coeff(program)(name field) = 0.5*(2/3) + 0.5 = (5/6)
coeff(program)(description field) = 0.5*(1/1) + 0.5 = 1
Step 8: Calculate adjustment per search token per field
adjustment(program)(name field) = 1 (the name field is not exactly equal to the search token)
adjustment(program)(description field) = 1.1 (the description field is exactly "program", the same as the search token)
Step 9: Input weight per field
weight(name field) = 3
weight(description field) = 1
Step 10: Calculate the data frequency (data.freq) per search token per field:
a. Count the frequency of every token in the sample document
b. Frequency in name field: [program => 2] [app => 1]
c. Frequency in description field: [program => 1]
d. data.freq(program)(name field) = 1/(2^0) + 1/(2^1) = 1.5
e. data.freq(program)(description field) = 1/(2^0) = 1
Step 11: Calculate score per search token per field:
score = (weight * data.freq * coeff * adjustment);
score(program)(name field) = (3 * 1.5 * (5/6) * 1.0) = 3.75
score(program)(description field) = (1 * 1 * 1 * 1.1) = 1.1
Step 12: Add the individual scores of every search token, across both fields, to get the total score
Total score = score(program)(name field) + score(program)(description field) = 3.75 + 1.1 = 4.85
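
As before, the result can be checked against a live server with a weighted text index (the collection name scoreDemo3 is only for illustration); per the calculation above, the reported textScore should be 4.85.

db.scoreDemo3.insertOne({"name" : "Program: programming app", "description" : "program"})
db.scoreDemo3.createIndex({"name":"text","description":"text"}, {"weights": { name: 3, description: 1 }})
db.scoreDemo3.find({$text: {$search: "program"}}, {score: {$meta: "textScore"}})
// 3.75 from the name field + 1.1 from the description field = 4.85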
