Evaluation Metrics

Overview

Search Tweak provides a suite of metrics for evaluating and optimizing search models. These metrics quantify the relevance and effectiveness of search results, supporting detailed analysis and improvement.

Precision at 10 (P@10)

Precision at 10 (P@10) measures the proportion of relevant documents in the top 10 search results.

Calculation

$$\text{P@10} = \frac{\text{number of relevant documents in the top 10}}{10}$$
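The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Search Tweak's own implementation: the function name `precision_at_k` and the input format (a list of 0/1 judgments in ranked order) are assumptions, as is the choice to always divide by 10 even when fewer than 10 results are returned.

```python
def precision_at_k(relevances, k=10):
    """Fraction of the top-k results judged relevant.

    relevances: 0/1 judgments in ranked order (hypothetical input format).
    Assumption: the denominator is always k, even for short result lists.
    """
    top_k = relevances[:k]
    return sum(1 for rel in top_k if rel > 0) / k


# Four relevant documents among the first ten results.
judgments = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_k(judgments))  # 0.4
```

Reordering `judgments` leaves the score unchanged, which is the limitation noted under Cons.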

Pros

  • Simple to calculate and understand.
  • Direct measure of search quality for top results.

Cons

  • Does not consider the order of relevant documents.

Average Precision at 10 (AP@10)

Average Precision at 10 (AP@10) averages the precision values computed at each rank where a relevant document appears, up to the 10th result.

Calculation

AP@10
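As a rough sketch of the description above: walk the top 10 results, record the precision at every rank that holds a relevant document, and average those values. The function name `average_precision_at_k` and the 0/1 input list are hypothetical. The normalizer is also an assumption: here the sum is divided by the number of relevant documents found in the top 10, whereas some definitions divide by the total number of relevant documents known for the query.

```python
def average_precision_at_k(relevances, k=10):
    """Average of precision values taken at each relevant rank in the top k.

    relevances: 0/1 judgments in ranked order (hypothetical input format).
    Assumption: normalize by the number of relevant documents found in the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


early = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # relevant documents ranked first
late = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # same documents ranked last
print(average_precision_at_k(early), average_precision_at_k(late))
```

Both lists have the same P@10, but `early` scores higher because AP@10 rewards relevant documents that appear sooner.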

Pros

  • Considers the order of relevant documents.
  • Provides a balanced view of precision across different ranks.

Cons

  • Sensitive to the position of relevant documents within the top 10.

Reciprocal Rank at 10 (RR@10)

Reciprocal Rank at 10 (RR@10) measures the reciprocal of the rank of the first relevant document within the top 10 search results.

Calculation

$$\text{RR@10} = \frac{1}{\text{rank of the first relevant document in the top 10}}$$
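A minimal Python sketch of this calculation follows. The function name `reciprocal_rank_at_k` and the 0/1 input list are hypothetical, and returning 0 when no relevant document appears in the top 10 is a common convention assumed here rather than something the text states.

```python
def reciprocal_rank_at_k(relevances, k=10):
    """1 / rank of the first relevant document within the top k.

    relevances: 0/1 judgments in ranked order (hypothetical input format).
    Assumption: the score is 0.0 when the top k holds no relevant document.
    """
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


print(reciprocal_rank_at_k([0, 0, 1, 1, 0]))  # first hit at rank 3
print(reciprocal_rank_at_k([0] * 10 + [1]))   # first hit at rank 11, outside the cutoff
```

Everything after the first relevant document is ignored, so `[0, 0, 1]` and `[0, 0, 1, 1, 1]` receive the same score.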

Pros

  • Highlights the importance of the first relevant document.
  • Simple interpretation.

Cons

  • Focuses only on the first relevant result.
  • Ignores the relevance of subsequent documents.

Cumulative Gain at 10 (CG@10)

Cumulative Gain at 10 (CG@10) sums the relevance scores of documents up to the 10th position.

Calculation

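$$\text{CG@10} = \sum_{i=1}^{10} rel_i$$

where $rel_i$ is the relevance score of the document at position $i$.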
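In code, the sum is a one-liner. The sketch below assumes graded relevance scores supplied as a ranked list of numbers; the function name `cumulative_gain_at_k` and the 0 to 3 grading scale in the example are illustrative choices, not taken from Search Tweak.

```python
def cumulative_gain_at_k(relevances, k=10):
    """Sum of the relevance scores of the top-k results.

    relevances: graded scores in ranked order (hypothetical input format).
    """
    return sum(relevances[:k])


best_first = [3, 2, 0, 1]  # most relevant document ranked first
best_last = [1, 0, 2, 3]   # same documents, most relevant ranked last
print(cumulative_gain_at_k(best_first))  # 6
print(cumulative_gain_at_k(best_last))   # 6
```

Both orderings score 6, which illustrates why CG@10 cannot distinguish a good ranking from a poor one over the same documents.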

Pros

  • Simple to calculate.
  • Reflects the total relevance of top results.

Cons

  • Does not account for the position of relevant documents.
  • Can be skewed by highly relevant documents low in the ranking, which count as much as the same documents would at the top.

Discounted Cumulative Gain at 10 (DCG@10)

Discounted Cumulative Gain at 10 (DCG@10) is a variation of CG@10 that discounts each relevance score according to its position, so that documents ranked lower contribute less to the total.

Calculation

DCG@10
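The text does not say which discount function is applied, so the sketch below assumes the widely used logarithmic form, DCG@10 = Σ rel_i / log2(i + 1) for positions i = 1 to 10, with the relevance score used directly as the gain. Another common variant uses 2^rel_i − 1 as the gain. The function name `dcg_at_k` and the sample grades are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results.

    relevances: graded scores in ranked order (hypothetical input format).
    Assumptions: linear gain (the score itself) and a log2(rank + 1) discount.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


best_first = [3, 2, 0, 1]  # most relevant document ranked first
best_last = [1, 0, 2, 3]   # same documents, most relevant ranked last
print(dcg_at_k(best_first), dcg_at_k(best_last))
```

Unlike CG@10, the two orderings now receive different scores, and the ranking that places the most relevant document first scores higher.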

Pros

  • Accounts for the position of relevant documents.
  • Penalizes lower-ranked relevant documents.
  • More realistic measure of user satisfaction.

Cons

  • Sensitive to the position of relevance scores.
  • Requires relevance scores.

Normalized Discounted Cumulative Gain at 10 (nDCG@10)

Normalized Discounted Cumulative Gain at 10 (nDCG@10) normalizes DCG@10 by the ideal DCG (IDCG) at 10, making it easier to compare across different queries.

Calculation

$$\text{nDCG@10} = \frac{\text{DCG@10}}{\text{IDCG@10}}$$
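A short sketch of the normalization step, under several stated assumptions: DCG uses a linear gain with a log2(rank + 1) discount, the ideal ranking is obtained by sorting the judged scores of the supplied list in descending order, and a query whose scores are all zero is given an nDCG of 0. The names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """DCG with linear gain and a log2(rank + 1) discount (assumed form)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal ordering.

    Assumption: the ideal ordering sorts the supplied scores in descending
    order; a query with no positive scores returns 0.0.
    """
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0


print(ndcg_at_k([3, 2, 1, 0]))  # already in ideal order: 1.0
print(ndcg_at_k([1, 0, 2, 3]))  # same documents, worse order
```

Because every query is divided by its own ideal score, results with non-negative relevance scores fall between 0 and 1 regardless of how many relevant documents a query has.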

Pros

  • Normalizes the scores, allowing comparison across queries.
  • Provides a standard measure of ranking quality.

Cons

  • More complex to calculate.
  • Requires ideal ranking computation.

Mean Average Precision (MAP)

Mean Average Precision (MAP) is the mean of the average precision scores for a set of queries, providing a single measure of quality across multiple queries.

Calculation

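$$\text{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \text{AP}(q)$$

where $Q$ is the set of evaluated queries and $\text{AP}(q)$ is the average precision for query $q$.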
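The aggregation can be sketched as below. The names `average_precision` and `mean_average_precision`, and the input format (one ranked list of 0/1 judgments per query), are assumptions for illustration. Two further assumptions are marked in the comments: per-query average precision is normalized by the number of relevant documents found, and no rank cutoff is applied unless `k` is given.

```python
def average_precision(relevances, k=None):
    """Average precision for one query (optionally truncated at rank k).

    Assumption: normalize by the number of relevant documents found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs, k=None):
    """Mean of per-query average precision.

    runs: one ranked list of 0/1 judgments per query (hypothetical format).
    """
    if not runs:
        raise ValueError("MAP needs at least one query")
    return sum(average_precision(run, k) for run in runs) / len(runs)


runs = [
    [1, 0, 1],  # AP = (1/1 + 2/3) / 2
    [0, 1],     # AP = 1/2
    [0, 0, 0],  # no relevant documents: AP = 0 under this sketch
]
print(mean_average_precision(runs))
```

The third query shows the sensitivity listed under Cons: a query with no relevant documents contributes 0 in this sketch and pulls the mean down.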

Pros

  • Aggregates precision across multiple queries.
  • Comprehensive measure of search performance.

Cons

  • Requires multiple queries.
  • Sensitive to the presence of relevant documents.

Mean Reciprocal Rank (MRR)

Mean Reciprocal Rank (MRR) averages, over a set of queries, the reciprocal rank of each query's first relevant document.

Calculation

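$$\text{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\text{rank}_q}$$

where $Q$ is the set of evaluated queries and $\text{rank}_q$ is the position of the first relevant document for query $q$.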
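A minimal sketch of this average follows. The names `reciprocal_rank` and `mean_reciprocal_rank` and the input format (one ranked list of 0/1 judgments per query) are hypothetical, and scoring a query with no relevant document as 0 is an assumed convention.

```python
def reciprocal_rank(relevances):
    """1 / rank of the first relevant document.

    Assumption: 0.0 when the list holds no relevant document.
    """
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Mean of per-query reciprocal ranks.

    runs: one ranked list of 0/1 judgments per query (hypothetical format).
    """
    if not runs:
        raise ValueError("MRR needs at least one query")
    return sum(reciprocal_rank(run) for run in runs) / len(runs)


runs = [
    [1, 0, 0],  # first hit at rank 1: RR = 1
    [0, 0, 1],  # first hit at rank 3: RR = 1/3
    [0, 1, 1],  # first hit at rank 2: RR = 1/2
]
print(mean_reciprocal_rank(runs))  # (1 + 1/3 + 1/2) / 3
```

Because reciprocal rank drops steeply between ranks 1 and 2, a handful of queries can dominate the mean, which is the outlier effect noted under Cons.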

Pros

  • Simple to interpret.
  • Emphasizes the importance of early relevant results.

Cons

  • Ignores subsequent relevant results.
  • Can be skewed by outliers.

Feel free to explore other sections of the documentation to get a better understanding of how to set up and use Search Tweak effectively.