Each of these metrics gives insight into how well a model’s predictions align with actual values, though each handles errors differently.
<aside> 📢
$\text{Actual Value} = y_i$
$\text{Predicted Value} = \hat{y}_i$
$\text{Absolute}(x) = |x|$
$\text{Mean} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
$\text{Sum} = \sum_{i=1}^{n} x_i$
</aside>
Formula:
$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
Description: MAE measures the average absolute difference between predicted (ŷ) and actual (y) values. It tells you, on average, how much the predictions differ from the actual values without regard to the direction of the errors.
Characteristics:
When to Use: When you want a straightforward average error metric, and outliers are not critical.
Best Score: 0
Interpretation: Lower values are better, with 0 indicating perfect predictions.
Good Score: Depends on the scale of the data. For example, if you’re predicting prices in the range of thousands, an MAE of a few hundred might be acceptable. Generally, a smaller MAE relative to the scale of the target variable is considered good.
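The definition above translates directly into a few lines of code. Here is a minimal sketch in plain Python; the `y_true` and `y_pred` values are made-up illustrative numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative data: actual vs. predicted prices
y_true = [100, 150, 200, 250]
y_pred = [110, 140, 195, 260]

# Absolute errors are 10, 10, 5, 10 → MAE = 35 / 4 = 8.75
print(mean_absolute_error(y_true, y_pred))  # 8.75
```

Note that the sign of each error is discarded: an overprediction of 10 and an underprediction of 10 contribute equally.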
Formula:
$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Description: MSE measures the average of the squared differences between predicted and actual values. Squaring the errors penalizes larger deviations more than smaller ones.
Characteristics:
When to Use: If you want to give larger errors more weight, such as in applications where even small deviations are critical.
Best Score: 0
Interpretation: Lower values indicate better performance, with 0 meaning the model’s predictions are exactly accurate.
Good Score: Also depends on the scale of the data. Since MSE penalizes large errors more (due to squaring), an MSE that is relatively small compared to the square of the average target variable value is typically seen as good. But the interpretability is harder than MAE because the units are squared.
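The squaring effect is easy to see in code. This sketch uses the same illustrative values as above; note how the single error of 5 contributes far less than each error of 10:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [100, 150, 200, 250]
y_pred = [110, 140, 195, 260]

# Squared errors are 100, 100, 25, 100 → MSE = 325 / 4 = 81.25
print(mean_squared_error(y_true, y_pred))  # 81.25
```

The result, 81.25, is in squared units (e.g. dollars²), which is why MSE is harder to interpret directly than MAE.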
Formula:
$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
Description: RMSE is simply the square root of MSE, putting it back in the same unit as the target variable. It combines MSE’s sensitivity to large errors with better interpretability.
Characteristics:
When to Use: When you want a balance between penalizing large errors and interpretability in the original unit of the target variable.
Best Score: 0
Interpretation: Lower values indicate better performance, with 0 representing a perfect model.
Good Score: Like MAE, a good RMSE score is one that is low relative to the target variable’s scale. Because RMSE is sensitive to large errors, you want it to be comparable to or less than the standard deviation of the target variable. This indicates that the model is generally making predictions within a reasonable range of the actual values.
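The rule of thumb above, comparing RMSE to the standard deviation of the target, can be checked with a short sketch (same illustrative data as before):

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of MSE, expressed in the original unit of the target."""
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def population_std(values):
    """Population standard deviation, for comparison against RMSE."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

y_true = [100, 150, 200, 250]
y_pred = [110, 140, 195, 260]

rmse = root_mean_squared_error(y_true, y_pred)   # sqrt(81.25) ≈ 9.01
std = population_std(y_true)                     # ≈ 55.90

# RMSE well below the target's standard deviation suggests the model
# explains much more variation than a constant-mean baseline would.
print(rmse < std)  # True
```

A model that always predicted the mean of `y_true` would have an RMSE equal to that standard deviation, so an RMSE well below it indicates the model adds real predictive value.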
What constitutes a “good” score really depends on:
Rule of Thumb: