
Regression evaluation metrics

1. Mean Absolute Error (MAE)

  • What it measures:
    The average absolute difference between predicted and actual values.

    • Ignores direction (over- or under-prediction).
    • Treats all errors equally.
  • Calculation: MAE = (|2| + |3| + |3| + |1| + |2| + |3|) / 6 = 14/6 ≈ 2.33

    • Interpretation: On average, predictions are off by 2.33 ice creams.
  • When to use:

    • When all errors (small or large) should be weighted equally.
    • Preferred when outliers are not a critical concern.
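
A minimal Python sketch of this calculation; the sales figures below are made up (not from the text) and chosen so the absolute errors come out to 2, 3, 3, 1, 2, 3 as in the worked example:

```python
# Hypothetical ice cream sales, chosen so the absolute errors are
# 2, 3, 3, 1, 2, 3, matching the worked example above.
actual    = [10, 12, 9, 11, 13, 10]
predicted = [12, 9, 12, 10, 11, 13]

# MAE: mean of the absolute differences; direction is ignored.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)  # 2.333... (14/6)
```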

2. Mean Squared Error (MSE)

  • What it measures:
    The average of the squared differences between predicted and actual values.

    • Emphasizes larger errors (since squaring amplifies them).
  • Calculation: MSE = (2² + 3² + 3² + 1² + 2² + 3²) / 6 = (4 + 9 + 9 + 1 + 4 + 9) / 6 = 36/6 = 6

    • Interpretation: The "score" is 6, but this isn’t in ice cream units (due to squaring).
  • When to use:

    • When large errors are particularly undesirable (e.g., in financial models).
    • Used as a loss function in machine learning (optimization).
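
A sketch on the same made-up data as the MAE example, showing how squaring amplifies the larger errors:

```python
# Same hypothetical data as in the MAE sketch.
actual    = [10, 12, 9, 11, 13, 10]
predicted = [12, 9, 12, 10, 11, 13]

# MSE: mean of squared differences; each error of 3 contributes 9,
# so large misses dominate the score.
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
print(mse)  # 6.0 (36/6)
```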

3. Root Mean Squared Error (RMSE)

  • What it measures:
    The square root of MSE, converting the metric back to the original units (ice creams).

  • Calculation: RMSE = √MSE = √6 ≈ 2.45

  • Interpretation: Predictions are off by ~2.45 ice creams on average, with larger errors weighted more heavily.

  • When to use:

    • When you want MSE’s sensitivity to large errors but need interpretability in original units.
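
A sketch continuing the same made-up data; taking the square root converts the score back to ice cream units:

```python
import math

# Same hypothetical data as in the MAE/MSE sketches.
actual    = [10, 12, 9, 11, 13, 10]
predicted = [12, 9, 12, 10, 11, 13]

mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)  # back to original units
print(rmse)  # ≈ 2.449
```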

Key Differences Summary

Metric | Units                       | Outlier Sensitivity | Interpretation
MAE    | Original (e.g., ice creams) | Low                 | "Average error is X units"
MSE    | Squared (e.g., ice creams²) | High                | "Squared error average is X"
RMSE   | Original (e.g., ice creams) | High                | "Error magnitude is ~X units"

Ice Cream Example Recap

  • MAE (2.33): "On average, predictions are off by ~2.3 ice creams."
  • MSE (6): "Large errors are penalized; score is 6 (no unit meaning)."
  • RMSE (2.45): "After squaring and square-rooting, errors average ~2.5 ice creams, with larger mistakes weighted more."

Which to choose?

  • Use MAE for straightforward, outlier-resistant interpretation.
  • Use RMSE if large errors are critical (e.g., safety-critical systems).

When to Use

  • MAE: When all errors should be treated equally
  • MSE/RMSE: When large errors are particularly undesirable
  • RMSE: Preferred when you need interpretability in original units

Coefficient of Determination (R²)

Definition

R² (R-Squared) measures the proportion of variance in the dependent variable (e.g., ice cream sales) that is predictable from the independent variable(s) in a regression model. It quantifies how well the model explains the observed data.


Key Formula

R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²

  • yᵢ: Actual values
  • ŷᵢ: Predicted values
  • ȳ: Mean of the actual values
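
A minimal sketch of this formula in Python; the values are made up (not from the text) and happen to give R² ≈ 0.91:

```python
# Hypothetical actual sales and reasonably close predictions.
actual    = [10, 12, 9, 11, 13, 10]
predicted = [10.5, 11.5, 9.5, 11, 12.5, 10]

mean_y = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
ss_tot = sum((a - mean_y) ** 2 for a in actual)                # variance around the mean
r2 = 1 - ss_res / ss_tot
print(r2)  # ≈ 0.91 for these made-up numbers
```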

Interpretation

  • R² = 1: Perfect fit (100% variance explained).
  • R² = 0: Model explains none of the variance (no better than predicting the mean).
  • R² < 0: Model performs worse than a horizontal line (rare, indicates severe issues).

Ice Cream Example

  • R² = 0.95:
    • The model explains 95% of the variance in ice cream sales.
    • Only 5% of the variance is due to random/unexplained factors (e.g., festivals, weather anomalies).

Comparison with Other Metrics

Metric   | Purpose                          | Scale              | Focus
MAE      | Average absolute error           | Original           | Magnitude of errors
MSE/RMSE | Squared / root-squared error     | Squared / Original | Penalizes large errors
R²       | Proportion of variance explained | 0 to 1             | Model explanatory power

Why It Matters

  1. Contextualizes Error:
    • Unlike MAE/MSE, R² compares errors to the natural variance in the data.
  2. Model Selection:
    • Higher R² indicates a better-fit model (but beware of overfitting!).
  3. Intuitive Scale:
    • 0–1 range simplifies communication of model performance.

Limitations

  • Not for All Models: Misleading for non-linear relationships.
  • Overfitting Risk: Adding variables can artificially inflate R².
  • No Direction: Doesn’t indicate if predictions are biased.

Note: Use R² alongside other metrics (e.g., RMSE) for a complete evaluation.
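
In practice these metrics are usually computed together; a minimal sketch using scikit-learn's metrics, reusing the made-up data from the earlier examples:

```python
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical data reused from the earlier sketches.
actual    = [10, 12, 9, 11, 13, 10]
predicted = [12, 9, 12, 10, 11, 13]

mae  = mean_absolute_error(actual, predicted)
mse  = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)  # RMSE, portable across sklearn versions
r2   = r2_score(actual, predicted)

# R² is negative here: these made-up predictions are worse than
# simply predicting the mean (see the R² < 0 case above).
print(mae, mse, rmse, r2)  # ≈ 2.33, 6.0, 2.45, -2.32
```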