XGBoost model to detect fraudulent auto insurance claims: AUC-ROC 0.852 · Recall 88%
This project builds a machine learning pipeline to detect fraudulent auto insurance claims. Using a Kaggle dataset of 909 records × 40 features, I trained and optimised an XGBoost classifier that achieves 88% recall on fraud cases, meaning 88 out of every 100 fraudulent claims are correctly flagged.
| Model | AUC-ROC | Recall (Fraud) | F1 Score |
|---|---|---|---|
| Logistic Regression | | 0.50 | 0.42 |
| Random Forest | | 0.06 | 0.11 |
| XGBoost (optimised) | 0.852 | 0.88 | 0.74 |
Final results on the test set (182 observations).
Only 6 of the 48 fraud cases were missed, which is critical for insurance cost reduction.
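The 88% recall figure follows directly from these counts. Below is a minimal sketch of the arithmetic, assuming the 48 fraud cases and 6 misses refer to the 182-observation test set; the `recall` helper and variable names are illustrative and not taken from the project code.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Return the fraction of actual positive cases that were flagged."""
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return true_positives / total_positives


# Counts quoted in the summary above.
fraud_cases = 48   # actual fraudulent claims
missed = 6         # fraudulent claims the model did not flag (false negatives)

caught = fraud_cases - missed          # true positives
fraud_recall = recall(caught, missed)

print(f"caught {caught} of {fraud_cases}: recall = {fraud_recall:.3f}")
# prints "caught 42 of 48: recall = 0.875"
```

A recall of 0.875 is reported as 88% after rounding.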
Model simplified to 8 features with identical performance: cleaner and more interpretable.