Driver Churn Prediction
GitHub link of the project: Click here
Problem statement
Recruiting and retaining drivers is seen by industry watchers as a tough battle for our taxi booking company. Churn among drivers is high, and it’s very easy for drivers to stop working for the service on the fly or jump to other taxi booking apps depending on the rates.
As the companies get bigger, the high churn could become a bigger problem. To find new drivers, companies are casting a wide net, including people who don’t have cars for jobs. But this acquisition is really costly. Losing drivers frequently impacts the morale of the organization and acquiring new drivers is more expensive than retaining existing ones.
You are working as a data scientist with the Analytics Department of this taxi booking company, focused on driver team attrition. You are provided with monthly information for a segment of drivers for 2019 and 2020 and tasked with predicting whether a driver will leave the company, based on attributes such as:
- Demographics (city, age, gender, etc.)
- Tenure information (joining date, Last Date)
- Historical data regarding the performance of the driver (Quarterly rating, Monthly business acquired, grade, Income)
Approaches used in this project
- KNN imputation of missing values
- Feature engineering
- Univariate Analysis
- Bivariate Analysis
- Hypothesis Testing
- Spearman correlation
- Mean Encoding
- Normalization
- Train-Test split
- Trying different ensemble based models
- Using SMOTE to balance the data
- BayesSearchCV for Hyperparameter tuning
- AUC-ROC curve and confusion matrix to evaluate performance
- Recommendations to the company based on the results
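Two of the steps above, mean encoding and SMOTE, can be illustrated with a minimal standard-library sketch. This is not the project's code: the function names, the toy `city` values, and the neighbour count `k` are assumptions made for illustration, and a real pipeline would more likely rely on a library implementation. The sketch fits the encoding on training rows only and generates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours, which is the core idea behind SMOTE.

```python
import math
import random
from collections import defaultdict


def fit_mean_encoding(categories, targets):
    """Map each category to the mean target seen in the training rows."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    global_mean = sum(targets) / len(targets)
    mapping = {cat: sums[cat] / counts[cat] for cat in sums}
    return mapping, global_mean


def apply_mean_encoding(categories, mapping, global_mean):
    """Encode categories; unseen ones fall back to the global training mean."""
    return [mapping.get(cat, global_mean) for cat in categories]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    sampled minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy training rows (hypothetical values, not taken from the project data).
train_city = ["C1", "C1", "C2", "C2", "C2"]
train_target = [1, 0, 1, 1, 0]
mapping, global_mean = fit_mean_encoding(train_city, train_target)
encoded_test = apply_mean_encoding(["C1", "C3"], mapping, global_mean)

# Toy minority-class feature vectors (hypothetical).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority_points, n_new=4)
```

In general, both the encoding and the oversampling should be fitted on the training split only, so that no information from the test rows leaks into the model.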
Models that were tried out
- Random Forest
- Balanced Random Forest
- XGBoost Classifier
- LightGBM
- RUSBoost
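Balanced Random Forest and RUSBoost both handle class imbalance by randomly undersampling the larger class before fitting each learner. The sketch below shows only that undersampling step, using the standard library; the function name and toy data are hypothetical, and the boosting or bagging around it is omitted. Library versions of both models exist, for example `BalancedRandomForestClassifier` and `RUSBoostClassifier` in imbalanced-learn, though whether this project used that library is an assumption.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has as many
    rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target_size = min(len(group) for group in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(group, target_size):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Toy imbalanced data (hypothetical): 3 rows of class 0, 7 rows of class 1.
rows = list(range(10))
labels = [0] * 3 + [1] * 7
balanced_rows, balanced_labels = random_undersample(rows, labels)
class_counts = Counter(balanced_labels)
```

After the call, both classes have three rows each, and all rows of the smaller class are kept.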
Best performing model
RUSBoost performed best, with the lowest number of misclassified points.
RESULTS:
- AUC-ROC: 0.93
- F1 score: 0.84 for class 0 (minority class) and 0.92 for class 1
- 51 of 477 test data points were misclassified.
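As a quick check on how these numbers relate, the misclassification count converts directly to an accuracy, and per-class F1 follows from confusion-matrix counts. The sketch below uses the reported 51 and 477; the confusion-matrix counts are hypothetical, since the actual matrix is not given here, and the helper name `f1_from_counts` is made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score for one class from its true positives, false positives,
    and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Reported figures: 51 misclassified out of 477 test points.
misclassified, total = 51, 477
accuracy = (total - misclassified) / total  # about 0.893

# Hypothetical confusion matrix (NOT the project's actual counts).
tn, fp, fn, tp = 40, 10, 5, 45
f1_class1 = f1_from_counts(tp=tp, fp=fp, fn=fn)
# For class 0 the roles swap: its "positives" are the true negatives.
f1_class0 = f1_from_counts(tp=tn, fp=fn, fn=fp)
```

With imbalanced classes, per-class F1 is usually more informative than accuracy alone, which is why both are worth reporting.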
Here is the YouTube video on my channel explaining this project.