Overview
Records: 254,569
Val AUC: ~0.856
Model: LightGBM
Key Metrics
Total Predictions
254,569
test set records
Mean Repay Prob
0%
avg model confidence
High Confidence
0%
probability > 85%
High Risk Loans
14.1%
probability < 40%
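The headline cards above can be derived directly from the vector of predicted repayment probabilities. The following is a minimal sketch in plain Python, assuming the predictions are available as a list of floats in [0, 1]. The function name `summarize_predictions` and the toy input are hypothetical; only the 85% and 40% cutoffs come from the card captions.

```python
from statistics import fmean


def summarize_predictions(probs, high_conf=0.85, high_risk=0.40):
    """Compute the four headline metrics from repayment probabilities.

    The cutoffs mirror the card captions: strictly above 85% counts as
    high confidence, strictly below 40% counts as high risk.
    """
    n = len(probs)
    if n == 0:
        raise ValueError("no predictions to summarize")
    return {
        "total_predictions": n,
        "mean_repay_prob": fmean(probs),
        "high_confidence_share": sum(p > high_conf for p in probs) / n,
        "high_risk_share": sum(p < high_risk for p in probs) / n,
    }


# Toy input for illustration only, not real model output.
toy_probs = [0.95, 0.90, 0.60, 0.30]
print(summarize_predictions(toy_probs))
```

Multiplying the two shares by 100 gives the percentages shown on the cards.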
Distribution
Prediction Histogram
254,569 loans across probability deciles
Loan Purpose Mix
Test set breakdown
5
types
Averages
Annual Income
$48,233
across 254,569 applicants
Credit Score
681 / 850
fair–good credit range
Loan Amount
$15,017
personal loan range
Prediction Explorer
Loan Predictions
254,569 total rows
Loan ID | Income | Credit | Loan Amt | Rate | Purpose | Education | Repay % | Risk
Dataset Overview
Total Records
254,569
test.csv rows
Features
11
input columns
Categorical
6
encoded columns
Numeric
5
continuous columns
Feature Schema
All columns in test.csv
Missing Values
Missing values replaced with the sentinel value −999
Sample Raw Data
First 8 rows of test.csv
Risk Classification
74.8%
Low Risk
190,437 loans
Probability > 75%
11.2%
Medium Risk
28,512 loans
Probability 40–75%
14.1%
High Risk
35,894 loans
Probability < 40%
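The three tiers above amount to a threshold rule on the predicted repayment probability. Here is a minimal sketch in plain Python. The names `risk_tier` and `tier_shares` are hypothetical, and assigning the exact boundary values 40% and 75% to Medium is an assumption read off the strict "<" and ">" in the captions.

```python
from collections import Counter


def risk_tier(prob):
    """Map a repayment probability in [0, 1] to a risk tier.

    Low:    probability > 75%
    High:   probability < 40%
    Medium: everything in between (boundaries included, by assumption)
    """
    if prob > 0.75:
        return "Low"
    if prob < 0.40:
        return "High"
    return "Medium"


def tier_shares(probs):
    """Return the fraction of loans that falls into each tier."""
    if not probs:
        raise ValueError("no predictions to classify")
    counts = Counter(risk_tier(p) for p in probs)
    return {tier: counts.get(tier, 0) / len(probs)
            for tier in ("Low", "Medium", "High")}


# Toy input for illustration only, not real model output.
print(tier_shares([0.9, 0.8, 0.5, 0.1]))
```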
Risk by Loan Purpose
Stacked low / medium / high per category
Probability Bands
Loan count per 10% probability bucket
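Counting loans per 10% bucket can be sketched as follows, assuming the predictions are a list of floats in [0, 1]. The function name `probability_bands` is hypothetical, and placing a probability of exactly 100% in the top bucket is an assumption.

```python
def probability_bands(probs, n_bands=10):
    """Count predictions per equal-width probability band.

    Band i covers [i / n_bands, (i + 1) / n_bands); the last band also
    includes 1.0 so that every valid probability lands in some band.
    """
    counts = [0] * n_bands
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # Floating-point rounding can shift values that sit exactly on a
        # band edge; min() keeps p == 1.0 inside the top band.
        index = min(int(p * n_bands), n_bands - 1)
        counts[index] += 1
    return counts


# Toy input for illustration only, not real model output.
print(probability_bands([0.05, 0.15, 0.5, 0.95, 1.0]))
```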
Credit Score vs Risk Level
Higher score → lower default risk
Configuration
LightGBM Classifier
Gradient boosted trees · sklearn API
Performance Metrics
Validation set (20% stratified split)
Training Pipeline
End-to-End Workflow
Steps in test.py
Code
test.py — Annotated
Key pipeline code
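import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Column names below are assumptions; this page does not show them.
TARGET, ID_COL = "target", "id"

# 1. Load data and separate features from the label
train = pd.read_csv("train.csv")
test  = pd.read_csv("test.csv")
X, y = train.drop(columns=[ID_COL, TARGET]), train[TARGET]
X_test = test.drop(columns=[ID_COL])
num_cols = X.select_dtypes(include="number").columns.tolist()
cat_cols = [c for c in X.columns if c not in num_cols]

# 2. Impute missing values with the -999 sentinel (as a string for categoricals)
for df in (X, X_test):
    df[num_cols] = df[num_cols].fillna(-999)
    df[cat_cols] = df[cat_cols].fillna("-999")

# 3. Encode categoricals; fit on train + test so unseen test labels do not raise
for col in cat_cols:
    le = LabelEncoder()
    le.fit(pd.concat([X[col], X_test[col]]).astype(str))
    X[col] = le.transform(X[col].astype(str))
    X_test[col] = le.transform(X_test[col].astype(str))

# 4. Train/val split
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5. Fit LightGBM
model = LGBMClassifier(n_estimators=1000, learning_rate=0.03)
model.fit(X_train, y_train)

# 6. Validate on the held-out 20%, then refit on 100% of train
score = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {score:.3f}")
model.fit(X, y)

# 7. Generate submission
test_preds = model.predict_proba(X_test)[:, 1]
submission = pd.DataFrame({ID_COL: test[ID_COL], TARGET: test_preds})
submission.to_csv("submission.csv", index=False)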
Feature Importance
Estimated Feature Importance
Relative importance (domain-estimated from LightGBM gradient boosting)
Statistics
Numeric Ranges
Min / Mean / Max across test set
Categorical Cardinality
Unique values per categorical column
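The two summaries above (min / mean / max per numeric column, unique values per categorical column) can be computed along these lines. This is a sketch in plain Python that assumes the rows are available as a list of dicts; the helper names and the column names `income` and `purpose` are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def numeric_ranges(rows, columns):
    """Return {column: (min, mean, max)}, ignoring missing (None) values."""
    ranges = {}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        if values:  # skip columns that have no observed values
            ranges[col] = (min(values), fmean(values), max(values))
    return ranges


def cardinality(rows, columns):
    """Return {column: number of distinct non-missing values}."""
    return {
        col: len({row[col] for row in rows if row.get(col) is not None})
        for col in columns
    }


# Toy rows for illustration only, not taken from test.csv.
rows = [
    {"income": 30000, "purpose": "car"},
    {"income": 50000, "purpose": "home"},
    {"income": 70000, "purpose": "car"},
]
print(numeric_ranges(rows, ["income"]))
print(cardinality(rows, ["purpose"]))
```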
Applicant Demographics
Gender Distribution
254,569 applicants
Education Level
Highest qualification
Employment Status
Work status at application
Marital Status
Civil status breakdown