Bias & Fairness in Machine Learning
Machine learning models can perpetuate and amplify societal biases present in training data. A hiring model trained on historical decisions might discriminate by gender; a recidivism prediction model might have different error rates by race. Understanding, detecting, and mitigating these biases is both an ethical imperative and, increasingly, a legal requirement.
This lesson covers the types of bias, mathematical fairness definitions, detection tools, and mitigation strategies.
The Impossibility Theorem
A foundational result in fair ML (Kleinberg et al., 2016; Chouldechova, 2017): when two groups have different base rates and the classifier is imperfect, no model can simultaneously achieve calibration, equal false positive rates, and equal false negative rates across those groups. Choosing a fairness criterion therefore means making a trade-off, and the right choice depends on the application.
Types of Bias
Data Bias
| Type | Description | Example |
|---|---|---|
| Selection bias | Training data is not representative | Medical data mostly from one demographic |
| Measurement bias | Features are measured differently across groups | Wealthier areas have more sensors |
| Label bias | Labels reflect historical discrimination | Historical hiring decisions that excluded women |
| Representation bias | Some groups are underrepresented | Few elderly users in tech product data |
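To make representation bias concrete, here is a small illustrative simulation (all data synthetic, group names hypothetical): a model trained on a sample where one group makes up only 5% of the data, and where that group's feature-label relationship differs, performs far worse on that group.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # The two groups have different feature-label relationships
    x = rng.normal(0, 1, (n, 2))
    y = ((x[:, 0] + shift * x[:, 1]) > 0).astype(int)
    return x, y

# Representation bias: group B is only 5% of the training data
xa, ya = make_group(1900, shift=1.0)    # group A (majority)
xb, yb = make_group(100, shift=-1.0)    # group B (underrepresented)
X = np.vstack([xa, xb])
y = np.concatenate([ya, yb])

clf = LogisticRegression().fit(X, y)

# Evaluate on balanced held-out sets for each group
xa_t, ya_t = make_group(1000, shift=1.0)
xb_t, yb_t = make_group(1000, shift=-1.0)
print(f"Group A accuracy: {clf.score(xa_t, ya_t):.3f}")
print(f"Group B accuracy: {clf.score(xb_t, yb_t):.3f}")
```

The model fits the majority group's pattern and effectively treats the minority group's examples as noise, which is exactly the failure mode an aggregate accuracy number hides.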
Algorithmic Bias
| Type | Description | Example |
|---|---|---|
| Optimization bias | Model optimizes for majority group | Accuracy-maximizing model ignores minorities |
| Feature bias | Proxy features encode protected attributes | Zip code as a proxy for race |
| Feedback loops | Model predictions affect future data | Predictive policing increases arrests in targeted areas |
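The zip-code proxy in the table can be demonstrated directly. In this synthetic sketch (feature names hypothetical), a "neighborhood" feature correlated with a protected attribute is enough for a classifier to recover that attribute with high accuracy, which is why any model using the proxy can encode the protected attribute even when it is never given explicitly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
race = rng.binomial(1, 0.5, n)  # hypothetical protected attribute

# Proxy: a neighborhood index whose distribution differs by group
neighborhood = rng.normal(loc=2.0 * race, scale=1.0, size=n)

# Predict the protected attribute from the proxy alone
clf = LogisticRegression().fit(neighborhood.reshape(-1, 1), race)
acc = clf.score(neighborhood.reshape(-1, 1), race)
print(f"Protected attribute recovered from proxy alone: {acc:.1%} accuracy")
```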
Fairness Metrics
Group Fairness Metrics
Demographic Parity (Statistical Parity): The selection rate should be equal across groups.
P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)
Equalized Odds: True positive rate and false positive rate should be equal across groups.
P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1) AND
P(Y_hat=1 | Y=0, A=0) = P(Y_hat=1 | Y=0, A=1)
Equal Opportunity: A relaxation of equalized odds that only requires equal true positive rates.
P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)
Calibration: Among individuals assigned the same predicted probability p, the actual positive rate should be equal across groups.
P(Y=1 | Y_hat=p, A=0) = P(Y=1 | Y_hat=p, A=1)
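As a quick illustration (toy arrays with hypothetical values), the group-fairness definitions above reduce to a few per-group rates:

```python
import numpy as np

# Toy labels and predictions for two groups (A=0 and A=1)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1])
group  = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

def rates(mask):
    yt, yp = y_true[mask], y_pred[mask]
    sel = yp.mean()               # selection rate: P(Y_hat=1)
    tpr = yp[yt == 1].mean()      # true positive rate: P(Y_hat=1 | Y=1)
    fpr = yp[yt == 0].mean()      # false positive rate: P(Y_hat=1 | Y=0)
    return sel, tpr, fpr

sel0, tpr0, fpr0 = rates(group == 0)
sel1, tpr1, fpr1 = rates(group == 1)

print(f"Demographic parity gap:      {abs(sel0 - sel1):.3f}")
print(f"Equal opportunity (TPR) gap: {abs(tpr0 - tpr1):.3f}")
print(f"Equalized odds gaps: TPR {abs(tpr0 - tpr1):.3f}, "
      f"FPR {abs(fpr0 - fpr1):.3f}")
```

A gap of zero on the relevant quantity means the corresponding criterion is satisfied exactly; in practice one tolerates small gaps.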
The Four-Fifths Rule
A practical guideline from US employment law: the selection rate for any protected group should be at least 80% of the rate for the group with the highest selection rate. Also known as the "80% rule" or "disparate impact ratio."

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

np.random.seed(42)

# --- Create a biased dataset ---
# Simulate a hiring scenario with gender bias
n = 3000
gender = np.random.binomial(1, 0.5, n)  # 0=female, 1=male
education = np.random.normal(5, 1.5, n)
experience = np.random.normal(5, 2, n)

# Bias: males get a boost in hiring probability
score = 0.3 * education + 0.4 * experience + 0.8 * gender
noise = np.random.normal(0, 1, n)
hired = (score + noise > 4).astype(int)

X = np.column_stack([gender, education, experience])
y = hired
feature_names = ["gender", "education", "experience"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# --- Train model ---
gbc = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc.fit(X_train, y_train)
y_pred = gbc.predict(X_test)
y_proba = gbc.predict_proba(X_test)[:, 1]

print(f"Overall accuracy: {(y_pred == y_test).mean():.4f}")

# --- Compute fairness metrics ---
female_mask = X_test[:, 0] == 0
male_mask = X_test[:, 0] == 1

def fairness_metrics(y_true, y_pred, group_mask, group_name):
    """Compute fairness metrics for a subgroup."""
    cm = confusion_matrix(y_true[group_mask], y_pred[group_mask])
    tn, fp, fn, tp = cm.ravel()
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
    selection_rate = y_pred[group_mask].mean()
    accuracy = (y_pred[group_mask] == y_true[group_mask]).mean()
    return {
        "group": group_name,
        "n": group_mask.sum(),
        "selection_rate": selection_rate,
        "tpr": tpr,
        "fpr": fpr,
        "accuracy": accuracy,
    }

female_metrics = fairness_metrics(y_test, y_pred, female_mask, "Female")
male_metrics = fairness_metrics(y_test, y_pred, male_mask, "Male")

print("\n=== Fairness Metrics ===")
print(f"{'Metric':<20} {'Female':>10} {'Male':>10} {'Ratio':>10}")
print("-" * 55)
for metric in ["selection_rate", "tpr", "fpr", "accuracy"]:
    f_val = female_metrics[metric]
    m_val = male_metrics[metric]
    ratio = f_val / m_val if m_val > 0 else float("inf")
    flag = " FAIL" if ratio < 0.8 else " PASS"
    print(f"{metric:<20} {f_val:>10.4f} {m_val:>10.4f} {ratio:>9.2f}{flag}")

# Demographic parity
dp_diff = abs(female_metrics["selection_rate"] - male_metrics["selection_rate"])
print(f"\nDemographic Parity Difference: {dp_diff:.4f}")

# Equalized odds
eo_tpr_diff = abs(female_metrics["tpr"] - male_metrics["tpr"])
eo_fpr_diff = abs(female_metrics["fpr"] - male_metrics["fpr"])
print(f"Equalized Odds (TPR gap): {eo_tpr_diff:.4f}")
print(f"Equalized Odds (FPR gap): {eo_fpr_diff:.4f}")

# Four-fifths rule
ratio_4_5 = female_metrics["selection_rate"] / male_metrics["selection_rate"]
print(f"\nFour-Fifths Rule: {ratio_4_5:.4f} "
      f"({'PASS' if ratio_4_5 >= 0.8 else 'FAIL - disparate impact detected'})")
```

Bias Mitigation Strategies
Pre-Processing (before training)
Modify the training data to remove bias: resample or reweight examples so that group-label combinations are balanced, repair or drop proxy features, or correct labels known to reflect historical discrimination.
In-Processing (during training)
Modify the learning algorithm: add fairness constraints or penalty terms to the training objective, or use adversarial debiasing, where a second model tries to predict the protected attribute from the main model's internal representation.
Post-Processing (after training)
Modify the model's predictions: adjust decision thresholds per group, or remap predicted scores so that a chosen fairness criterion holds.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)

# Recreate biased dataset
n = 3000
gender = np.random.binomial(1, 0.5, n)
education = np.random.normal(5, 1.5, n)
experience = np.random.normal(5, 2, n)
score = 0.3 * education + 0.4 * experience + 0.8 * gender
noise = np.random.normal(0, 1, n)
hired = (score + noise > 4).astype(int)
X = np.column_stack([gender, education, experience])
y = hired

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# --- Mitigation 1: Remove protected attribute ---
print("=== Mitigation 1: Remove Gender Feature ===")
X_train_no_gender = X_train[:, 1:]  # Drop gender column
X_test_no_gender = X_test[:, 1:]

gbc_no_gender = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_no_gender.fit(X_train_no_gender, y_train)
y_pred_ng = gbc_no_gender.predict(X_test_no_gender)

female = X_test[:, 0] == 0
male = X_test[:, 0] == 1
sr_f = y_pred_ng[female].mean()
sr_m = y_pred_ng[male].mean()
print(f"Selection rate: Female={sr_f:.4f}, Male={sr_m:.4f}, "
      f"Ratio={sr_f/sr_m:.4f}")
print(f"Accuracy: {(y_pred_ng == y_test).mean():.4f}")

# --- Mitigation 2: Reweighting ---
print("\n=== Mitigation 2: Sample Reweighting ===")
# Compute weights to balance group-label combinations
groups = X_train[:, 0]
weights = np.ones(len(y_train))

for g in [0, 1]:
    for label in [0, 1]:
        mask = (groups == g) & (y_train == label)
        expected = len(y_train) / 4
        actual = mask.sum()
        weights[mask] = expected / actual if actual > 0 else 1.0

gbc_reweight = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_reweight.fit(X_train, y_train, sample_weight=weights)
y_pred_rw = gbc_reweight.predict(X_test)

sr_f_rw = y_pred_rw[female].mean()
sr_m_rw = y_pred_rw[male].mean()
print(f"Selection rate: Female={sr_f_rw:.4f}, Male={sr_m_rw:.4f}, "
      f"Ratio={sr_f_rw/sr_m_rw:.4f}")
print(f"Accuracy: {(y_pred_rw == y_test).mean():.4f}")

# --- Mitigation 3: Threshold adjustment (post-processing) ---
print("\n=== Mitigation 3: Threshold Adjustment ===")
gbc_full = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_full.fit(X_train, y_train)
y_proba = gbc_full.predict_proba(X_test)[:, 1]

# Find thresholds that equalize selection rates
target_rate = y_proba.mean()  # Use overall mean as target

best_thresh = {"female": 0.5, "male": 0.5}
for name, mask in [("female", female), ("male", male)]:
    for t in np.arange(0.1, 0.9, 0.01):
        sr = (y_proba[mask] >= t).mean()
        if abs(sr - target_rate) < abs(
            (y_proba[mask] >= best_thresh[name]).mean() - target_rate
        ):
            best_thresh[name] = t

y_pred_thresh = np.zeros(len(y_test), dtype=int)
y_pred_thresh[female] = (y_proba[female] >= best_thresh["female"]).astype(int)
y_pred_thresh[male] = (y_proba[male] >= best_thresh["male"]).astype(int)

sr_f_t = y_pred_thresh[female].mean()
sr_m_t = y_pred_thresh[male].mean()
print(f"Thresholds: Female={best_thresh['female']:.2f}, "
      f"Male={best_thresh['male']:.2f}")
print(f"Selection rate: Female={sr_f_t:.4f}, Male={sr_m_t:.4f}, "
      f"Ratio={sr_f_t/(sr_m_t+1e-8):.4f}")
print(f"Accuracy: {(y_pred_thresh == y_test).mean():.4f}")

# --- Summary ---
print("\n=== Comparison ===")
print(f"{'Method':<25} {'DP Ratio':>10} {'Accuracy':>10}")
print("-" * 45)
methods = [
    ("Baseline (with gender)", X_test[:, 0] == 0, X_test[:, 0] == 1,
     gbc_full.predict(X_test)),
    ("Remove gender", female, male, y_pred_ng),
    ("Reweighting", female, male, y_pred_rw),
    ("Threshold adjustment", female, male, y_pred_thresh),
]
for name, f_m, m_m, preds in methods:
    sr_f = preds[f_m].mean()
    sr_m = preds[m_m].mean()
    ratio = sr_f / sr_m if sr_m > 0 else 0
    acc = (preds == y_test).mean()
    flag = " *" if ratio >= 0.8 else ""
    print(f"{name:<25} {ratio:>10.4f} {acc:>10.4f}{flag}")
```

Removing the Protected Attribute Is Often Not Enough
Dropping the gender column works reasonably well in this synthetic example because the remaining features are independent of gender. In real data, proxy features (zip codes, job titles, school names, word choice in free text) are often correlated with the protected attribute, so the model can reconstruct it and the disparity persists. This approach is known as "fairness through unawareness," and it is widely considered insufficient on its own.
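A minimal synthetic sketch of this failure mode (feature names hypothetical): train without the gender column but with a correlated proxy feature, and the selection-rate gap persists because the model reconstructs gender from the proxy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 3000
gender = rng.binomial(1, 0.5, n)           # 0=female, 1=male
# Proxy feature strongly correlated with gender (e.g. a coded job title)
proxy = gender + rng.normal(0, 0.3, n)
skill = rng.normal(5, 2, n)
hired = (0.5 * skill + 1.5 * gender + rng.normal(0, 1, n) > 3).astype(int)

# Train WITHOUT the protected attribute, but WITH the proxy
X = np.column_stack([proxy, skill])
clf = GradientBoostingClassifier(random_state=0).fit(X, hired)
pred = clf.predict(X)

sr_f = pred[gender == 0].mean()
sr_m = pred[gender == 1].mean()
print(f"Selection ratio without gender column: {sr_f / sr_m:.2f}")
```

Even though gender never appears in X, the disparate impact remains well below the four-fifths threshold.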
Fairness Toolkits
Fairlearn (Microsoft)
Python library for assessing and improving fairness. Provides:
- `MetricFrame`: compute any metric disaggregated by group
- `ThresholdOptimizer`: post-processing threshold adjustment
- `ExponentiatedGradient`: in-processing constrained optimization
- `GridSearch`: explore the fairness-accuracy Pareto frontier