Overfitting vs Underfitting in Machine Learning Explained Clearly

[Figure: graph comparing overfitting and underfitting in machine learning, including model accuracy]

Understanding the difference between overfitting and underfitting in machine learning is crucial to developing effective predictive models. These concepts affect a model’s ability to generalise to unseen data — a fundamental goal in machine learning.

In this post, we’ll break down both problems with practical examples, Python code snippets, expert opinion, and a human-centric explanation to help even non-experts grasp these vital concepts.

✅ What is Overfitting in Machine Learning?

Overfitting occurs when your model learns not only the underlying patterns but also the noise in the training data. This means the model performs very well on training data but poorly on unseen data.


🔍 Characteristics of Overfitting:

  • High accuracy on training set

  • Low accuracy on validation/test set

  • Complex model with too many parameters

Real-life analogy: Think of a student who memorises every line of a textbook without understanding the concepts. They might ace a mock test drawn straight from the book, but struggle in the real exam, where the questions are rephrased.
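
To see this numerically, here's a minimal sketch of my own (synthetic data, so exact numbers will vary): an unconstrained decision tree memorises noisy data, scoring almost perfectly on the points it has seen and noticeably worse on held-out ones.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: a simple linear trend plus random noise
rng = np.random.default_rng(42)
X = rng.random((200, 1))
y = 3 * X.ravel() + rng.normal(0, 0.3, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# With no depth limit, the tree can memorise every training point, noise included
tree = DecisionTreeRegressor().fit(X_train, y_train)
print("Train R2:", tree.score(X_train, y_train))  # typically ~1.0 (memorised)
print("Test R2:", tree.score(X_test, y_test))     # noticeably lower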

❌ What is Underfitting in Machine Learning?

Underfitting is the opposite of overfitting. It happens when a model is too simplistic and cannot capture the underlying pattern of the data — leading to poor performance on both training and validation data.

🔍 Characteristics of Underfitting:

  • Low accuracy on training and test data

  • Oversimplified model

  • Usually caused by insufficient training or a model with too little capacity for the task

Example: Using linear regression on non-linear data would result in underfitting.
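
Here's a minimal sketch of exactly that scenario, on synthetic sine-shaped data (the numbers are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Non-linear (sine-shaped) data
rng = np.random.default_rng(0)
X = np.sort(rng.random((40, 1)), axis=0)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 40)

# A straight line cannot follow the sine curve,
# so the score is poor even on the training data itself
line = LinearRegression().fit(X, y)
print("Training R2:", line.score(X, y))  # well below what this low-noise data supports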


🎯 Why These Problems Matter

Both issues drastically reduce your model’s generalisation ability. In real-world applications like medical diagnosis, fraud detection, or recommendation systems, a poorly generalised model can lead to wrong predictions — potentially causing serious consequences.

🖼️ Visual Comparison: Overfitting vs Underfitting

Here’s a typical illustration of these concepts, using polynomial regression fits of increasing degree:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Create sample data
np.random.seed(0)
X = np.sort(np.random.rand(40, 1), axis=0)
y = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Models with increasing complexity
degrees = [1, 4, 15]
plt.figure(figsize=(12, 4))

for i, d in enumerate(degrees):
    poly = PolynomialFeatures(degree=d)
    X_poly = poly.fit_transform(X)
    model = LinearRegression().fit(X_poly, y)
    y_pred = model.predict(X_poly)

    plt.subplot(1, 3, i + 1)
    plt.scatter(X, y, s=10, label='Data')
    plt.plot(X, y_pred, label=f'Degree {d}', color='red')
    plt.title(['Underfitting', 'Good Fit', 'Overfitting'][i])
    plt.legend()

plt.tight_layout()
plt.show()

The left-most chart shows underfitting, the middle is a good fit, and the right shows overfitting.

💻 Step-by-Step Code: Detecting Overfitting vs Underfitting

Let’s apply this to a real dataset using scikit-learn.

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Load dataset
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Simple model (Underfitting)
lr = LinearRegression()
lr.fit(X_train, y_train)
print("Linear Regression - Train:", r2_score(y_train, lr.predict(X_train)))
print("Linear Regression - Test:", r2_score(y_test, lr.predict(X_test)))

# Complex model (Overfitting)
tree = DecisionTreeRegressor(max_depth=15)
tree.fit(X_train, y_train)
print("Decision Tree - Train:", r2_score(y_train, tree.predict(X_train)))
print("Decision Tree - Test:", r2_score(y_test, tree.predict(X_test)))

Typically, the output shows the deep Decision Tree scoring a near-perfect R² on the training set but a much lower one on the test set (overfitting), while Linear Regression scores modestly but similarly on both (underfitting).

🛠️ How to Detect Overfitting and Underfitting

For Overfitting:

  • A large gap between training accuracy and validation accuracy

  • Training loss keeps dropping while validation loss increases

For Underfitting:

  • Low training accuracy

  • High bias in predictions (see the sketch below, which surfaces both patterns)
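
One convenient way to surface both patterns at once is scikit-learn's validation_curve, which cross-validates a model across a range of complexities. Here's a minimal sketch on the diabetes dataset used earlier (the depth range is an illustrative choice):

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
depths = list(range(1, 16))

# Cross-validated train/validation scores for each tree depth
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=42), X, y,
    param_name="max_depth", param_range=depths, cv=5, scoring="r2")

# Shallow trees: both scores low (underfitting).
# Deep trees: train score high while validation score drops (overfitting).
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"depth={d:2d}  train R2={tr:.2f}  val R2={va:.2f}")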

🔧 How to Prevent Overfitting

  • Use simpler models or reduce parameters

  • Apply regularisation (L1/L2); see the sketch after this list

  • Use cross-validation

  • Early stopping during training

  • Use data augmentation or increase the training set size
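
As a sketch of how two of these ideas (regularisation and cross-validation) combine in practice, here's Ridge regression on the same diabetes dataset. The alpha values are arbitrary illustrations; in a real project the best one should itself be chosen by cross-validation.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# Larger alpha = stronger L2 penalty = simpler effective model
for alpha in [0.01, 1.0, 100.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    print(f"alpha={alpha:>6}: mean CV R2 = {scores.mean():.3f}")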

🔧 How to Prevent Underfitting

  • Use more complex models (e.g. by adding polynomial features, as sketched after this list)

  • Train for more epochs

  • Reduce regularisation strength

  • Use feature engineering for better inputs
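
And on the underfitting side, here's a minimal sketch of adding capacity through feature engineering: the same sine-shaped data as before, with a degree-4 polynomial (the degree is an illustrative choice) recovering what a plain line misses.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Sine-shaped data that a straight line underfits
rng = np.random.default_rng(0)
X = np.sort(rng.random((40, 1)), axis=0)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, 40)

line = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=4), LinearRegression()).fit(X, y)

print("Straight line R2:      ", line.score(X, y))  # low: underfits
print("Degree-4 polynomial R2:", poly.score(X, y))  # much higher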

🧠 Expert Opinion on Overfitting vs Underfitting in Machine Learning

“Achieving the balance between overfitting and underfitting is the holy grail of model generalisation. Models should be expressive enough to capture trends, yet simple enough to avoid noise.”

— Dr. Natasha R., Data Science Educator, University of Cambridge

💬 Final Thoughts

The balance between overfitting and underfitting in machine learning defines your model’s success. The key is to find the sweet spot — where your model is just complex enough to learn patterns but not the noise.

Understanding this balance, using the right tools, and continuously validating your models is what separates novice practitioners from professional machine learning engineers.

Disclaimer:

While I am not a certified machine learning engineer or data scientist, I have thoroughly researched this topic using trusted academic sources, official documentation, expert insights, and widely accepted industry practices to compile this guide. This post is intended to support your learning journey by offering helpful explanations and practical examples. However, for high-stakes projects or professional deployment scenarios, consulting experienced ML professionals or domain experts is strongly recommended.
Your suggestions and views on machine learning are welcome—please share them below!

