Jakub Mazur

Backend Developer

Maastricht Netherlands

Back

Compressing Images with K-Means Clustering

May 24, 2026 (about 15 hours ago)

0

The internet is flooded with content every day. As of today, approximately 402 million terabytes of data are created each day[]. That number is almost impossible to picture so imagine, that every second, the world produces equivalent to over 70 years of 4K video.

That is why compression matters - instead of storing every detail exactly as it was captured, we look for ways to represent the same information using less space. Images are one of the media types that can be compressed, and in this post I'll explain one simple method to do so.

Compression by reducing the palette

There are many ways to compress images and they are categorized into two main types - lossless and lossy.

The first one preserves exact original quality but results in smaller file-size reductions. It detects repeating patterns and optimizes the data structures to code the images better. The second one, lossy, allows us to reduce the size significantly by discarding parts of the image while trying to maintain the representation of it.

We will focus on the lossy one that shrinks the color palette of an image, replacing multiple similar colors with a single representative one - a technique called color quantization.

Same image, fewer colors
As the palette gets smaller, more pixels have to share the same representative color.
Generating color previews...

Pixels are just numbers

Images are basically 2D arrays of pixels. Each pixel is one color, and each color is stored as three numbers: red, green, and blue.

Pixel inspector
Click the image to sample a downscaled pixel, then inspect how red, green, and blue blend into one color.
This is the 160 x 107 sampled image, enlarged with square pixels.

Every square on the grid is one sampled pixel. Typical images use RGB, with 8 bits per channel.

R = 8 bits, G = 8 bits, B = 8 bits

So one pixel is

3 × 8 = 24 bits = 3 bytes

That is often called 24-bit color, true color, or sometimes PNG-24. If the image has transparency, we would need to add another 8-bit channel, turning it into PNG-32 with RGBA (RGB-alpha) color space.

If we wanted to save space, we could lower the data needed to represent one pixel. If we used only 256 colors, then every pixel would take 8 bits and store an index into that palette. So instead of pixel = (124, 81, 203), we would store pixel = 17 that translates to 17 -> (124, 81, 203). That format is called PNG-8 or indexed color.

As you can see, now instead of using 3 bytes, we only use 1, thus cutting our raw pixel data to one-third. We could go even further and use a 16-color palette, reducing the original size by a factor of 6, or a 4-color one that would reduce it by a factor of 12. You can do the math!

Palette size comparison
Same pixel, fewer possible colors.
mode
colors
per pixel
raw saving
true color
16.7M colors
24 bits
1x
256-color palette
256 colors
8 bits
3x smaller
16-color palette
16 colors
4 bits
6x smaller
4-color palette
4 colors
2 bits
12x smaller
This compares raw pixel data. Real PNG files also use compression, so actual file sizes vary.

The K-means algorithm

Before we get our hands dirty, we need to get some knowledge first!

K-means is an unsupervised learning algorithm used for data clustering, which groups unlabeled data points into groups or clusters. It is one of the most popular clustering methods used in machine learning to group data points that are close to each other in attributes into some k-number of clusters.

Let's first ignore images for a second and imagine that we have a set of 2D data points (x, y). The demo below shows the standard K-means loop on those points.

K-means in 2D
Same idea as image colors, but with x/y points so the loop is easier to see.
K-means clustering step visualization
Step 1 / 7
Start with points

Let's imagine that we have a set of 2D data points. Each point has an x and y value.

current setup
k = 3
iteration = 0

From 2D points to colors

The 2D example used points like [x, y], but K-means does not really care what those numbers mean. A color can also be a point, just with three values instead of two: [red, green, blue].

From points to colors
K-means can cluster colors because colors are also points, just in RGB space.
Step 1 / 4
Sample colors

Instead of looking at every pixel at once, imagine we sampled a few colors from the image.

color -> [red, green, blue]

In code, I represent one RGB color as a tuple with three numbers:

type Rgb = [number, number, number];

const pixel: Rgb = [231, 86, 42];

How nearest color is calculated

When assigning a pixel to a centroid, we need to know which centroid is the closest one. For that I used squared Euclidean distance. I skipped the square root because it does not change which distance is the smallest.

function squaredRgbDistance(a: Rgb, b: Rgb) {
  return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;
}

Then for each pixel we check every centroid and remember the index of the closest one:

for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
  const pixel = pixels[pixelIndex];
  let bestCentroidIndex = 0;
  let bestDistance = Infinity;

  for (let centroidIndex = 0; centroidIndex < centroids.length; centroidIndex++) {
    const distance = squaredRgbDistance(pixel, centroids[centroidIndex]);

    if (distance < bestDistance) {
      bestDistance = distance;
      bestCentroidIndex = centroidIndex;
    }
  }

  assignments[pixelIndex] = bestCentroidIndex;
}

Moving centroids

After every pixel has its closest centroid, we need to move each centroid to the average color of all pixels assigned to it. I use running totals for this, because every pixel already knows which centroid it belongs to.

const sums: Rgb[] = Array.from({ length: k }, () => [0, 0, 0]);
const counts: number[] = Array.from({ length: k }, () => 0);

for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
  const pixel = pixels[pixelIndex];
  const centroidIndex = assignments[pixelIndex];

  sums[centroidIndex][0] += pixel[0];
  sums[centroidIndex][1] += pixel[1];
  sums[centroidIndex][2] += pixel[2];
  counts[centroidIndex]++;
}

Applying it to images

As mentioned before, every pixel color in an image is a point [red, green, blue].

So instead of clustering 2D points like [x, y], we cluster colors like [231, 86, 42]. The final centroids become the palette. K-means does not know whether that pixel belongs to an apple, a banana, or a shadow. It only sees values like [231, 86, 42].

K-means image compression
Each sampled pixel is replaced by the closest palette color found by K-means.
Original sampled image
Loading image...
Compressed image
Compressing sampled pixels...
Palette size
indexed pixel
4 bits
iterations
0
Palette found by K-means
Palette will appear after the image is sampled.

Limitations

K-means is simple but it is not perfect.

First, you need to pick k yourself. There is no automatic way to know how many colors an image needs. Too few and it looks posterized. Too many and you barely save any space.

Second, the algorithm starts with random centroids. Run it twice on the same image and you may get slightly different palettes. The demo above picks random pixels from the image itself, which helps, but it is still luck of the draw.

Third, K-means can get stuck in local optima. It greedily improves the clusters step by step, but it cannot backtrack. If the random start was bad, the final palette might miss an important color that was only represented by a few pixels.

Fourth, it treats RGB space as Euclidean geometry. In reality, human eyes do not perceive color differences equally in all directions. Two blues that look almost identical to us might be far apart in RGB distance, while a red-green pair that looks very different might score as closer. For serious compression, people convert to LAB color space first, which is designed to match human vision better.

Fifth, K-means has no idea about spatial structure. It looks at every pixel in isolation. That means two neighboring pixels that are almost the same color might end up assigned to different centroids, creating noise-like artifacts at edges. Algorithms that consider pixel neighborhoods, or add dithering after palette reduction, usually look smoother.

Sixth, some centroids can become empty clusters if no pixel is close enough to them. The implementation above handles this by restarting them from random pixels, but it wastes iterations.

For real-world image compression, formats like PNG-8 and GIF use more sophisticated palette selection (often based on median cut or octree quantization) rather than K-means, partly because those methods are faster and deterministic. K-means is still a great way to understand the core idea though.

Final implementation

Here is the complete algorithm used by the demo. The React component around it only handles loading the image into canvas and rendering the output. This is the actual K-means part.

type Rgb = [number, number, number];

interface KMeansImageInput {
  pixels: Rgb[];
  k: number;
  maxIterations: number;
}

interface KMeansImageResult {
  palette: Rgb[];
  indexedPixels: number[];
  iterations: number;
}

function squaredRgbDistance(a: Rgb, b: Rgb) {
  return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2;
}

function randomPixel(pixels: Rgb[]) {
  return [...pixels[Math.floor(Math.random() * pixels.length)]] as Rgb;
}

function compressImageColors({ pixels, k, maxIterations }: KMeansImageInput): KMeansImageResult {
  let totalIters = 0;

  const centroids: Rgb[] = [];
  const assignments: number[] = new Array(pixels.length).fill(0);

  // 1. Initialize centroids from random pixels in the image.
  for (let i = 0; i < k; i++) {
    centroids[i] = randomPixel(pixels);
  }

  for (let iters = 0; iters < maxIterations; iters++) {
    totalIters++;
    const previousCentroids = centroids.map((centroid) => [...centroid] as Rgb);

    // 2. Assign each pixel to the nearest centroid.
    for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
      const pixel = pixels[pixelIndex];
      let bestCentroidIndex = 0;
      let bestDistance = Infinity;

      for (let centroidIndex = 0; centroidIndex < centroids.length; centroidIndex++) {
        const centroid = centroids[centroidIndex];
        const distance = squaredRgbDistance(pixel, centroid);

        if (distance < bestDistance) {
          bestDistance = distance;
          bestCentroidIndex = centroidIndex;
        }
      }

      assignments[pixelIndex] = bestCentroidIndex;
    }

    // 3. Move each centroid to the average RGB value of assigned pixels.
    const sums: Rgb[] = Array.from({ length: k }, () => [0, 0, 0]);
    const counts: number[] = Array.from({ length: k }, () => 0);

    for (let pixelIndex = 0; pixelIndex < pixels.length; pixelIndex++) {
      const pixel = pixels[pixelIndex];
      const centroidIndex = assignments[pixelIndex];

      sums[centroidIndex][0] += pixel[0];
      sums[centroidIndex][1] += pixel[1];
      sums[centroidIndex][2] += pixel[2];
      counts[centroidIndex]++;
    }

    for (let centroidIndex = 0; centroidIndex < k; centroidIndex++) {
      if (counts[centroidIndex] === 0) {
        centroids[centroidIndex] = randomPixel(pixels);
        continue;
      }

      centroids[centroidIndex] = [
        sums[centroidIndex][0] / counts[centroidIndex],
        sums[centroidIndex][1] / counts[centroidIndex],
        sums[centroidIndex][2] / counts[centroidIndex],
      ];
    }

    // 4. Stop early if centroids barely moved.
    const movedDistance = centroids.reduce((sum, centroid, index) => {
      return sum + squaredRgbDistance(centroid, previousCentroids[index]);
    }, 0);

    if (movedDistance < 1) {
      break;
    }
  }

  return {
    palette: centroids.map(([r, g, b]) => [Math.round(r), Math.round(g), Math.round(b)]),
    indexedPixels: assignments,
    iterations: totalIters,
  };
}

Acknowledgements

Photo by Jonas Kakaroto on Unsplash.

The idea for this post was based on the K-means & Image Segmentation - Computerphile video.

If this resonated with you, hit that heart! It makes my day

Logo
© 2026 - Jakub Mazur