Python and images

Many libraries can open and process images in Python, including Pillow (the maintained fork of PIL, the Python Imaging Library), OpenCV (Open Source Computer Vision Library), and scikit-image.

Before using one of the most popular of them, OpenCV, we will go back to the basic principles that these modules rely on to process images.

To understand the fundamentals, we will use PGM and PPM images.

PGM and PPM file formats

PGM and PPM are file formats used to store grayscale or RGB images. Their particularity is that they can store data as plain text, which makes them easy to read and write.

A binary format also exists for more efficient storage.
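
The difference in size between the two encodings can be illustrated with a minimal sketch. The 3 x 2 image and its pixel values are made up for the illustration; the sketch assumes a maximum gray value below 256, so that the binary variant stores one byte per pixel.

```python
# Same 3x2 grayscale image encoded both ways (illustrative values).
width, height, max_gray = 3, 2, 255
rows = [[0, 0, 0], [255, 255, 255]]

# Plain (ASCII) PGM: magic number "P2", values written as decimal text.
plain = f"P2\n{width} {height}\n{max_gray}\n"
plain += "\n".join(" ".join(str(v) for v in row) for row in rows) + "\n"
plain_bytes = plain.encode("ascii")

# Binary PGM: magic number "P5", one raw byte per pixel when max_gray < 256.
header = f"P5\n{width} {height}\n{max_gray}\n".encode("ascii")
binary_bytes = header + bytes(v for row in rows for v in row)

# The binary encoding is shorter, and the gap grows with the image size.
print(len(plain_bytes), len(binary_bytes))
```

In the plain encoding a pixel can take up to four characters (three digits and a separator), against a single byte in the binary encoding.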

PGM files can represent images with various levels of gray, from black to white.

PPM files store color images, each pixel being described by three RGB (Red, Green, Blue) channel values. A grayscale image can still be stored by giving the three channels the same value.

Note

Other file formats, such as JPEG or PNG, are much more widely used, mainly because they compress the data and produce smaller files.

These formats have a more complex structure and their data must be decompressed before use, but the underlying principles are the same as for PGM and PPM files.

Structure of a PGM file

The structure of a PGM file consists of a header section followed by the image data section. Here is a breakdown of the structure:

  • Header Section: The header section contains metadata about the image, including:

    • Magic Number: Indicates the file type. For PGM files, this is typically “P2” for ASCII-encoded files or “P5” for binary-encoded files.

    • Width and Height: Specifies the dimensions of the image in pixels.

    • Maximum Gray Value: Indicates the maximum intensity value for grayscale pixels (usually 255).

  • Image Data Section: The image data section contains the pixel values that represent the grayscale intensities of the image.

Here’s an example of the structure of an ASCII-encoded PGM file:

P2
# This is a comment
3 2
255
0 0 0   # Pixel values for the first row
255 255 255   # Pixel values for the second row

In this example, the first line is the magic number, followed by a comment (comments start with the “#” character) and the metadata: a width of 3, a height of 2, and a maximum gray value of 255.

The image data section then lists the 3 x 2 = 6 pixel values: a first row of black pixels (0) and a second row of white pixels (255).
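
To make this structure concrete, here is a minimal sketch that parses the example above from a string. The function name parse_plain_pgm is hypothetical, and the sketch assumes that everything after a “#” on a line is a comment, as in the example.

```python
def parse_plain_pgm(text: str):
    """Parse ASCII PGM text into (width, height, max_gray, rows)."""
    tokens = []
    for line in text.splitlines():
        # Drop everything after '#', then split the rest on whitespace.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain PGM (P2) file")
    width, height, max_gray = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height:
        raise ValueError("unexpected number of pixel values")
    # Cut the flat list of values into rows of `width` pixels.
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, max_gray, rows


example = """P2
# This is a comment
3 2
255
0 0 0   # Pixel values for the first row
255 255 255   # Pixel values for the second row
"""
width, height, max_gray, rows = parse_plain_pgm(example)
print(width, height, max_gray)
print(rows)
```

Working on a flat list of tokens rather than on lines means the parser does not depend on how the values are spread across lines.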

Structure of a PPM file

The structure of a PPM file is the same as that of a PGM file. It consists of a header section followed by the image data section. Here is a breakdown of the structure:

  • Header Section: The header section contains metadata about the image, including:

    • Magic Number: Indicates the file type. For PPM files, this is typically “P3” for ASCII-encoded files or “P6” for binary-encoded files.

    • Width and Height: Specifies the dimensions of the image in pixels.

    • Maximum Color Value: Indicates the maximum value of each color channel (usually 255).

  • Image Data Section: The image data section contains, for each pixel, a set of three values that represent respectively the red, green, and blue intensities.

Here’s an example of the structure of an ASCII-encoded PPM file:

P3
# This is a comment
3 2
255
0 0 0 0 1 0 1 1 0   # Pixel values for the first row
255 0 0 0 255 0 0 0 255   # Pixel values for the second row
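
Since each pixel takes three consecutive values, reading the data section comes down to grouping the values by three, then by rows. The following sketch shows this grouping on the values of the example above; the helper name group_rgb is hypothetical.

```python
def group_rgb(values, width):
    """Group a flat list of channel values into rows of (R, G, B) tuples."""
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


# Channel values from the 3x2 example above, comments removed.
values = [0, 0, 0, 0, 1, 0, 1, 1, 0,
          255, 0, 0, 0, 255, 0, 0, 0, 255]
rows = group_rgb(values, width=3)
print(rows[1])  # second row: a red, a green and a blue pixel
```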

Read a PGM file - list

Note

The code for this example is in the \progs\Python\00_open_a_PGM_file\ directory of the repository.

Examples of images are stored in the \_data\ directory of the repository.

The main file of the Python script is 00_open_PGM_file_list.py

Function to read a PGM file

As described previously, a PGM file in plain text format can be opened like any regular text file. The path to the file must be given to the function.

 1def read_image_PGM(filename: str) -> list:
 2    try:
 3        with open(filename, 'r') as file:
 4            lines = file.readlines()
 5            magic_number = lines[0].strip()
 6            width, height = map(int, lines[1].split())
 7            max_gray_value = int(lines[2])
 8            print(f'PGM File\nMagic Number={magic_number}')
 9            print(f'W={width} / H={height} / Maximum Gray Value={max_gray_value}')
10            pixels = [[int(value) for value in line.split()] for line in lines[3:]]
11            return pixels
12    except FileNotFoundError:
13        print(f"Error: Unable to open file {filename}")

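On line 3 of this example, the file is opened in read mode (‘r’) with Python's built-in open function.

On line 4, every line of the file is read and stored in the lines list. As mentioned in the Structure of a PGM file section, the first lines contain the metadata of the image. This simple function expects exactly three header lines, so a file that contains comment lines would not be parsed correctly.

The remaining lines contain the pixel values. Each of them is converted to a list of integers, and these lists are collected in a new list (called pixels). If the file does not exist, an error message is printed and the function returns None.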

Opening an image

You can try this function with the following command in a Python interactive shell:

>>> image = read_image_PGM("../../_data/robot.pgm")
PGM File
Magic Number=P2
W=600 / H=382 / Maximum Gray Value=255

The robot.pgm file contains an image of 600 x 382 pixels.

The type of the returned object is a Python list.

>>> print(type(image))
<class 'list'>

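Accessing a pixel value

The function we defined returns a list with one entry per line of pixel data in the file. In robot.pgm, as the outputs below suggest, each of these lines holds a single value, so the list has 600 x 382 = 229,200 entries and each entry is itself a one-element list.

To access the pixel at index 58 (the 59th pixel, since indices start at 0), you can use the following command: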

>>> print(image[58])
[211]

But if you want to access the pixel at column index 96 of the row at index 50 (each row has 600 pixels), you have to compute its position in the list yourself:

>>> print(image[50 * 600 + 96])
[190]
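
This index computation can be wrapped in a small helper. The sketch below uses a hypothetical function, pixel_at, and a tiny 3 x 2 stand-in for the list returned by read_image_PGM; it assumes one single-value list per pixel, stored row after row.

```python
def pixel_at(image, row, col, width):
    """Return the gray value at (row, col), both counted from 0.

    Assumes `image` holds one single-value list per pixel, row after row.
    """
    return image[row * width + col][0]


# Small stand-in for the list returned by read_image_PGM: a 3x2 image.
image = [[10], [20], [30], [40], [50], [60]]
print(pixel_at(image, row=1, col=2, width=3))
```

The width has to be passed explicitly, because the list alone does not carry the dimensions of the image.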

Read a PGM file - array

Note

The code for this example is in the \progs\Python\00_open_a_PGM_file\ directory of the repository.

Examples of images are stored in the \_data\ directory of the repository.

The main file of the Python script is 00_open_PGM_file_array.py

Function to read a PGM file

As described previously, a PGM file in plain text format can be opened like any regular text file. The path to the file must be given to the function. This function is nearly the same as the previous one, but it checks the magic number and returns a NumPy array.

import numpy as np


def read_image_PGM(filename: str) -> np.ndarray:
    try:
        with open(filename, 'r') as f:
            # Read the magic number of the file
            magic_number = f.readline().strip()
            if magic_number != 'P2':
                raise ValueError("Invalid PGM file format")

            # Read the width, height, and maximum gray value
            width, height = map(int, f.readline().split())
            max_gray_value = int(f.readline())
            print(f'PGM File\nMagic Number={magic_number}')
            print(f'W={width} / H={height} / Maximum Gray Value={max_gray_value}')

            # Read each pixel value into a flat list of integers
            pixels = []
            for line in f:
                pixels.extend(int(value) for value in line.split())

            # Convert the image data to a NumPy array of height rows by width columns
            image = np.array(pixels, dtype=int).reshape((height, width))
            return image
    except FileNotFoundError:
        # In this case the function returns None
        print(f"Error: Unable to open file {filename}")

As before, the lines after the header contain the pixel values. They are first collected in a flat list (called pixels), then converted to an array from the NumPy package (called image) and reshaped to height rows of width columns.

Opening an image

You can try this function with the following command in a Python interactive shell:

>>> image = read_image_PGM("../../_data/robot.pgm")
PGM File
Magic Number=P2
W=600 / H=382 / Maximum Gray Value=255

The robot.pgm file contains an image of 600 x 382 pixels.

The returned object is a numpy.ndarray.

>>> print(type(image))
<class 'numpy.ndarray'>

You can get the shape of the image, given as (height, width), with the following instruction:

>>> print(image.shape)
(382, 600)

Accessing a pixel value

The function we defined returns a two-dimensional NumPy array of integers. In this example, its shape is (382, 600): 382 rows of 600 pixels.

To access the pixel at column index 96 of the row at index 50 (indices start at 0), you can index the array directly with the row and the column:

>>> print(image[50, 96])
190
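
The same indexing syntax also gives access to whole rows and to basic statistics. The sketch below uses a small made-up array as a stand-in for the one returned by read_image_PGM; only standard NumPy indexing and the min and max methods are used.

```python
import numpy as np

# Small stand-in for the array returned by read_image_PGM (2 rows, 3 columns).
image = np.array([[0, 64, 128],
                  [255, 192, 32]])

print(image.shape)   # (rows, columns), i.e. (height, width)
print(image[1, 2])   # row index 1, column index 2
print(image[0, :])   # the whole first row
print(image.min(), image.max())
```

Compared with the list version, no manual index arithmetic is needed, since the array keeps track of its own dimensions.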

Create a class: ImagePGM

To make the metadata of an image easier to access, it is recommended to use a data structure or a class that gathers the main characteristics of the image.

Note

If you are not familiar with object-oriented programming, you can practice with this tutorial: Soon…
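
As a rough idea of what such a class could look like, here is a minimal sketch based on a dataclass. The attribute and method names (width, height, max_gray_value, pixels, get_pixel, is_consistent) are assumptions made for the illustration, not the names used in the repository.

```python
from dataclasses import dataclass


@dataclass
class ImagePGM:
    """Hypothetical container for a grayscale image and its metadata."""
    width: int
    height: int
    max_gray_value: int
    pixels: list  # list of rows, each row being a list of integers

    def get_pixel(self, row: int, col: int) -> int:
        """Return the gray value at (row, col), both counted from 0."""
        return self.pixels[row][col]

    def is_consistent(self) -> bool:
        """Check that the pixel data matches the declared metadata."""
        if len(self.pixels) != self.height:
            return False
        return all(
            len(row) == self.width
            and all(0 <= value <= self.max_gray_value for value in row)
            for row in self.pixels
        )


img = ImagePGM(width=3, height=2, max_gray_value=255,
               pixels=[[0, 0, 0], [255, 255, 255]])
print(img.width, img.height, img.get_pixel(1, 0), img.is_consistent())
```

Keeping the dimensions next to the pixel data avoids passing the width around separately whenever a pixel is accessed.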


Start by discovering PGM images (without compression - grayscale)

  • 01_imagePGM_class: a class that describes a PGM image and its content, with:
    • readImagePGM method to read an image from its file name

    • overloading of the << operator

    • Python with list !!

  • 02_imagePGM_modular: the same as previously but with:
    • separated files for class declaration, definition and main program

    • writeImagePGM method to write an image