
AUTOMATIC IMAGE DESCRIPTION

The goal of automatic image description is to generate a coherent, fluent sentence that accurately describes the content of an image.

[Figure: MODEL2.png]

[Figure: cnn-ez.gif. Feature extraction from an andesite image]

CNN EXPLAINER

Convolutional Neural Network

A convolutional neural network (CNN) is a type of neural network specifically designed to process data with a grid structure, such as images.

In this model, the CNN is responsible for extracting features from the input image, such as the locations, sizes, and colors of objects.
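To make feature extraction concrete, the following is a minimal sketch of the sliding-window operation at the core of a convolutional layer, written in plain Python. The tiny image and the hand-written edge kernel are assumptions chosen for illustration; a real CNN learns its kernels during training and stacks many such layers.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image: dark left half, bright right half
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical hand-written kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is nonzero only in the column where the brightness changes, which is the sense in which a convolution "extracts" a feature such as the location of an edge.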

TRANSFORMER

The Transformer is a neural network architecture designed primarily for natural language processing tasks. It was first proposed in the 2017 paper "Attention Is All You Need" by Vaswani et al.

Instead of relying on recurrence or convolutions, the Transformer uses attention mechanisms to process input and output sequences in parallel, which makes it efficient to train.
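As an illustration of the attention mechanism, here is a minimal sketch of scaled dot-product attention in plain Python. The function names and the toy vectors are assumptions for this example; a full Transformer adds learned projections, multiple heads, and masking on top of this core operation.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy example: one query attending over two positions
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(queries, keys, values))
```

Each query is handled independently of the others, which is why all positions in a sequence can be processed in parallel rather than one step at a time.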

The Transformer has become the basis for many cutting-edge models in the field of natural language processing, including BERT and GPT.

[Figure: TRANSFORMER.jpg]

During training, this component takes the image features and the corresponding descriptions as input and learns to generate a description from the image features.
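One common way to set up this kind of training is teacher forcing, where the decoder sees the description shifted by one token and learns to predict the next word. The source does not state which scheme this model uses, so the sketch below is an assumption; the `<start>` and `<end>` tokens and the `make_training_pair` helper are hypothetical names.

```python
START, END = "<start>", "<end>"  # hypothetical special tokens


def make_training_pair(caption):
    """Build the decoder input and the prediction target for one caption."""
    tokens = caption.lower().split()
    decoder_input = [START] + tokens  # what the decoder is shown
    target = tokens + [END]           # what it should predict at each step
    return decoder_input, target


decoder_input, target = make_training_pair("A gray rock")
for seen, expected in zip(decoder_input, target):
    print(f"given {seen!r}, predict {expected!r}")
```

At each position the model would also receive the image features, so the predicted word depends on both the image and the words generated so far.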


EVALUATION OF THE MODEL

The generated descriptions are usually evaluated using metrics such as BLEU, METEOR, ROUGE and CIDEr.

In this model, the BLEU (Bilingual Evaluation Understudy) metric is used to measure the similarity between the automatically generated description and the reference (human-written) description.
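The following is a minimal sketch of a sentence-level BLEU score with a single reference, using clipped n-gram precision up to bigrams and the brevity penalty. It is a simplified illustration, not necessarily the exact configuration used for this model; the `max_n=2` default, the whitespace tokenization, and the example sentences are assumptions, and no smoothing is applied.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified BLEU: one reference, uniform weights, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "a gray rock on the sand"
print(bleu("a gray rock on the sand", reference))  # identical sentences score 1.0
print(bleu("a gray rock on sand", reference))      # partial match, between 0 and 1
```

A score closer to 1 means the generated description shares more words and word pairs with the reference; in practice BLEU is usually computed over a whole test set and against several references per image.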
