The Visual Geometry Group (VGG) network is a powerful Convolutional Neural Network (CNN) architecture that has been widely used in computer vision tasks. Known for its simplicity and depth, VGG has achieved state-of-the-art performance on many benchmarks. In this blog, we'll explore how to implement a VGG network using the CIFAR-10 dataset in Python.
Table of Contents
Introduction to VGG
Understanding the CIFAR-10 Dataset
Setting Up the Environment
Implementing VGG on the CIFAR-10 Dataset in Python (Keras)
Training the Model
Evaluating the Model
Conclusion
1. Introduction to VGG
VGG was introduced by the Visual Geometry Group at Oxford and has become one of the most influential CNN architectures. It is characterized by its use of small (3x3) convolution filters, its depth, and its simplicity. The most commonly used variants are VGG16 and VGG19, named for the number of weight layers in the network. VGG16 consists of 13 convolutional layers and three fully connected layers. Let's take a brief look at the architecture:
Input: The VGG network takes an input image of size 224x224. For the ImageNet competition, the creators of the model cropped out the center 224x224 patch of each image to keep the input size consistent.
Convolutional Layers: VGG's convolutional layers use a minimal receptive field of 3x3, the smallest size that still captures up/down and left/right. Some configurations also include 1x1 convolution filters, which act as a linear transformation of the input. Each convolution is followed by a ReLU (rectified linear unit) activation, an idea popularized by AlexNet that shortens training time; ReLU is a piecewise linear function that outputs its input when positive and zero otherwise. The convolution stride is fixed at 1 pixel (the stride is the number of pixels the filter shifts over the input), so spatial resolution is preserved after convolution. A short Keras sketch of one convolutional stage follows this list.
Hidden Layers: All the hidden layers in VGG use ReLU. VGG does not use Local Response Normalization (LRN), as it increases memory consumption and training time without improving accuracy.
Fully-Connected Layers: VGG has three fully connected layers. The first two have 4096 channels each, and the third has 1000 channels, one per ImageNet class. For the CIFAR-10 dataset there are only 10 classes, so our final layer will have 10 channels.
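To make these building blocks concrete, here is a minimal Keras sketch of a single VGG stage: two 3x3 convolutions followed by 2x2 max pooling. The 64-filter count is illustrative; deeper stages repeat the same pattern with 128, 256, and 512 filters.

from tensorflow.keras import layers, models

# One VGG "stage": stacked 3x3 same-padded convolutions keep the
# spatial size, then a 2x2 max pool halves it.
block = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),
])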
2. Understanding the CIFAR-10 Dataset
CIFAR-10 consists of 60,000 color images, each sized 32x32 pixels, distributed across 10 distinct classes. These classes represent a diverse range of everyday objects and animals, making the dataset both challenging and representative of real-world scenarios.
The 10 Classes in CIFAR-10
The images are evenly distributed across the following 10 classes:
Airplane: Images of various types of aircraft.
Automobile: Sedans, SUVs, and similar passenger vehicles; there is no overlap with the Truck class.
Bird: Various species of birds in different poses.
Cat: Domestic and wild cats in various settings.
Deer: Images of deer in different environments.
Dog: Various dog breeds, often in natural poses.
Frog: Frogs in different postures and backgrounds.
Horse: Images of horses, often in motion or standing.
Ship: Includes various watercraft like boats and ships.
Truck: Big trucks only; pickup trucks are excluded from both this class and Automobile.
Each class has 6,000 images, making it a well-balanced dataset. The data is divided into 50,000 training images and 10,000 testing images, allowing for a robust evaluation of model performance. The CIFAR-10 dataset is structured as follows:
Training Set: 50,000 images (5,000 images per class)
Test Set: 10,000 images (1,000 images per class)
Each image in the dataset is a 32x32 pixel color image, represented by three channels (Red, Green, and Blue) with pixel values ranging from 0 to 255.
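A quick sanity check of these shapes and value ranges, using Keras's built-in loader:

from tensorflow.keras.datasets import cifar10

# Load CIFAR-10; arrays of uint8 pixel values in [0, 255]
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)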
Why Use the CIFAR-10 Dataset?
CIFAR-10 is often chosen for educational purposes and research due to its simplicity and manageable size. It allows researchers and students to quickly train and test models without the need for extensive computational resources. Moreover, it provides a clear benchmark for comparing different algorithms and approaches.
3. Setting Up the Environment
Before we start implementing VGG, let's ensure that our environment is set up correctly. We’ll be using the following libraries:
TensorFlow/Keras: For building and training the neural network.
NumPy: For numerical computations.
Matplotlib: For visualization.
Install these dependencies using pip:
pip install tensorflow numpy matplotlib
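You can verify the installation by printing the TensorFlow version:

python -c "import tensorflow as tf; print(tf.__version__)"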
4. Implementing VGG on the CIFAR-10 Dataset in Python (Keras)
Now, let's dive into the implementation. We'll use Keras (part of TensorFlow) to build our VGG model. The network below follows the VGG16 layout, 13 convolutional layers plus three fully connected layers, adapted to CIFAR-10's 32x32 inputs and 10 output classes.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
# Load and preprocess the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0 # Normalize the data
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
# Define the VGG-like model
def build_vgg_model():
    model = models.Sequential()
    # Block 1: two 64-filter convolutions, then pooling (32x32 -> 16x16)
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))
    # Block 2: two 128-filter convolutions (16x16 -> 8x8)
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))
    # Block 3: three 256-filter convolutions (8x8 -> 4x4)
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))
    # Block 4: three 512-filter convolutions (4x4 -> 2x2)
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))
    # Block 5: three 512-filter convolutions (2x2 -> 1x1)
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(512, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2)))
    # Classifier: flatten, two 4096-unit dense layers, 10-way softmax
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dense(4096, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    return model
# Instantiate and compile the model
model = build_vgg_model()
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Display the model's architecture
model.summary()
model.summary() prints the layer-by-layer architecture with output shapes and parameter counts; this configuration has about 33.6 million trainable parameters, more than half of them in the fully connected layers.
5. Training the Model
After defining the model, it's time to train it. We'll train for 5 epochs with the Adam optimizer and a batch size of 64, using the test set as validation data.
# Train the VGG model
history = model.fit(x_train, y_train, epochs=5, batch_size=64, validation_data=(x_test, y_test))
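Since we installed Matplotlib earlier, here is a minimal sketch for visualizing the curves recorded in the history object returned by model.fit:

import matplotlib.pyplot as plt

# Plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()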
6. Evaluating the Model
Once the model is trained, we can evaluate its performance on the test dataset. First, here is the output produced by the training run above (an evaluation sketch follows it):
Epoch 1/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 5617s 7s/step - accuracy: 0.0999 - loss: 2.3030 - val_accuracy: 0.1000 - val_loss: 2.3026
Epoch 2/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 5689s 7s/step - accuracy: 0.0993 - loss: 2.3027 - val_accuracy: 0.1000 - val_loss: 2.3026
Epoch 3/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 5645s 7s/step - accuracy: 0.0999 - loss: 2.3026 - val_accuracy: 0.1000 - val_loss: 2.3026
Epoch 4/5
782/782 ━━━━━━━━━━━━━━━━━━━━ 5667s 7s/step - accuracy: 0.0991 - loss: 2.3027 - val_accuracy: 0.1000 - val_loss: 2.3026
Epoch 5/5
195/782 ━━━━━━━━━━━━━━━━━━━━ 1:09:10 7s/step - accuracy: 0.0978 - loss: 2.3028
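To score the trained model on the test set, Keras provides model.evaluate; a minimal sketch:

# Evaluate on the 10,000 held-out test images
test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=64)
print(f"Test accuracy: {test_acc:.4f}")

Note that the log above is stuck at roughly 10% accuracy with a loss of about 2.3026 (ln 10), meaning the network is predicting an essentially uniform distribution over the 10 classes. Training a plain VGG16 from scratch on CIFAR-10 with default Adam settings often fails to converge; common remedies include lowering the learning rate, adding batch normalization, adding dropout, and shrinking the 4096-unit dense layers.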
7. Conclusion
In this blog, we implemented a VGG16-style model on the CIFAR-10 dataset using Keras. The VGG architecture, despite its depth, is conceptually simple and remains a powerful baseline for image classification. Through this exercise, you should now have a solid understanding of how to build and train deep convolutional networks in Python.
Keep in mind that the model mirrors the full VGG16 layer layout, which is expensive to train from scratch on modest hardware and, as the training log showed, may not converge without adjustments such as learning-rate tuning, normalization, or regularization; one such variant is sketched below. Still, this serves as a great starting point for more complex projects.
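As one illustration (an addition on our part, not part of the original VGG design), here is a sketch of a single stage modified with BatchNormalization and Dropout, two changes that commonly help VGG-style networks converge on CIFAR-10:

from tensorflow.keras import layers, models

# A VGG-style stage with batch normalization after each convolution
# and dropout after pooling; repeat with more filters per stage.
block = models.Sequential([
    layers.Conv2D(64, (3, 3), padding='same'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(64, (3, 3), padding='same'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
])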