Samuel Black

Aug 15 · 4 min read

Exploring the Boston Housing Dataset with TensorFlow in Python

In this blog, we'll explore how to use TensorFlow to create a simple regression model that predicts housing prices using the Boston Housing dataset. We'll walk through data preprocessing, model building, training, and evaluation.

Understanding the Boston Housing Dataset

The Boston Housing dataset is one of the best-known datasets in the machine learning community. It contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts, and is commonly used for regression analysis, where the objective is to predict the median value of owner-occupied homes from features such as the crime rate and the average number of rooms per dwelling. The dataset contains 506 instances, each with 13 input features, and the target variable (MEDV) is the median value of owner-occupied homes in $1000s. Below is a brief description of each column:

CRIM: Per capita crime rate by town.
ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS: Proportion of non-retail business acres per town.
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
NOX: Nitric oxides concentration (parts per 10 million).
RM: Average number of rooms per dwelling.
AGE: Proportion of owner-occupied units built before 1940.
DIS: Weighted distances to five Boston employment centers.
RAD: Index of accessibility to radial highways.
TAX: Full-value property tax rate per $10,000.
PTRATIO: Pupil-teacher ratio by town.
B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
LSTAT: Percentage of the population with lower socio-economic status.
MEDV: Median value of owner-occupied homes in $1000s (the target variable, not an input feature).
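keras.datasets.boston_housing returns the features and the MEDV target as separate NumPy arrays without column names, so it can help to attach the names above when inspecting the raw data. The sketch below assumes the 13 feature columns follow the order of the list above (CRIM through LSTAT); FEATURE_NAMES and describe are hypothetical helpers, not part of Keras.

```python
import numpy as np

# Assumed column order of the 13 input features (MEDV is returned separately as the target).
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def describe(features, names=FEATURE_NAMES):
    """Return {feature name: (min, mean, max)} for a 2-D (samples, features) array."""
    features = np.asarray(features, dtype=float)
    if features.ndim != 2 or features.shape[1] != len(names):
        raise ValueError(f"expected shape (n, {len(names)}), got {features.shape}")
    return {
        name: (float(col.min()), float(col.mean()), float(col.max()))
        for name, col in zip(names, features.T)  # iterate over columns
    }


# Usage with a small made-up array; with the real data you would pass the raw
# (unstandardized) train_data returned by boston_housing.load_data().
demo = np.arange(26, dtype=float).reshape(2, 13)
stats = describe(demo)
print(stats["CRIM"])  # (0.0, 6.5, 13.0)
```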

Exploring the Boston Housing Dataset with TensorFlow in Python

Exploring the Boston Housing dataset with TensorFlow in Python offers a hands-on way to understand and implement regression with neural networks. This classic dataset combines socio-economic and geographical features of the Boston area and is typically used to predict the median value of homes. We'll use scikit-learn to standardize the features and TensorFlow's Keras API to build, train, and evaluate a predictive model. Along the way, you'll see how a neural network is trained, why standardizing the data matters, and how these techniques apply to a real-world regression problem.

Loading and Preprocessing the Boston Housing Dataset

First, let's load the dataset and perform some basic preprocessing. TensorFlow has the Boston Housing dataset available in its keras.datasets module, making it easy to load the data.

```python
import tensorflow as tf
from tensorflow.keras.datasets import boston_housing
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load the dataset
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

# Standardize the data: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
test_data = scaler.transform(test_data)

# Print the shape of the data
print(f'Training data shape: {train_data.shape}')
print(f'Test data shape: {test_data.shape}')
```

Output for the above code:

Training data shape: (404, 13)
Test data shape: (102, 13)
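The 404/102 split comes from load_data's default test_split=0.2: int(506 * 0.8) = 404 rows for training, leaving 102 for testing. Keras also shuffles the rows with a fixed seed (seed=113 by default) before splitting, so the split is the same on every run. The sketch below imitates that behaviour on placeholder arrays; split_like_load_data is a hypothetical helper, and it is not guaranteed to pick exactly the same rows as Keras.

```python
import numpy as np


def split_like_load_data(x, y, test_split=0.2, seed=113):
    """Hypothetical helper: shuffle with a fixed seed, then split off the last `test_split` fraction."""
    rng = np.random.RandomState(seed)
    indices = np.arange(len(x))
    rng.shuffle(indices)
    x, y = x[indices], y[indices]
    cut = int(len(x) * (1 - test_split))  # int(506 * 0.8) = int(404.8) = 404
    return (x[:cut], y[:cut]), (x[cut:], y[cut:])


# Placeholder data with the dataset's shape: 506 rows, 13 features, 1 target each.
x = np.arange(506 * 13, dtype=float).reshape(506, 13)
y = np.arange(506, dtype=float)
(x_train, y_train), (x_test, y_test) = split_like_load_data(x, y)
print(x_train.shape, x_test.shape)  # (404, 13) (102, 13)
```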

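Standardizing rescales each feature so that, on the training set, it has a mean of 0 and a standard deviation of 1. Without it, features on large scales (such as TAX, in the hundreds) would dominate features on small scales (such as NOX, below 1), which can slow down and destabilize gradient-based training. The scaler is fit on the training data only and then applied to the test data, so no information about the test set leaks into preprocessing.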
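Here is the z-score transform, z = (x - mean) / std, written out in NumPy. standardize is a hypothetical helper; to our knowledge it matches StandardScaler's default behaviour (population standard deviation, constant columns left unscaled), but it is an illustration rather than scikit-learn's implementation.

```python
import numpy as np


def standardize(train, test):
    """Z-score both arrays using per-column statistics from `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)              # population std (ddof=0)
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
# Made-up features on very different scales (a NOX-like and a TAX-like column).
train = np.column_stack([rng.uniform(0.4, 0.9, 404), rng.uniform(180, 720, 404)])
test = np.column_stack([rng.uniform(0.4, 0.9, 102), rng.uniform(180, 720, 102)])

train_z, test_z = standardize(train, test)
# Training columns now have mean ~0 and std 1; test columns are only close to that.
print(train_z.mean(axis=0).round(6), train_z.std(axis=0).round(6))
```

The test columns end up near, but not exactly at, mean 0 and standard deviation 1, which is expected: they are transformed with the training statistics.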

Building the Sequential Model

We'll build a simple feedforward neural network using TensorFlow's Keras API: two hidden dense layers of 64 ReLU units, with dropout after the first, followed by a single-unit linear output layer.

```python
# Build the model
model = Sequential([
    Dense(64, activation='relu', input_shape=(train_data.shape[1],)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1)  # Output layer for regression (linear, no activation)
])

# Compile the model: Adam optimizer, mean squared error loss,
# mean absolute error reported as an extra metric
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```

In the above code:

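Dense(64, activation='relu'): A dense (fully connected) layer with 64 neurons and a ReLU activation. The first one also declares the input shape of 13 features.
Dropout(0.5): During training, randomly sets 50% of the previous layer's outputs to zero at each step, which helps prevent overfitting; it is inactive when the model makes predictions.
Dense(1): The output layer has a single neuron with no activation (a linear output), since we're predicting a continuous value.
model.compile(...): Uses the Adam optimizer, minimizes the mean squared error (MSE), and reports the mean absolute error (MAE) as a metric.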
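To see what these layers compute, here is a NumPy sketch of the same architecture's forward pass. Each Dense layer computes x @ W + b (followed by ReLU for the hidden layers), and dropout uses the "inverted" form that Keras documents: surviving values are scaled by 1 / (1 - rate) = 2, so the expected activation is unchanged. The weights are random placeholders, not the trained model's, and dense, dropout, and forward are hypothetical helpers. The parameter count is (13 × 64 + 64) + (64 × 64 + 64) + (64 × 1 + 1) = 896 + 4,160 + 65 = 5,121; Dropout adds none.

```python
import numpy as np

rng = np.random.default_rng(42)


def dense(x, w, b, activation=None):
    """Fully connected layer: x @ w + b, optionally followed by ReLU."""
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z


def dropout(x, rate, training):
    """Inverted dropout: zero a `rate` fraction of values, scale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x  # inference: pass values through unchanged
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


# Random placeholder weights with the same shapes as the Keras model.
layers = [
    (rng.normal(scale=0.1, size=(13, 64)), np.zeros(64)),
    (rng.normal(scale=0.1, size=(64, 64)), np.zeros(64)),
    (rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)),
]


def forward(x, training=False):
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = dense(x, w1, b1, "relu")
    h = dropout(h, 0.5, training)
    h = dense(h, w2, b2, "relu")
    return dense(h, w3, b3)  # linear output: one value per sample


n_params = sum(w.size + b.size for w, b in layers)
print(n_params)                                 # 5121
print(forward(rng.normal(size=(5, 13))).shape)  # (5, 1)
```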

Training the Model

Next, we'll train the model on the training data, holding out part of it as a validation set so we can monitor performance on samples the model is not fitted to.

```python
# Train the model
history = model.fit(train_data, train_targets,
                    epochs=100,
                    validation_split=0.2,
                    batch_size=32,
                    verbose=1)
```

Output for the above code (last few epochs):

11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 22.1980 - mae: 3.5504 - val_loss: 13.1642 - val_mae: 2.6839
Epoch 97/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 19.4921 - mae: 3.3381 - val_loss: 13.5337 - val_mae: 2.7524
Epoch 98/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 25.4868 - mae: 3.5843 - val_loss: 13.4094 - val_mae: 2.7224
Epoch 99/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 20.7868 - mae: 3.4319 - val_loss: 13.5221 - val_mae: 2.7711
Epoch 100/100
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 19.2941 - mae: 3.2998 - val_loss: 14.3015 - val_mae: 2.8350

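Here, we train the model for 100 epochs in mini-batches of 32 samples, holding out 20% of the training data as a validation set. In this log the training loss stays above the validation loss, partly because dropout is active while the training loss is computed but disabled during validation.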
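The "11/11" in the log follows from that split. Keras documents that validation_split takes the validation data from the last samples of the arrays passed to fit, before any shuffling. The sketch below reproduces the arithmetic on placeholder arrays; validation_split_like_keras is a hypothetical helper, not a Keras function.

```python
import math

import numpy as np


def validation_split_like_keras(x, y, validation_split):
    """Hypothetical helper: hold out the *last* fraction of samples, as Keras documents."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Placeholder arrays shaped like the standardized training set.
x = np.zeros((404, 13))
y = np.zeros(404)
(x_fit, y_fit), (x_val, y_val) = validation_split_like_keras(x, y, 0.2)

steps_per_epoch = math.ceil(len(x_fit) / 32)  # the last batch is partial
print(len(x_fit), len(x_val), steps_per_epoch)  # 323 81 11
```

Because the split is positional, data sorted in a meaningful order would give an unrepresentative validation set; here load_data has already shuffled the rows.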

Evaluating the Model

After training, we can evaluate the model on the test set to see how well it generalizes to new data.

```python
# Evaluate the model
model.evaluate(test_data, test_targets)
```

Output for the above code:

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 20.8195 - mae: 3.3060 
[25.54439926147461, 3.5264275074005127]

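model.evaluate returns the loss followed by the compiled metrics, here [MSE, MAE] on the test set. The mean absolute error (MAE) tells us how far off our predictions are from the actual values on average: since the targets are in $1000s, an MAE of about 3.53 means the predictions miss by roughly $3,500.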
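Both metrics are averages over the test samples: MSE is the mean of the squared errors and MAE the mean of their absolute values. Because MSE is in squared units, its square root (RMSE) is often easier to read; the reported test MSE of about 25.54 corresponds to an RMSE of about 5.05, or roughly $5,050, which is larger than the MAE because squaring weights big misses more heavily. The sketch below computes all three with NumPy on the five prediction/actual pairs printed later in this post (a tiny sample, so the values differ from the full test-set metrics); regression_metrics is a hypothetical helper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, RMSE) for 1-D arrays of targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    return mse, mae, np.sqrt(mse)


# The five (predicted, actual) pairs printed in the Making Predictions section, in $1000s.
predicted = [8.02, 16.52, 20.18, 31.13, 24.20]
actual = [7.20, 18.80, 19.00, 27.00, 22.20]
mse, mae, rmse = regression_metrics(actual, predicted)
print(f"MSE={mse:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")  # MSE=5.664  MAE=2.082  RMSE=2.380
```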

Making Predictions

Finally, let's use the trained model to make predictions on the test data.

```python
# Make predictions (a 2-D array of shape (num_samples, 1))
predictions = model.predict(test_data)

# Print some predictions
for i in range(5):
    print(f'Predicted value: {predictions[i][0]:.2f}, Actual value: {test_targets[i]:.2f}')
```

Output for the above code:

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step
Predicted value: 8.02, Actual value: 7.20
Predicted value: 16.52, Actual value: 18.80
Predicted value: 20.18, Actual value: 19.00
Predicted value: 31.13, Actual value: 27.00
Predicted value: 24.20, Actual value: 22.20
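model.predict returns a 2-D array of shape (num_samples, 1), which is why the loop indexes predictions[i][0], while test_targets is 1-D. Flattening the predictions first makes vectorized comparisons safe; subtracting the arrays directly would silently broadcast to a (num_samples, num_samples) matrix. The sketch below demonstrates this with stand-in arrays holding the five values printed above.

```python
import numpy as np

# Stand-ins shaped like the real arrays: model.predict returns (n, 1),
# while the targets from boston_housing.load_data() are 1-D with shape (n,).
predictions = np.array([[8.02], [16.52], [20.18], [31.13], [24.20]])
targets = np.array([7.20, 18.80, 19.00, 27.00, 22.20])

broadcast = predictions - targets       # shape (5, 5): every pair, not what we want
errors = predictions.ravel() - targets  # shape (5,): one error per sample
print(broadcast.shape, errors.shape)    # (5, 5) (5,)

# Same output as the loop above, without the [i][0] indexing.
for pred, actual in zip(predictions.ravel(), targets):
    print(f"Predicted value: {pred:.2f}, Actual value: {actual:.2f}")
```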

Conclusion

In this blog, we've walked through the process of building a simple regression model using TensorFlow to predict housing prices from the Boston Housing dataset. We started by loading and preprocessing the data, then built and trained a neural network, and finally evaluated its performance.

This example shows how little code it takes to get started with TensorFlow for regression tasks. The Boston Housing dataset is just one of many datasets available for experimentation, and TensorFlow's high-level Keras API makes it approachable for beginners while remaining flexible enough for experts.

Whether you're interested in building more complex models or experimenting with different datasets, TensorFlow provides the flexibility and performance to help you achieve your goals in machine learning.


ColabCodes
