The Battle of Neighborhoods - A study on London House Market

Capstone Project: The Battle of Neighborhood

Recommending the Most Valuable Venue for Homebuyers in London

This is a project for Coursera Applied Data Science Capstone.

Table of contents

Introduction

Background

According to Working Paper 72 published by the London Government, London’s house prices are relatively high compared to other areas in the UK and have been rising at a fast rate in recent years.
Meanwhile, housing markets in London have witnessed several ups and downs, with volatile house prices in London tending to amplify changes in national house prices.
London has experienced several episodes of real house price deflation.
From the patterns of previous cycles, it is hard to tell whether house prices are approaching a new peak or a downward adjustment.

Recently, as a result of the pandemic, increasing unemployment rate, wage cuts, business failures, and job uncertainty also brought massive impact to the housing market in London.
According to the Nationwide, house prices fell 1.7% in May from the previous month, the most significant monthly fall for 11 years, due to the headwinds of Covid-19.

Business Problem

As a matter of fact, it is of great necessity to integrate data science and machine learning technics with the study of London house market in order to assist clients making sensible and efficient choices.
How could we provide suggestions to our clients based on the study of neighborhoods for either settling or investing is our main business problem. We will recommend profitable venues according to amenities and infrastructures surrounded.

Data Section

London properties monthly updated price paid data is retrieved from the HM Land Registry.

Data link: http://landregistry.data.gov.uk/

After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside every neighbourhood. FourSquare API interface is a location data provider with information about venues. Data retrieved include venue name, latitude, longitude, category, street name and street location.

Properties Data

Let’s first download the properties’ data from the HM Land Registry website:

Import Libraries

import os
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium #import folium # map rendering library

import requests
import xml

import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

Retrieve Data from source

Here we use the data in csv file which containing transfer information in 2019.

df = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv")

df.shape
(972285, 16)
df.head(5)

{8CAC1318-AC2F-0253-E053-6B04A8C08E51} 165950 2019-05-28 00:00 BS22 7FP T N F 32 Unnamed: 8 KELSTON GARDENS Unnamed: 10 WESTON-SUPER-MARE NORTH SOMERSET NORTH SOMERSET.1 A A.1
0 {8CAC1318-AC31-0253-E053-6B04A8C08E51} 119500 2019-05-22 00:00 TA8 2EY F N L GROVE HOUSE, 58 FLAT 1 BERROW ROAD NaN BURNHAM-ON-SEA SEDGEMOOR SOMERSET A A
1 {8CAC1318-AC33-0253-E053-6B04A8C08E51} 215000 2019-06-21 00:00 TA22 9DH T N F 5 NaN WEIR HEAD NaN DULVERTON SOMERSET WEST AND TAUNTON SOMERSET A A
2 {8CAC1318-AC35-0253-E053-6B04A8C08E51} 242500 2019-05-01 00:00 BS20 7BP F N L 50 NaN LOWER BURLINGTON ROAD PORTISHEAD BRISTOL NORTH SOMERSET NORTH SOMERSET A A
3 {8CAC1318-AC36-0253-E053-6B04A8C08E51} 318000 2019-05-09 00:00 BA3 2RW T N F 4 NaN BUSHY COMBE MIDSOMER NORTON RADSTOCK BATH AND NORTH EAST SOMERSET BATH AND NORTH EAST SOMERSET A A
4 {8CAC1318-AC37-0253-E053-6B04A8C08E51} 215000 2019-05-17 00:00 BS24 7HZ S N F 9 NaN OSMOND ROAD NaN WESTON-SUPER-MARE NORTH SOMERSET NORTH SOMERSET A A

Data Preprocessing

We need to process the data frame for further usage. Here we aim to group the venues by street assigned in the data frame and look into the average price of each street.

Targetting housebuyers’ price expectation, the upper limit and lower limit of price can be set to narrow down the venues we are looking for.

Preprocessing the data to finally obtain a dataframe containing the following information:

  • Street Name
  • Average Price of Venues
  • Location Information

1. Rename the columns

# Assign meaningful column names
df.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']
df.head()

TUID Price Date_Transfer Postcode Prop_Type Old_New Duration PAON SAON Street Locality City District County PPD_Cat_Type Record_Status
0 {8CAC1318-AC31-0253-E053-6B04A8C08E51} 119500 2019-05-22 00:00 TA8 2EY F N L GROVE HOUSE, 58 FLAT 1 BERROW ROAD NaN BURNHAM-ON-SEA SEDGEMOOR SOMERSET A A
1 {8CAC1318-AC33-0253-E053-6B04A8C08E51} 215000 2019-06-21 00:00 TA22 9DH T N F 5 NaN WEIR HEAD NaN DULVERTON SOMERSET WEST AND TAUNTON SOMERSET A A
2 {8CAC1318-AC35-0253-E053-6B04A8C08E51} 242500 2019-05-01 00:00 BS20 7BP F N L 50 NaN LOWER BURLINGTON ROAD PORTISHEAD BRISTOL NORTH SOMERSET NORTH SOMERSET A A
3 {8CAC1318-AC36-0253-E053-6B04A8C08E51} 318000 2019-05-09 00:00 BA3 2RW T N F 4 NaN BUSHY COMBE MIDSOMER NORTON RADSTOCK BATH AND NORTH EAST SOMERSET BATH AND NORTH EAST SOMERSET A A
4 {8CAC1318-AC37-0253-E053-6B04A8C08E51} 215000 2019-05-17 00:00 BS24 7HZ S N F 9 NaN OSMOND ROAD NaN WESTON-SUPER-MARE NORTH SOMERSET NORTH SOMERSET A A

2. Select column City which is only in London

df_ld = df.query("City == 'LONDON'")
df_ld.head()

TUID Price Date_Transfer Postcode Prop_Type Old_New Duration PAON SAON Street Locality City District County PPD_Cat_Type Record_Status
86 {8CAC1318-AFF3-0253-E053-6B04A8C08E51} 1100000 2019-06-07 00:00 SW19 8BN T N F 30 NaN ASHEN GROVE NaN LONDON MERTON GREATER LONDON A A
88 {8CAC1318-AFF5-0253-E053-6B04A8C08E51} 905000 2019-06-20 00:00 SW19 8BH T N F 35 NaN DURNSFORD AVENUE NaN LONDON MERTON GREATER LONDON A A
95 {8CAC1318-AFFC-0253-E053-6B04A8C08E51} 1440000 2019-06-07 00:00 SW13 8DL T N F 29 NaN MERTHYR TERRACE NaN LONDON RICHMOND UPON THAMES GREATER LONDON A A
167 {8CAC1318-B38E-0253-E053-6B04A8C08E51} 402500 2019-06-14 00:00 SW16 4PB S N F 9 NaN POLLARDS WOOD ROAD NaN LONDON CROYDON GREATER LONDON A A
292 {8F1B26BD-4595-53DB-E053-6C04A8C03649} 340000 2019-07-15 00:00 N1 9BT F N L 75 FLAT 4 CALEDONIAN ROAD NaN LONDON ISLINGTON GREATER LONDON A A

3. Format transfer date

df_ld['Date_Transfer'] = df_ld['Date_Transfer'].apply(pd.to_datetime)

# Sort by Date
df_ld.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)
df_ld.head()

TUID Price Date_Transfer Postcode Prop_Type Old_New Duration PAON SAON Street Locality City District County PPD_Cat_Type Record_Status
730629 {A2479555-A33A-74C7-E053-6B04A8C0887D} 670000 2019-12-31 SW18 1UT F Y L 8 APARTMENT 1 OSIERS ROAD NaN LONDON WANDSWORTH GREATER LONDON A A
730613 {A2479555-A326-74C7-E053-6B04A8C0887D} 600000 2019-12-31 SW18 1UX F Y L 12 APARTMENT 46 OSIERS ROAD NaN LONDON WANDSWORTH GREATER LONDON A A
730633 {A2479555-A347-74C7-E053-6B04A8C0887D} 520000 2019-12-31 SW18 1UX F Y L 12 APARTMENT 7 OSIERS ROAD NaN LONDON WANDSWORTH GREATER LONDON A A
730636 {A2479555-A34B-74C7-E053-6B04A8C0887D} 600000 2019-12-31 SW18 1UT F Y L 8 APARTMENT 12 OSIERS ROAD NaN LONDON WANDSWORTH GREATER LONDON A A
819258 {9FF0D96A-38F5-11ED-E053-6C04A8C06383} 600000 2019-12-31 SW18 1UX F Y L FLAT 45, 12 NaN OSIERS ROAD NaN LONDON WANDSWORTH GREATER LONDON A A

4. Group by Street and calculate street average price

streets = df_ld['Street'].unique().tolist()
df_grp_price = df_ld.groupby(['Street'])['Price'].mean().reset_index()

df_grp_price.columns = ['Street', 'Avg_Price']

Depend on the Client’s Budget to set upper limit and lower limit that fits the inquiry

df_price_limit = df_grp_price.query("(Avg_Price >= 2200000) & (Avg_Price <= 2500000)")

df_price_limit.head()

Street Avg_Price
171 ALBERT EMBANKMENT 2.283829e+06
186 ALBION SQUARE 2.292500e+06
258 ALLEYN PARK 2.283095e+06
455 ARGYLE STREET 2.300000e+06
499 ARTESIAN ROAD 2.462500e+06

5. Use geolocator to obtain the latitude and longitutde for each street

for index, item in df_price_limit.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")
index: 171
item: Street       ALBERT EMBANKMENT
Avg_Price          2.28383e+06
Name: 171, dtype: object
item.Street only: ALBERT EMBANKMENT
index: 186
item: Street       ALBION SQUARE
Avg_Price       2.2925e+06
Name: 186, dtype: object
item.Street only: ALBION SQUARE
index: 258
item: Street       ALLEYN PARK
Avg_Price     2.2831e+06
Name: 258, dtype: object
item.Street only: ALLEYN PARK
index: 455
item: Street       ARGYLE STREET
Avg_Price          2.3e+06
Name: 455, dtype: object
item.Street only: ARGYLE STREET
index: 499
item: Street       ARTESIAN ROAD
Avg_Price       2.4625e+06
Name: 499, dtype: object
item.Street only: ARTESIAN ROAD
index: 536
item: Street       ASHCHURCH GROVE
Avg_Price          2.425e+06
Name: 536, dtype: object
item.Street only: ASHCHURCH GROVE
index: 680
item: Street       AYNHOE ROAD
Avg_Price      2.475e+06
Name: 680, dtype: object
item.Street only: AYNHOE ROAD
index: 989
item: Street       BEDALE STREET
Avg_Price       2.3446e+06
Name: 989, dtype: object
item.Street only: BEDALE STREET
index: 1021
item: Street       BEECHWOOD AVENUE
Avg_Price             2.5e+06
Name: 1021, dtype: object
item.Street only: BEECHWOOD AVENUE
index: 1179
item: Street       BETTERTON STREET
Avg_Price           2.425e+06
Name: 1179, dtype: object
item.Street only: BETTERTON STREET
index: 1180
item: Street       BETTRIDGE ROAD
Avg_Price           2.4e+06
Name: 1180, dtype: object
item.Street only: BETTRIDGE ROAD
index: 1220
item: Street       BILLING ROAD
Avg_Price         2.4e+06
Name: 1220, dtype: object
item.Street only: BILLING ROAD
index: 1326
item: Street       BLENHEIM CRESCENT
Avg_Price          2.37694e+06
Name: 1326, dtype: object
item.Street only: BLENHEIM CRESCENT
index: 1439
item: Street       BOURDON STREET
Avg_Price           2.2e+06
Name: 1439, dtype: object
item.Street only: BOURDON STREET
index: 1508
item: Street       BRADFIELD ROAD
Avg_Price         2.225e+06
Name: 1508, dtype: object
item.Street only: BRADFIELD ROAD
index: 1746
item: Street       BROMPTON PLACE
Avg_Price           2.4e+06
Name: 1746, dtype: object
item.Street only: BROMPTON PLACE
index: 2046
item: Street       CAITHNESS ROAD
Avg_Price       2.21833e+06
Name: 2046, dtype: object
item.Street only: CAITHNESS ROAD
index: 2105
item: Street       CAMBRIDGE TERRACE
Avg_Price             2.42e+06
Name: 2105, dtype: object
item.Street only: CAMBRIDGE TERRACE
index: 2113
item: Street       CAMDEN SQUARE
Avg_Price         2.43e+06
Name: 2113, dtype: object
item.Street only: CAMDEN SQUARE
index: 2125
item: Street       CAMPDEN HILL ROAD
Avg_Price           2.3054e+06
Name: 2125, dtype: object
item.Street only: CAMPDEN HILL ROAD
index: 2168
item: Street       CANONBURY PARK SOUTH
Avg_Price             2.36175e+06
Name: 2168, dtype: object
item.Street only: CANONBURY PARK SOUTH
index: 2218
item: Street       CARLISLE PLACE
Avg_Price         2.284e+06
Name: 2218, dtype: object
item.Street only: CARLISLE PLACE
index: 2223
item: Street       CARLTON GARDENS
Avg_Price            2.5e+06
Name: 2223, dtype: object
item.Street only: CARLTON GARDENS
index: 2227
item: Street       CARLTON ROAD
Avg_Price     2.26689e+06
Name: 2227, dtype: object
item.Street only: CARLTON ROAD
index: 2395
item: Street       CHALCOT SQUARE
Avg_Price       2.32375e+06
Name: 2395, dtype: object
item.Street only: CHALCOT SQUARE
index: 2540
item: Street       CHELSEA EMBANKMENT
Avg_Price           2.21375e+06
Name: 2540, dtype: object
item.Street only: CHELSEA EMBANKMENT
index: 2556
item: Street       CHENISTON GARDENS
Avg_Price             2.31e+06
Name: 2556, dtype: object
item.Street only: CHENISTON GARDENS
index: 2571
item: Street       CHESHAM MEWS
Avg_Price        2.26e+06
Name: 2571, dtype: object
item.Street only: CHESHAM MEWS
index: 2611
item: Street       CHEVENING ROAD
Avg_Price       2.47015e+06
Name: 2611, dtype: object
item.Street only: CHEVENING ROAD
index: 2966
item: Street       COLLINGHAM ROAD
Avg_Price          2.477e+06
Name: 2966, dtype: object
item.Street only: COLLINGHAM ROAD
index: 3206
item: Street       COULTER ROAD
Avg_Price         2.4e+06
Name: 3206, dtype: object
item.Street only: COULTER ROAD
index: 3218
item: Street       COURT LANE GARDENS
Avg_Price              2.44e+06
Name: 3218, dtype: object
item.Street only: COURT LANE GARDENS
index: 3230
item: Street       COURTHOPE ROAD
Avg_Price          2.24e+06
Name: 3230, dtype: object
item.Street only: COURTHOPE ROAD
index: 3237
item: Street       COURTNELL STREET
Avg_Price         2.40955e+06
Name: 3237, dtype: object
item.Street only: COURTNELL STREET
index: 3308
item: Street       CRAWFORD MEWS
Avg_Price          2.5e+06
Name: 3308, dtype: object
item.Street only: CRAWFORD MEWS
index: 3319
item: Street       CREDITON HILL
Avg_Price      2.26533e+06
Name: 3319, dtype: object
item.Street only: CREDITON HILL
index: 3597
item: Street       DARTMOUTH PARK AVENUE
Avg_Price              2.38833e+06
Name: 3597, dtype: object
item.Street only: DARTMOUTH PARK AVENUE
index: 3646
item: Street       DE VERE GARDENS
Avg_Price        2.21906e+06
Name: 3646, dtype: object
item.Street only: DE VERE GARDENS
index: 3672
item: Street       DEEPDALE
Avg_Price    2.48e+06
Name: 3672, dtype: object
item.Street only: DEEPDALE
index: 3675
item: Street       DEER PARK ROAD
Avg_Price          2.45e+06
Name: 3675, dtype: object
item.Street only: DEER PARK ROAD
index: 3761
item: Street       DEVEREUX LANE
Avg_Price          2.3e+06
Name: 3761, dtype: object
item.Street only: DEVEREUX LANE
index: 3771
item: Street       DEVONSHIRE MEWS WEST
Avg_Price                 2.4e+06
Name: 3771, dtype: object
item.Street only: DEVONSHIRE MEWS WEST
index: 3891
item: Street       DOVER STREET
Avg_Price         2.5e+06
Name: 3891, dtype: object
item.Street only: DOVER STREET
index: 3912
item: Street       DOWNSIDE CRESCENT
Avg_Price          2.33818e+06
Name: 3912, dtype: object
item.Street only: DOWNSIDE CRESCENT
index: 3974
item: Street       DUDLEY ROAD
Avg_Price    2.31558e+06
Name: 3974, dtype: object
item.Street only: DUDLEY ROAD
index: 3982
item: Street       DUKES LANE
Avg_Price      2.35e+06
Name: 3982, dtype: object
item.Street only: DUKES LANE
index: 4146
item: Street       EATON TERRACE MEWS
Avg_Price               2.4e+06
Name: 4146, dtype: object
item.Street only: EATON TERRACE MEWS
index: 4162
item: Street       ECCLESTON MEWS
Avg_Price         2.225e+06
Name: 4162, dtype: object
item.Street only: ECCLESTON MEWS
index: 4165
item: Street       ECCLESTON SQUARE
Avg_Price         2.49091e+06
Name: 4165, dtype: object
item.Street only: ECCLESTON SQUARE
index: 4284
item: Street       ELLERBY STREET
Avg_Price          2.45e+06
Name: 4284, dtype: object
item.Street only: ELLERBY STREET
index: 4393
item: Street       ELVASTON PLACE
Avg_Price         2.352e+06
Name: 4393, dtype: object
item.Street only: ELVASTON PLACE
index: 4454
item: Street       ENNISMORE GARDENS MEWS
Avg_Price                   2.2e+06
Name: 4454, dtype: object
item.Street only: ENNISMORE GARDENS MEWS
index: 4502
item: Street       ESSEX STREET
Avg_Price     2.46204e+06
Name: 4502, dtype: object
item.Street only: ESSEX STREET
index: 4553
item: Street       EVERSLEY CRESCENT
Avg_Price              2.2e+06
Name: 4553, dtype: object
item.Street only: EVERSLEY CRESCENT
index: 4582
item: Street       EYNELLA ROAD
Avg_Price       2.296e+06
Name: 4582, dtype: object
item.Street only: EYNELLA ROAD
index: 4713
item: Street       FENCHURCH STREET
Avg_Price         2.46012e+06
Name: 4713, dtype: object
item.Street only: FENCHURCH STREET
index: 4851
item: Street       FLORAL STREET
Avg_Price      2.44275e+06
Name: 4851, dtype: object
item.Street only: FLORAL STREET
index: 4949
item: Street       FOUNTAYNE ROAD
Avg_Price        2.4975e+06
Name: 4949, dtype: object
item.Street only: FOUNTAYNE ROAD
index: 4950
item: Street       FOURNIER STREET
Avg_Price           2.45e+06
Name: 4950, dtype: object
item.Street only: FOURNIER STREET
index: 5039
item: Street       FRISTON STREET
Avg_Price          2.45e+06
Name: 5039, dtype: object
item.Street only: FRISTON STREET
index: 5218
item: Street       GERTRUDE STREET
Avg_Price            2.3e+06
Name: 5218, dtype: object
item.Street only: GERTRUDE STREET
index: 5361
item: Street       GLOUCESTER PLACE MEWS
Avg_Price                  2.5e+06
Name: 5361, dtype: object
item.Street only: GLOUCESTER PLACE MEWS
index: 5424
item: Street       GORDON COTTAGES
Avg_Price            2.2e+06
Name: 5424, dtype: object
item.Street only: GORDON COTTAGES
index: 5434
item: Street       GORST ROAD
Avg_Price       2.3e+06
Name: 5434, dtype: object
item.Street only: GORST ROAD
index: 5469
item: Street       GRAHAM TERRACE
Avg_Price           2.2e+06
Name: 5469, dtype: object
item.Street only: GRAHAM TERRACE
index: 5553
item: Street       GREAT TITCHFIELD STREET
Avg_Price                  2.395e+06
Name: 5553, dtype: object
item.Street only: GREAT TITCHFIELD STREET
index: 5793
item: Street       HALLAM STREET
Avg_Price        2.205e+06
Name: 5793, dtype: object
item.Street only: HALLAM STREET
index: 5801
item: Street       HALSEY STREET
Avg_Price        2.387e+06
Name: 5801, dtype: object
item.Street only: HALSEY STREET
index: 5931
item: Street       HARLEY GARDENS
Avg_Price          2.28e+06
Name: 5931, dtype: object
item.Street only: HARLEY GARDENS
index: 5944
item: Street       HARMSWORTH WAY
Avg_Price       2.24188e+06
Name: 5944, dtype: object
item.Street only: HARMSWORTH WAY
index: 5985
item: Street       HARWOOD ROAD
Avg_Price      2.2245e+06
Name: 5985, dtype: object
item.Street only: HARWOOD ROAD
index: 6004
item: Street       HATCHAM ROAD
Avg_Price     2.44008e+06
Name: 6004, dtype: object
item.Street only: HATCHAM ROAD
index: 6100
item: Street       HEATH DRIVE
Avg_Price      2.388e+06
Name: 6100, dtype: object
item.Street only: HEATH DRIVE
index: 6101
item: Street       HEATH HURST ROAD
Avg_Price          2.2475e+06
Name: 6101, dtype: object
item.Street only: HEATH HURST ROAD
index: 6160
item: Street       HENDERSON ROAD
Avg_Price          2.39e+06
Name: 6160, dtype: object
item.Street only: HENDERSON ROAD
index: 6298
item: Street       HIGHBURY NEW PARK
Avg_Price           2.4975e+06
Name: 6298, dtype: object
item.Street only: HIGHBURY NEW PARK
index: 6346
item: Street       HILL STREET
Avg_Price        2.5e+06
Name: 6346, dtype: object
item.Street only: HILL STREET
index: 6382
item: Street        HILLWAY
Avg_Price    2.25e+06
Name: 6382, dtype: object
item.Street only: HILLWAY
index: 6421
item: Street       HOLBORN CLOSE
Avg_Price          2.2e+06
Name: 6421, dtype: object
item.Street only: HOLBORN CLOSE
index: 6444
item: Street       HOLLAND PARK ROAD
Avg_Price             2.35e+06
Name: 6444, dtype: object
item.Street only: HOLLAND PARK ROAD
index: 6687
item: Street         HYDE VALE
Avg_Price    2.31517e+06
Name: 6687, dtype: object
item.Street only: HYDE VALE
index: 6767
item: Street       IRVING ROAD
Avg_Price        2.5e+06
Name: 6767, dtype: object
item.Street only: IRVING ROAD
index: 6899
item: Street       JUNCTION PLACE
Avg_Price           2.2e+06
Name: 6899, dtype: object
item.Street only: JUNCTION PLACE
index: 7055
item: Street       KILDARE GARDENS
Avg_Price            2.3e+06
Name: 7055, dtype: object
item.Street only: KILDARE GARDENS
index: 7470
item: Street       LEAMOUTH ROAD
Avg_Price      2.38315e+06
Name: 7470, dtype: object
item.Street only: LEAMOUTH ROAD
index: 7516
item: Street       LEINSTER SQUARE
Avg_Price         2.2723e+06
Name: 7516, dtype: object
item.Street only: LEINSTER SQUARE
index: 7620
item: Street       LILYVILLE ROAD
Avg_Price       2.29167e+06
Name: 7620, dtype: object
item.Street only: LILYVILLE ROAD
index: 7849
item: Street       LOWER BELGRAVE STREET
Avg_Price              2.40375e+06
Name: 7849, dtype: object
item.Street only: LOWER BELGRAVE STREET
index: 7953
item: Street       LYONS PLACE
Avg_Price     2.3667e+06
Name: 7953, dtype: object
item.Street only: LYONS PLACE
index: 8154
item: Street       MARESFIELD GARDENS
Avg_Price           2.24208e+06
Name: 8154, dtype: object
item.Street only: MARESFIELD GARDENS
index: 8186
item: Street       MARKHAM STREET
Avg_Price           2.3e+06
Name: 8186, dtype: object
item.Street only: MARKHAM STREET
index: 8511
item: Street         MILKY WAY
Avg_Price    2.25718e+06
Name: 8511, dtype: object
item.Street only: MILKY WAY
index: 8671
item: Street       MORELLA ROAD
Avg_Price         2.5e+06
Name: 8671, dtype: object
item.Street only: MORELLA ROAD
index: 8758
item: Street       MOUNTVIEW CLOSE
Avg_Price        2.23333e+06
Name: 8758, dtype: object
item.Street only: MOUNTVIEW CLOSE
index: 9003
item: Street       NIGHTINGALE SQUARE
Avg_Price             2.475e+06
Name: 9003, dtype: object
item.Street only: NIGHTINGALE SQUARE
index: 9047
item: Street       NORRICE LEA
Avg_Price      2.355e+06
Name: 9047, dtype: object
item.Street only: NORRICE LEA
index: 9152
item: Street       OAK HILL PARK MEWS
Avg_Price               2.4e+06
Name: 9152, dtype: object
item.Street only: OAK HILL PARK MEWS
index: 9252
item: Street       OLD KENT ROAD
Avg_Price      2.24288e+06
Name: 9252, dtype: object
item.Street only: OLD KENT ROAD
index: 9301
item: Street       ONSLOW CRESCENT
Avg_Price            2.4e+06
Name: 9301, dtype: object
item.Street only: ONSLOW CRESCENT
index: 9337
item: Street       ORMONDE GATE
Avg_Price        2.35e+06
Name: 9337, dtype: object
item.Street only: ORMONDE GATE
index: 9418
item: Street       PADDINGTON STREET
Avg_Price          2.39321e+06
Name: 9418, dtype: object
item.Street only: PADDINGTON STREET
index: 9433
item: Street       PALACE COURT
Avg_Price     2.27367e+06
Name: 9433, dtype: object
item.Street only: PALACE COURT
index: 9512
item: Street       PARKE ROAD
Avg_Price      2.25e+06
Name: 9512, dtype: object
item.Street only: PARKE ROAD
index: 9546
item: Street        PARR'S WAY
Avg_Price    2.33501e+06
Name: 9546, dtype: object
item.Street only: PARR'S WAY
index: 9550
item: Street       PARSONS GATE MEWS
Avg_Price              2.2e+06
Name: 9550, dtype: object
item.Street only: PARSONS GATE MEWS
index: 9584
item: Street       PAVILION ROAD
Avg_Price          2.3e+06
Name: 9584, dtype: object
item.Street only: PAVILION ROAD
index: 9653
item: Street       PEMBRIDGE CRESCENT
Avg_Price           2.27679e+06
Name: 9653, dtype: object
item.Street only: PEMBRIDGE CRESCENT
index: 9654
item: Street       PEMBRIDGE MEWS
Avg_Price        2.2851e+06
Name: 9654, dtype: object
item.Street only: PEMBRIDGE MEWS
index: 9693
item: Street       PENNINGTON STREET
Avg_Price          2.25195e+06
Name: 9693, dtype: object
item.Street only: PENNINGTON STREET
index: 9791
item: Street       PICTON STREET
Avg_Price         2.35e+06
Name: 9791, dtype: object
item.Street only: PICTON STREET
index: 9807
item: Street       PINE GROVE
Avg_Price       2.4e+06
Name: 9807, dtype: object
item.Street only: PINE GROVE
index: 9929
item: Street       PORTLAND PLACE
Avg_Price       2.27586e+06
Name: 9929, dtype: object
item.Street only: PORTLAND PLACE
index: 10028
item: Street       PRINCES SQUARE
Avg_Price       2.23089e+06
Name: 10028, dtype: object
item.Street only: PRINCES SQUARE
index: 10116
item: Street       QUEEN ANNE STREET
Avg_Price          2.44167e+06
Name: 10116, dtype: object
item.Street only: QUEEN ANNE STREET
index: 10119
item: Street       QUEEN ANNES GROVE
Avg_Price             2.36e+06
Name: 10119, dtype: object
item.Street only: QUEEN ANNES GROVE
index: 10350
item: Street       REDCLIFFE MEWS
Avg_Price       2.41589e+06
Name: 10350, dtype: object
item.Street only: REDCLIFFE MEWS
index: 10628
item: Street       ROSE SQUARE
Avg_Price    2.29167e+06
Name: 10628, dtype: object
item.Street only: ROSE SQUARE
index: 10829
item: Street       RYECROFT STREET
Avg_Price          2.225e+06
Name: 10829, dtype: object
item.Street only: RYECROFT STREET
index: 10875
item: Street       SALUSBURY ROAD
Avg_Price         2.407e+06
Name: 10875, dtype: object
item.Street only: SALUSBURY ROAD
index: 11317
item: Street       SOUTH END ROW
Avg_Price         2.35e+06
Name: 11317, dtype: object
item.Street only: SOUTH END ROW
index: 11328
item: Street       SOUTH PARADE
Avg_Price       2.425e+06
Name: 11328, dtype: object
item.Street only: SOUTH PARADE
index: 11430
item: Street       SPRIMONT PLACE
Avg_Price          2.25e+06
Name: 11430, dtype: object
item.Street only: SPRIMONT PLACE
index: 11488
item: Street       ST AUBYNS AVENUE
Avg_Price            2.48e+06
Name: 11488, dtype: object
item.Street only: ST AUBYNS AVENUE
index: 11554
item: Street       ST JOHN STREET
Avg_Price       2.28894e+06
Name: 11554, dtype: object
item.Street only: ST JOHN STREET
index: 11677
item: Street       ST STEPHENS GARDENS
Avg_Price            2.20317e+06
Name: 11677, dtype: object
item.Street only: ST STEPHENS GARDENS
index: 11843
item: Street       STRADELLA ROAD
Avg_Price       2.34425e+06
Name: 11843, dtype: object
item.Street only: STRADELLA ROAD
index: 12065
item: Street       SYDNEY STREET
Avg_Price      2.34882e+06
Name: 12065, dtype: object
item.Street only: SYDNEY STREET
index: 12138
item: Street       TEDWORTH SQUARE
Avg_Price            2.2e+06
Name: 12138, dtype: object
item.Street only: TEDWORTH SQUARE
index: 12274
item: Street       THE LITTLE BOLTONS
Avg_Price              2.25e+06
Name: 12274, dtype: object
item.Street only: THE LITTLE BOLTONS
index: 12277
item: Street       THE MARLOWES
Avg_Price         2.5e+06
Name: 12277, dtype: object
item.Street only: THE MARLOWES
index: 12501
item: Street       TOTTERIDGE COMMON
Avg_Price              2.2e+06
Name: 12501, dtype: object
item.Street only: TOTTERIDGE COMMON
index: 12740
item: Street       UPPER MONTAGU STREET
Avg_Price             2.23143e+06
Name: 12740, dtype: object
item.Street only: UPPER MONTAGU STREET
index: 12922
item: Street       VIVIAN WAY
Avg_Price      2.44e+06
Name: 12922, dtype: object
item.Street only: VIVIAN WAY
index: 12960
item: Street       WALDRON MEWS
Avg_Price     2.21614e+06
Name: 12960, dtype: object
item.Street only: WALDRON MEWS
index: 13067
item: Street       WARWICK LANE
Avg_Price      2.2985e+06
Name: 13067, dtype: object
item.Street only: WARWICK LANE
index: 13170
item: Street       WELL ROAD
Avg_Price    2.445e+06
Name: 13170, dtype: object
item.Street only: WELL ROAD
index: 13237
item: Street       WEST HILL PARK
Avg_Price        2.4375e+06
Name: 13237, dtype: object
item.Street only: WEST HILL PARK
index: 13238
item: Street       WEST HILL ROAD
Avg_Price          2.43e+06
Name: 13238, dtype: object
item.Street only: WEST HILL ROAD
index: 13246
item: Street       WEST SQUARE
Avg_Price        2.3e+06
Name: 13246, dtype: object
item.Street only: WEST SQUARE
index: 13248
item: Street       WEST TEMPLE SHEEN
Avg_Price          2.46733e+06
Name: 13248, dtype: object
item.Street only: WEST TEMPLE SHEEN
index: 13386
item: Street       WHITE HART LANE
Avg_Price        2.29838e+06
Name: 13386, dtype: object
item.Street only: WHITE HART LANE
index: 13418
item: Street       WHITTAKER STREET
Avg_Price            2.25e+06
Name: 13418, dtype: object
item.Street only: WHITTAKER STREET
index: 13737
item: Street       WOODYARD LANE
Avg_Price         2.27e+06
Name: 13737, dtype: object
item.Street only: WOODYARD LANE
index: 13795
item: Street       WYATT DRIVE
Avg_Price        2.4e+06
Name: 13795, dtype: object
item.Street only: WYATT DRIVE
geolocator = Nominatim(user_agent="http")
# add in latitude and longitude column
df_price_limit['Latitude'] = df_price_limit['Street'].apply(geolocator.geocode).apply(lambda x: x.latitude)
df_price_limit['Longitude'] = df_price_limit['Street'].apply(geolocator.geocode).apply(lambda x: x.longitude)

print(df_price_limit.shape)
df_p = df_price_limit
df_p.head()
(144, 4)

Street Avg_Price Latitude Longitude
171 ALBERT EMBANKMENT 2.283829e+06 51.493213 -0.121361
186 ALBION SQUARE 2.292500e+06 -41.273758 173.289393
258 ALLEYN PARK 2.283095e+06 51.492742 -0.371967
455 ARGYLE STREET 2.300000e+06 22.319036 114.167841
499 ARTESIAN ROAD 2.462500e+06 41.725304 -88.205529

Map of London

address = 'London,UK'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The Geograpical Co-ordinate of Seattle,Washington are {}, {}.'.format(latitude, longitude))
The Geograpical Co-ordinate of Seattle,Washington are 51.5073219, -0.1276474.
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, street in zip(df_p['Latitude'], df_p['Longitude'], df_p['Street']):
    label = '{}'.format(street)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  

map_london

map_london

Methodology

For the methodology sector, the main goal is to analysis the data and recommend the most valuable venues for the customers based on the surrounding infrastructure and amenities, which retrieved from the Foursquare API.

K-means clustering is used to analyze the venues, given the fact that it is efficient in terms of computational cost and easy to implement when studying larget data set.

Foursquare Credentials

CLIENT_ID = '' # my Foursquare ID
CLIENT_SECRET = '' # my Foursquare Secret
VERSION = '20200604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: '+CLIENT_ID)
print('CLIENT_SECRET: '+CLIENT_SECRET)

Nearby Venues

def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return(nearby_venues)
# get the venues for each street in the dataframe
london_venues = getNearbyVenues(names=df_p['Street'],
                                   latitudes=df_p['Latitude'],
                                   longitudes=df_p['Longitude']
                                  )

print(london_venues.shape)
london_venues.head()
(3688, 7)

Street Street Latitude Street Longitude Venue Venue Latitude Venue Longitude Venue Category
0 ALBERT EMBANKMENT 51.493213 -0.121361 The Garden Cafe, Lambeth Palace 51.495210 -0.119962 Café
1 ALBERT EMBANKMENT 51.493213 -0.121361 Newport Street Gallery 51.493466 -0.117040 Art Gallery
2 ALBERT EMBANKMENT 51.493213 -0.121361 Plaza On The River Club And Residence 51.491529 -0.121371 Hotel
3 ALBERT EMBANKMENT 51.493213 -0.121361 Tamesis Dock 51.491928 -0.121934 Bar
4 ALBERT EMBANKMENT 51.493213 -0.121361 Park Plaza London Riverbank 51.491478 -0.122177 Hotel
# get the List of Unique Categories
print('There are {} uniques categories.'.format(len(london_venues['Venue Category'].unique())))
print(london_venues.shape)
london_venues.groupby('Street').count().head()
There are 318 uniques categories.
(3688, 7)

Street Latitude Street Longitude Venue Venue Latitude Venue Longitude Venue Category
Street
ALBERT EMBANKMENT 53 53 53 53 53 53
ALBION SQUARE 26 26 26 26 26 26
ALLEYN PARK 6 6 6 6 6 6
ARGYLE STREET 87 87 87 87 87 87
ARTESIAN ROAD 8 8 8 8 8 8

One Hot Encoding of Features

# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
london_onehot['Street'] = london_venues['Street'] 

# move street column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]
london_onehot.head()

Street ATM Accessories Store Adult Boutique African Restaurant Airport Airport Terminal American Restaurant Antique Shop Argentinian Restaurant ... Vietnamese Restaurant Warehouse Store Weight Loss Center Whisky Bar Wine Bar Wine Shop Wings Joint Women's Store Yoga Studio Zoo Exhibit
0 ALBERT EMBANKMENT 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 ALBERT EMBANKMENT 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 ALBERT EMBANKMENT 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 ALBERT EMBANKMENT 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 ALBERT EMBANKMENT 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 319 columns

london_grouped = london_onehot.groupby('Street').mean().reset_index()
london_grouped.head()

Street ATM Accessories Store Adult Boutique African Restaurant Airport Airport Terminal American Restaurant Antique Shop Argentinian Restaurant ... Vietnamese Restaurant Warehouse Store Weight Loss Center Whisky Bar Wine Bar Wine Shop Wings Joint Women's Store Yoga Studio Zoo Exhibit
0 ALBERT EMBANKMENT 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 ALBION SQUARE 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 ALLEYN PARK 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 ARGYLE STREET 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.011494 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 ARTESIAN ROAD 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 319 columns

# top 5 venues valued most
num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
----ALBERT EMBANKMENT----
         venue  freq
0   Restaurant  0.13
1        Hotel  0.11
2         Café  0.11
3          Pub  0.08
4  Pizza Place  0.04


----ALBION SQUARE----
         venue  freq
0         Café  0.19
1          Pub  0.08
2          Bar  0.08
3  Coffee Shop  0.08
4   Restaurant  0.08

...

----WYATT DRIVE----
                    venue  freq
0              Donut Shop  0.29
1      Mexican Restaurant  0.14
2        Airport Terminal  0.14
3          Discount Store  0.14
4  Thrift / Vintage Store  0.14

Most Common Venues on the Street

# definde the function that returns the most common venue
def most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create dataframe
common_venues_sorted = pd.DataFrame(columns=columns)
common_venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    common_venues_sorted.iloc[ind, 1:] = most_common_venues(london_grouped.iloc[ind, :], num_top_venues)
print(common_venues_sorted.shape)
common_venues_sorted.head()
(130, 11)

Street 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 ALBERT EMBANKMENT Restaurant Café Hotel Pub Park Art Gallery Plaza Pizza Place Coffee Shop Breakfast Spot
1 ALBION SQUARE Café Pub Bar Indian Restaurant Restaurant Coffee Shop Park Seafood Restaurant Fish & Chips Shop New American Restaurant
2 ALLEYN PARK Pub Indian Restaurant Playground Lawyer Farmers Market English Restaurant Event Service Event Space Exhibit Fabric Shop
3 ARGYLE STREET Cha Chaan Teng Noodle House Dessert Shop Hotel Sporting Goods Shop Hong Kong Restaurant Vegetarian / Vegan Restaurant Cantonese Restaurant Bakery Toy / Game Store
4 ARTESIAN ROAD Italian Restaurant Sushi Restaurant Indian Restaurant Jewelry Store Auto Garage Pizza Place Pilates Studio Pharmacy Event Space Exhibit

K-Means Clustering

After inspecting the data we obtained, we then apply k-means clustering approach with 5 clusters to study the data.

1. Create clustering dataframe

k = 5
london_grouped_clustering = df_p.drop('Street', 1)
kmeans = KMeans(n_clusters=k, random_state=0).fit(london_grouped_clustering)
kmeans.labels_
array([3, 3, 3, 3, 1, 4, 1, 2, 1, 4, 4, 4, 2, 0, 0, 4, 0, 4, 4, 3, 2, 3,
       1, 3, 2, 0, 3, 3, 1, 1, 4, 4, 0, 4, 1, 3, 4, 0, 1, 4, 3, 4, 1, 2,
       3, 2, 4, 0, 1, 4, 2, 0, 1, 0, 3, 1, 4, 1, 4, 4, 3, 1, 0, 3, 0, 4,
       0, 4, 3, 0, 0, 4, 4, 0, 4, 1, 1, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 4,
       2, 0, 3, 3, 1, 0, 1, 2, 4, 0, 4, 2, 4, 3, 0, 2, 0, 3, 3, 3, 0, 2,
       4, 3, 0, 4, 2, 4, 3, 0, 4, 2, 4, 0, 1, 3, 0, 2, 2, 0, 0, 1, 0, 0,
       4, 0, 3, 4, 4, 4, 3, 1, 3, 0, 3, 4], dtype=int32)
# check if using the correct dataframe
london_grouped_clustering=df_p
london_grouped_clustering.head()

Street Avg_Price Latitude Longitude Cluster Labels
171 ALBERT EMBANKMENT 2.283829e+06 51.493213 -0.121361 3
186 ALBION SQUARE 2.292500e+06 -41.273758 173.289393 3
258 ALLEYN PARK 2.283095e+06 51.492742 -0.371967 3
455 ARGYLE STREET 2.300000e+06 22.319036 114.167841 3
499 ARTESIAN ROAD 2.462500e+06 41.725304 -88.205529 1
common_venues_sorted.head()

Cluster Labels Street 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 2 ALBERT EMBANKMENT Restaurant Café Hotel Pub Park Art Gallery Plaza Pizza Place Coffee Shop Breakfast Spot
1 2 ALBION SQUARE Café Pub Bar Indian Restaurant Restaurant Coffee Shop Park Seafood Restaurant Fish & Chips Shop New American Restaurant
2 3 ALLEYN PARK Pub Indian Restaurant Playground Lawyer Farmers Market English Restaurant Event Service Event Space Exhibit Fabric Shop
3 2 ARGYLE STREET Cha Chaan Teng Noodle House Dessert Shop Hotel Sporting Goods Shop Hong Kong Restaurant Vegetarian / Vegan Restaurant Cantonese Restaurant Bakery Toy / Game Store
4 2 ARTESIAN ROAD Italian Restaurant Sushi Restaurant Indian Restaurant Jewelry Store Auto Garage Pizza Place Pilates Studio Pharmacy Event Space Exhibit
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
common_venues_sorted = common_venues_sorted.drop(columns = 'Cluster Labels')
london_grouped_clustering = london_grouped_clustering.join(common_venues_sorted.set_index('Street'), on='Street')
london_grouped_clustering.head()

Street Avg_Price Latitude Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
171 ALBERT EMBANKMENT 2.283829e+06 51.493213 -0.121361 3 Restaurant Café Hotel Pub Park Art Gallery Plaza Pizza Place Coffee Shop Breakfast Spot
186 ALBION SQUARE 2.292500e+06 -41.273758 173.289393 3 Café Pub Bar Indian Restaurant Restaurant Coffee Shop Park Seafood Restaurant Fish & Chips Shop New American Restaurant
258 ALLEYN PARK 2.283095e+06 51.492742 -0.371967 3 Pub Indian Restaurant Playground Lawyer Farmers Market English Restaurant Event Service Event Space Exhibit Fabric Shop
455 ARGYLE STREET 2.300000e+06 22.319036 114.167841 3 Cha Chaan Teng Noodle House Dessert Shop Hotel Sporting Goods Shop Hong Kong Restaurant Vegetarian / Vegan Restaurant Cantonese Restaurant Bakery Toy / Game Store
499 ARTESIAN ROAD 2.462500e+06 41.725304 -88.205529 1 Italian Restaurant Sushi Restaurant Indian Restaurant Jewelry Store Auto Garage Pizza Place Pilates Studio Pharmacy Event Space Exhibit

2. Create map of clustering

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], 
                                  london_grouped_clustering['Longitude'], 
                                  london_grouped_clustering['Street'], 
                                  london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

map_cluster

Result

df1 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, 
                              london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df2 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, 
                                    london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df3 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, 
                                    london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df4 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, 
                                    london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df5 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, 
                                    london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
# Cluster 0
df1.head()

Avg_Price 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
1439 2.200000e+06 Pub Fish & Chips Shop Sandwich Place Performing Arts Venue Recording Studio Park Fish Market Filipino Restaurant Fast Food Restaurant Farmers Market
1508 2.225000e+06 Pub Sandwich Place Discount Store Hardware Store Supermarket Casino Bar Bus Station Gym River
2046 2.218333e+06 Grocery Store Fried Chicken Joint Bus Stop Fish & Chips Shop Fast Food Restaurant Event Space Exhibit Fabric Shop Falafel Restaurant Farm
2540 2.213750e+06 Italian Restaurant Pub Café Japanese Restaurant Asian Restaurant English Restaurant Juice Bar Cocktail Bar Grocery Store Bakery
3230 2.240000e+06 Chinese Restaurant Café Hotel Metro Station Food Court Event Service Event Space Exhibit Fabric Shop Falafel Restaurant
# Cluster 1
df2.head()

Avg_Price 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
499 2.462500e+06 Italian Restaurant Sushi Restaurant Indian Restaurant Jewelry Store Auto Garage Pizza Place Pilates Studio Pharmacy Event Space Exhibit
680 2.475000e+06 Pub Construction & Landscaping Bar Performing Arts Venue Park Dance Studio Deli / Bodega Event Service Event Space Exhibit
1021 2.500000e+06 Performing Arts Venue Restaurant Zoo Exhibit Fast Food Restaurant Event Service Event Space Exhibit Fabric Shop Falafel Restaurant Farm
2223 2.500000e+06 Italian Restaurant Hotel Light Rail Station Dessert Shop Japanese Restaurant Café Lebanese Restaurant Park Pizza Place Deli / Bodega
2611 2.470154e+06 Pub Lake Fast Food Restaurant Event Service Event Space Exhibit Fabric Shop Falafel Restaurant Farm Farmers Market
# Cluster 2
df3.head()

Avg_Price 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
989 2.344600e+06 Sandwich Place Sports Club Auto Garage Locksmith Zoo Exhibit Farmers Market Event Space Exhibit Fabric Shop Falafel Restaurant
1326 2.376941e+06 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2168 2.361750e+06 Pub Café Coffee Shop Bus Stop Park Platform Thai Restaurant Italian Restaurant Bar Middle Eastern Restaurant
2395 2.323750e+06 Café Bar Italian Restaurant Pub Coffee Shop French Restaurant Park Caribbean Restaurant Pizza Place Spa
3912 2.338185e+06 Gym / Fitness Center Convenience Store Sports Club Martial Arts School Bus Stop Rugby Pitch Farm Event Service Event Space Exhibit
# Cluster 3
df4.head()

Avg_Price 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
171 2.283829e+06 Restaurant Café Hotel Pub Park Art Gallery Plaza Pizza Place Coffee Shop Breakfast Spot
186 2.292500e+06 Café Pub Bar Indian Restaurant Restaurant Coffee Shop Park Seafood Restaurant Fish & Chips Shop New American Restaurant
258 2.283095e+06 Pub Indian Restaurant Playground Lawyer Farmers Market English Restaurant Event Service Event Space Exhibit Fabric Shop
455 2.300000e+06 Cha Chaan Teng Noodle House Dessert Shop Hotel Sporting Goods Shop Hong Kong Restaurant Vegetarian / Vegan Restaurant Cantonese Restaurant Bakery Toy / Game Store
2125 2.305400e+06 Pub Ice Cream Shop Bakery Yoga Studio Coffee Shop Indian Restaurant Hotel Hostel Park Breakfast Spot
# Cluster 4
df5.head()

Avg_Price 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
536 2425000.0 Pub Grocery Store Bakery Indian Restaurant Mediterranean Restaurant Coffee Shop Ice Cream Shop Middle Eastern Restaurant Moroccan Restaurant Fish & Chips Shop
1179 2425000.0 Theater Coffee Shop Bookstore Clothing Store Beer Bar Bakery Dessert Shop Indian Restaurant Boutique Sushi Restaurant
1180 2400000.0 Chinese Restaurant Park Grocery Store English Restaurant Event Service Event Space Exhibit Fabric Shop Falafel Restaurant Farm
1220 2400000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1746 2400000.0 Sandwich Place Lounge Health & Beauty Service Park Convenience Store Food & Drink Shop Flea Market Empanada Restaurant Food Court English Restaurant

Discussion

After studying the dataset, we may gain a better insight across the London house market based on the average street price and the amenities nearby. London Housing Market, though is said to be in a rut at the moment, is still vibrant and of potential.

We start by analysing the result from different angles.

Grouping by Clusters

According to the five clusters produced in the end, we could target housebuyers’ interests when suggesting the best venues. For example, Cluster 0 may target customers looking for various food places and economic life expenses, for they show the most common of restaurants, pub and cafe with a relatively low average price. Instead, customers, a fancy high-end community may make their choices among estates on the streets in Cluster 1, with art museums and exhibitions surrounded.

Future Market Study and Prediction

On the other hand, we could also make our prediction on the future market based on the level of infrastructures and amenities such as schools and hospitals. It is also interesting to notice that there are outstanding venues located at South-West London (Wimbledon, Baham) and North-West London (Kilburn, Hampstead), which may be due to the city expansion. This may favour potential investors into the housing market who want to purchase venues as an investment option.

Further implementations

Further Study can also be brought out based on detailed requirements provided by the housebuyers, such as sorting the venues in each cluster by average price or picking out the estates with the most common venues specified.

Conclusion

Although housing markets in London have witnessed several ups and downs, with volatile house prices in London tending to amplify changes in national house prices recently, there is still great potential lying in the market for investors and house buyers. Recently, as a result of the pandemic, increasing unemployment rate, wage cuts, business failures, and job uncertainty also brought massive impact to the housing market in London. However, it is relatively hard to provide an insight of the market from the patterns of previous cycles for clients, so in this report, we focused on the integration of data science and machine learning technics to assist clients making sensible and efficient choices. From the results provided, we could easily recommend profitable venues according to amenities and infrastructures surrounded.

We first retrieved data on London properties and the relative price from the HM Land Registry (http://landregistry.data.gov.uk/). Amenities and infrastructure data frame are built with the Foursquare API. Merging the two datasets, we were able to have more detailed information about the market and start our study. K-means clustering technique is used to cluster all the data into five different clusters. Finally, we could use the results to better study the London house market and make our choice of data based on clients’ requirement.

This approach is a visualizable and efficient way for housebuyers to gain an insight into the housing market, and the methodology can be applied to other places with proper data provided. It is also worth notice that the due to the robust of the housing market, making a wise decision is essential even with the help of algorithm technics and data analysis which is based on past market performance but not predicting future incidences.

Reference

[1] http://roshangrewal.com/capstone-project-the-battle-of-neighborhoods-finding-a-better-place-in-scarborough-toronto/

[2] https://developer.foursquare.com/docs

[3] https://github.com/roshangrewal/coursera-applied-data-science-capstone/blob/master/Week%205/Capstone%20Project%20%E2%80%93%20The%20Battle%20of%20Neighborhoods%20%20-%20Finding%20a%20Better%20Place%20in%20Scarborough%2C%20Toronto.ipynb