Capstone Project: The Battle of Neighborhood
Recommending the Most Valuable Venue for Homebuyers in London
This is a project for Coursera Applied Data Science Capstone.
Table of contents
Introduction
Background
According to Working Paper 72 published by the London Government, London’s house prices are relatively high compared to other areas in the UK and have been rising at a fast rate in recent years.
Meanwhile, housing markets in London have witnessed several ups and downs, with volatile house prices in London tending to amplify changes in national house prices.
London has experienced several episodes of real house price deflation.
From the patterns of previous cycles, it is hard to tell whether house prices are approaching a new peak or a downward adjustment.
Recently, as a result of the pandemic, increasing unemployment rate, wage cuts, business failures, and job uncertainty also brought massive impact to the housing market in London.
According to the Nationwide, house prices fell 1.7% in May from the previous month, the most significant monthly fall for 11 years, due to the headwinds of Covid-19.
Business Problem
As a matter of fact, it is of great necessity to integrate data science and machine learning technics with the study of London house market in order to assist clients making sensible and efficient choices.
How could we provide suggestions to our clients based on the study of neighborhoods for either settling or investing is our main business problem. We will recommend profitable venues according to amenities and infrastructures surrounded.
Data Section
London properties monthly updated price paid data is retrieved from the HM Land Registry.
Data link: http://landregistry.data.gov.uk/
After finding the list of neighbourhoods, we then connect to the Foursquare API to gather information about venues inside every neighbourhood. FourSquare API interface is a location data provider with information about venues. Data retrieved include venue name, latitude, longitude, category, street name and street location.
Properties Data
Let’s first download the properties’ data from the HM Land Registry website:
Import Libraries
import os
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
!conda install -c conda-forge folium=0.5.0 --yes
import folium #import folium # map rendering library
import requests
import xml
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
Retrieve Data from source
Here we use the data in csv file which containing transfer information in 2019.
df = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv")
df.shape
(972285, 16)
df.head(5)
{8CAC1318-AC2F-0253-E053-6B04A8C08E51} | 165950 | 2019-05-28 00:00 | BS22 7FP | T | N | F | 32 | Unnamed: 8 | KELSTON GARDENS | Unnamed: 10 | WESTON-SUPER-MARE | NORTH SOMERSET | NORTH SOMERSET.1 | A | A.1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {8CAC1318-AC31-0253-E053-6B04A8C08E51} | 119500 | 2019-05-22 00:00 | TA8 2EY | F | N | L | GROVE HOUSE, 58 | FLAT 1 | BERROW ROAD | NaN | BURNHAM-ON-SEA | SEDGEMOOR | SOMERSET | A | A |
1 | {8CAC1318-AC33-0253-E053-6B04A8C08E51} | 215000 | 2019-06-21 00:00 | TA22 9DH | T | N | F | 5 | NaN | WEIR HEAD | NaN | DULVERTON | SOMERSET WEST AND TAUNTON | SOMERSET | A | A |
2 | {8CAC1318-AC35-0253-E053-6B04A8C08E51} | 242500 | 2019-05-01 00:00 | BS20 7BP | F | N | L | 50 | NaN | LOWER BURLINGTON ROAD | PORTISHEAD | BRISTOL | NORTH SOMERSET | NORTH SOMERSET | A | A |
3 | {8CAC1318-AC36-0253-E053-6B04A8C08E51} | 318000 | 2019-05-09 00:00 | BA3 2RW | T | N | F | 4 | NaN | BUSHY COMBE | MIDSOMER NORTON | RADSTOCK | BATH AND NORTH EAST SOMERSET | BATH AND NORTH EAST SOMERSET | A | A |
4 | {8CAC1318-AC37-0253-E053-6B04A8C08E51} | 215000 | 2019-05-17 00:00 | BS24 7HZ | S | N | F | 9 | NaN | OSMOND ROAD | NaN | WESTON-SUPER-MARE | NORTH SOMERSET | NORTH SOMERSET | A | A |
Data Preprocessing
We need to process the data frame for further usage. Here we aim to group the venues by street assigned in the data frame and look into the average price of each street.
Targetting housebuyers’ price expectation, the upper limit and lower limit of price can be set to narrow down the venues we are looking for.
Preprocessing the data to finally obtain a dataframe containing the following information:
- Street Name
- Average Price of Venues
- Location Information
1. Rename the columns
# Assign meaningful column names
df.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
'SAON', 'Street', 'Locality', 'City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']
df.head()
TUID | Price | Date_Transfer | Postcode | Prop_Type | Old_New | Duration | PAON | SAON | Street | Locality | City | District | County | PPD_Cat_Type | Record_Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {8CAC1318-AC31-0253-E053-6B04A8C08E51} | 119500 | 2019-05-22 00:00 | TA8 2EY | F | N | L | GROVE HOUSE, 58 | FLAT 1 | BERROW ROAD | NaN | BURNHAM-ON-SEA | SEDGEMOOR | SOMERSET | A | A |
1 | {8CAC1318-AC33-0253-E053-6B04A8C08E51} | 215000 | 2019-06-21 00:00 | TA22 9DH | T | N | F | 5 | NaN | WEIR HEAD | NaN | DULVERTON | SOMERSET WEST AND TAUNTON | SOMERSET | A | A |
2 | {8CAC1318-AC35-0253-E053-6B04A8C08E51} | 242500 | 2019-05-01 00:00 | BS20 7BP | F | N | L | 50 | NaN | LOWER BURLINGTON ROAD | PORTISHEAD | BRISTOL | NORTH SOMERSET | NORTH SOMERSET | A | A |
3 | {8CAC1318-AC36-0253-E053-6B04A8C08E51} | 318000 | 2019-05-09 00:00 | BA3 2RW | T | N | F | 4 | NaN | BUSHY COMBE | MIDSOMER NORTON | RADSTOCK | BATH AND NORTH EAST SOMERSET | BATH AND NORTH EAST SOMERSET | A | A |
4 | {8CAC1318-AC37-0253-E053-6B04A8C08E51} | 215000 | 2019-05-17 00:00 | BS24 7HZ | S | N | F | 9 | NaN | OSMOND ROAD | NaN | WESTON-SUPER-MARE | NORTH SOMERSET | NORTH SOMERSET | A | A |
2. Select column City
which is only in London
df_ld = df.query("City == 'LONDON'")
df_ld.head()
TUID | Price | Date_Transfer | Postcode | Prop_Type | Old_New | Duration | PAON | SAON | Street | Locality | City | District | County | PPD_Cat_Type | Record_Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
86 | {8CAC1318-AFF3-0253-E053-6B04A8C08E51} | 1100000 | 2019-06-07 00:00 | SW19 8BN | T | N | F | 30 | NaN | ASHEN GROVE | NaN | LONDON | MERTON | GREATER LONDON | A | A |
88 | {8CAC1318-AFF5-0253-E053-6B04A8C08E51} | 905000 | 2019-06-20 00:00 | SW19 8BH | T | N | F | 35 | NaN | DURNSFORD AVENUE | NaN | LONDON | MERTON | GREATER LONDON | A | A |
95 | {8CAC1318-AFFC-0253-E053-6B04A8C08E51} | 1440000 | 2019-06-07 00:00 | SW13 8DL | T | N | F | 29 | NaN | MERTHYR TERRACE | NaN | LONDON | RICHMOND UPON THAMES | GREATER LONDON | A | A |
167 | {8CAC1318-B38E-0253-E053-6B04A8C08E51} | 402500 | 2019-06-14 00:00 | SW16 4PB | S | N | F | 9 | NaN | POLLARDS WOOD ROAD | NaN | LONDON | CROYDON | GREATER LONDON | A | A |
292 | {8F1B26BD-4595-53DB-E053-6C04A8C03649} | 340000 | 2019-07-15 00:00 | N1 9BT | F | N | L | 75 | FLAT 4 | CALEDONIAN ROAD | NaN | LONDON | ISLINGTON | GREATER LONDON | A | A |
3. Format transfer date
df_ld['Date_Transfer'] = df_ld['Date_Transfer'].apply(pd.to_datetime)
# Sort by Date
df_ld.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)
df_ld.head()
TUID | Price | Date_Transfer | Postcode | Prop_Type | Old_New | Duration | PAON | SAON | Street | Locality | City | District | County | PPD_Cat_Type | Record_Status | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
730629 | {A2479555-A33A-74C7-E053-6B04A8C0887D} | 670000 | 2019-12-31 | SW18 1UT | F | Y | L | 8 | APARTMENT 1 | OSIERS ROAD | NaN | LONDON | WANDSWORTH | GREATER LONDON | A | A |
730613 | {A2479555-A326-74C7-E053-6B04A8C0887D} | 600000 | 2019-12-31 | SW18 1UX | F | Y | L | 12 | APARTMENT 46 | OSIERS ROAD | NaN | LONDON | WANDSWORTH | GREATER LONDON | A | A |
730633 | {A2479555-A347-74C7-E053-6B04A8C0887D} | 520000 | 2019-12-31 | SW18 1UX | F | Y | L | 12 | APARTMENT 7 | OSIERS ROAD | NaN | LONDON | WANDSWORTH | GREATER LONDON | A | A |
730636 | {A2479555-A34B-74C7-E053-6B04A8C0887D} | 600000 | 2019-12-31 | SW18 1UT | F | Y | L | 8 | APARTMENT 12 | OSIERS ROAD | NaN | LONDON | WANDSWORTH | GREATER LONDON | A | A |
819258 | {9FF0D96A-38F5-11ED-E053-6C04A8C06383} | 600000 | 2019-12-31 | SW18 1UX | F | Y | L | FLAT 45, 12 | NaN | OSIERS ROAD | NaN | LONDON | WANDSWORTH | GREATER LONDON | A | A |
4. Group by Street and calculate street average price
streets = df_ld['Street'].unique().tolist()
df_grp_price = df_ld.groupby(['Street'])['Price'].mean().reset_index()
df_grp_price.columns = ['Street', 'Avg_Price']
Depend on the Client’s Budget to set upper limit and lower limit that fits the inquiry
df_price_limit = df_grp_price.query("(Avg_Price >= 2200000) & (Avg_Price <= 2500000)")
df_price_limit.head()
Street | Avg_Price | |
---|---|---|
171 | ALBERT EMBANKMENT | 2.283829e+06 |
186 | ALBION SQUARE | 2.292500e+06 |
258 | ALLEYN PARK | 2.283095e+06 |
455 | ARGYLE STREET | 2.300000e+06 |
499 | ARTESIAN ROAD | 2.462500e+06 |
5. Use geolocator to obtain the latitude and longitutde for each street
for index, item in df_price_limit.iterrows():
print(f"index: {index}")
print(f"item: {item}")
print(f"item.Street only: {item.Street}")
index: 171
item: Street ALBERT EMBANKMENT
Avg_Price 2.28383e+06
Name: 171, dtype: object
item.Street only: ALBERT EMBANKMENT
index: 186
item: Street ALBION SQUARE
Avg_Price 2.2925e+06
Name: 186, dtype: object
item.Street only: ALBION SQUARE
index: 258
item: Street ALLEYN PARK
Avg_Price 2.2831e+06
Name: 258, dtype: object
item.Street only: ALLEYN PARK
index: 455
item: Street ARGYLE STREET
Avg_Price 2.3e+06
Name: 455, dtype: object
item.Street only: ARGYLE STREET
index: 499
item: Street ARTESIAN ROAD
Avg_Price 2.4625e+06
Name: 499, dtype: object
item.Street only: ARTESIAN ROAD
index: 536
item: Street ASHCHURCH GROVE
Avg_Price 2.425e+06
Name: 536, dtype: object
item.Street only: ASHCHURCH GROVE
index: 680
item: Street AYNHOE ROAD
Avg_Price 2.475e+06
Name: 680, dtype: object
item.Street only: AYNHOE ROAD
index: 989
item: Street BEDALE STREET
Avg_Price 2.3446e+06
Name: 989, dtype: object
item.Street only: BEDALE STREET
index: 1021
item: Street BEECHWOOD AVENUE
Avg_Price 2.5e+06
Name: 1021, dtype: object
item.Street only: BEECHWOOD AVENUE
index: 1179
item: Street BETTERTON STREET
Avg_Price 2.425e+06
Name: 1179, dtype: object
item.Street only: BETTERTON STREET
index: 1180
item: Street BETTRIDGE ROAD
Avg_Price 2.4e+06
Name: 1180, dtype: object
item.Street only: BETTRIDGE ROAD
index: 1220
item: Street BILLING ROAD
Avg_Price 2.4e+06
Name: 1220, dtype: object
item.Street only: BILLING ROAD
index: 1326
item: Street BLENHEIM CRESCENT
Avg_Price 2.37694e+06
Name: 1326, dtype: object
item.Street only: BLENHEIM CRESCENT
index: 1439
item: Street BOURDON STREET
Avg_Price 2.2e+06
Name: 1439, dtype: object
item.Street only: BOURDON STREET
index: 1508
item: Street BRADFIELD ROAD
Avg_Price 2.225e+06
Name: 1508, dtype: object
item.Street only: BRADFIELD ROAD
index: 1746
item: Street BROMPTON PLACE
Avg_Price 2.4e+06
Name: 1746, dtype: object
item.Street only: BROMPTON PLACE
index: 2046
item: Street CAITHNESS ROAD
Avg_Price 2.21833e+06
Name: 2046, dtype: object
item.Street only: CAITHNESS ROAD
index: 2105
item: Street CAMBRIDGE TERRACE
Avg_Price 2.42e+06
Name: 2105, dtype: object
item.Street only: CAMBRIDGE TERRACE
index: 2113
item: Street CAMDEN SQUARE
Avg_Price 2.43e+06
Name: 2113, dtype: object
item.Street only: CAMDEN SQUARE
index: 2125
item: Street CAMPDEN HILL ROAD
Avg_Price 2.3054e+06
Name: 2125, dtype: object
item.Street only: CAMPDEN HILL ROAD
index: 2168
item: Street CANONBURY PARK SOUTH
Avg_Price 2.36175e+06
Name: 2168, dtype: object
item.Street only: CANONBURY PARK SOUTH
index: 2218
item: Street CARLISLE PLACE
Avg_Price 2.284e+06
Name: 2218, dtype: object
item.Street only: CARLISLE PLACE
index: 2223
item: Street CARLTON GARDENS
Avg_Price 2.5e+06
Name: 2223, dtype: object
item.Street only: CARLTON GARDENS
index: 2227
item: Street CARLTON ROAD
Avg_Price 2.26689e+06
Name: 2227, dtype: object
item.Street only: CARLTON ROAD
index: 2395
item: Street CHALCOT SQUARE
Avg_Price 2.32375e+06
Name: 2395, dtype: object
item.Street only: CHALCOT SQUARE
index: 2540
item: Street CHELSEA EMBANKMENT
Avg_Price 2.21375e+06
Name: 2540, dtype: object
item.Street only: CHELSEA EMBANKMENT
index: 2556
item: Street CHENISTON GARDENS
Avg_Price 2.31e+06
Name: 2556, dtype: object
item.Street only: CHENISTON GARDENS
index: 2571
item: Street CHESHAM MEWS
Avg_Price 2.26e+06
Name: 2571, dtype: object
item.Street only: CHESHAM MEWS
index: 2611
item: Street CHEVENING ROAD
Avg_Price 2.47015e+06
Name: 2611, dtype: object
item.Street only: CHEVENING ROAD
index: 2966
item: Street COLLINGHAM ROAD
Avg_Price 2.477e+06
Name: 2966, dtype: object
item.Street only: COLLINGHAM ROAD
index: 3206
item: Street COULTER ROAD
Avg_Price 2.4e+06
Name: 3206, dtype: object
item.Street only: COULTER ROAD
index: 3218
item: Street COURT LANE GARDENS
Avg_Price 2.44e+06
Name: 3218, dtype: object
item.Street only: COURT LANE GARDENS
index: 3230
item: Street COURTHOPE ROAD
Avg_Price 2.24e+06
Name: 3230, dtype: object
item.Street only: COURTHOPE ROAD
index: 3237
item: Street COURTNELL STREET
Avg_Price 2.40955e+06
Name: 3237, dtype: object
item.Street only: COURTNELL STREET
index: 3308
item: Street CRAWFORD MEWS
Avg_Price 2.5e+06
Name: 3308, dtype: object
item.Street only: CRAWFORD MEWS
index: 3319
item: Street CREDITON HILL
Avg_Price 2.26533e+06
Name: 3319, dtype: object
item.Street only: CREDITON HILL
index: 3597
item: Street DARTMOUTH PARK AVENUE
Avg_Price 2.38833e+06
Name: 3597, dtype: object
item.Street only: DARTMOUTH PARK AVENUE
index: 3646
item: Street DE VERE GARDENS
Avg_Price 2.21906e+06
Name: 3646, dtype: object
item.Street only: DE VERE GARDENS
index: 3672
item: Street DEEPDALE
Avg_Price 2.48e+06
Name: 3672, dtype: object
item.Street only: DEEPDALE
index: 3675
item: Street DEER PARK ROAD
Avg_Price 2.45e+06
Name: 3675, dtype: object
item.Street only: DEER PARK ROAD
index: 3761
item: Street DEVEREUX LANE
Avg_Price 2.3e+06
Name: 3761, dtype: object
item.Street only: DEVEREUX LANE
index: 3771
item: Street DEVONSHIRE MEWS WEST
Avg_Price 2.4e+06
Name: 3771, dtype: object
item.Street only: DEVONSHIRE MEWS WEST
index: 3891
item: Street DOVER STREET
Avg_Price 2.5e+06
Name: 3891, dtype: object
item.Street only: DOVER STREET
index: 3912
item: Street DOWNSIDE CRESCENT
Avg_Price 2.33818e+06
Name: 3912, dtype: object
item.Street only: DOWNSIDE CRESCENT
index: 3974
item: Street DUDLEY ROAD
Avg_Price 2.31558e+06
Name: 3974, dtype: object
item.Street only: DUDLEY ROAD
index: 3982
item: Street DUKES LANE
Avg_Price 2.35e+06
Name: 3982, dtype: object
item.Street only: DUKES LANE
index: 4146
item: Street EATON TERRACE MEWS
Avg_Price 2.4e+06
Name: 4146, dtype: object
item.Street only: EATON TERRACE MEWS
index: 4162
item: Street ECCLESTON MEWS
Avg_Price 2.225e+06
Name: 4162, dtype: object
item.Street only: ECCLESTON MEWS
index: 4165
item: Street ECCLESTON SQUARE
Avg_Price 2.49091e+06
Name: 4165, dtype: object
item.Street only: ECCLESTON SQUARE
index: 4284
item: Street ELLERBY STREET
Avg_Price 2.45e+06
Name: 4284, dtype: object
item.Street only: ELLERBY STREET
index: 4393
item: Street ELVASTON PLACE
Avg_Price 2.352e+06
Name: 4393, dtype: object
item.Street only: ELVASTON PLACE
index: 4454
item: Street ENNISMORE GARDENS MEWS
Avg_Price 2.2e+06
Name: 4454, dtype: object
item.Street only: ENNISMORE GARDENS MEWS
index: 4502
item: Street ESSEX STREET
Avg_Price 2.46204e+06
Name: 4502, dtype: object
item.Street only: ESSEX STREET
index: 4553
item: Street EVERSLEY CRESCENT
Avg_Price 2.2e+06
Name: 4553, dtype: object
item.Street only: EVERSLEY CRESCENT
index: 4582
item: Street EYNELLA ROAD
Avg_Price 2.296e+06
Name: 4582, dtype: object
item.Street only: EYNELLA ROAD
index: 4713
item: Street FENCHURCH STREET
Avg_Price 2.46012e+06
Name: 4713, dtype: object
item.Street only: FENCHURCH STREET
index: 4851
item: Street FLORAL STREET
Avg_Price 2.44275e+06
Name: 4851, dtype: object
item.Street only: FLORAL STREET
index: 4949
item: Street FOUNTAYNE ROAD
Avg_Price 2.4975e+06
Name: 4949, dtype: object
item.Street only: FOUNTAYNE ROAD
index: 4950
item: Street FOURNIER STREET
Avg_Price 2.45e+06
Name: 4950, dtype: object
item.Street only: FOURNIER STREET
index: 5039
item: Street FRISTON STREET
Avg_Price 2.45e+06
Name: 5039, dtype: object
item.Street only: FRISTON STREET
index: 5218
item: Street GERTRUDE STREET
Avg_Price 2.3e+06
Name: 5218, dtype: object
item.Street only: GERTRUDE STREET
index: 5361
item: Street GLOUCESTER PLACE MEWS
Avg_Price 2.5e+06
Name: 5361, dtype: object
item.Street only: GLOUCESTER PLACE MEWS
index: 5424
item: Street GORDON COTTAGES
Avg_Price 2.2e+06
Name: 5424, dtype: object
item.Street only: GORDON COTTAGES
index: 5434
item: Street GORST ROAD
Avg_Price 2.3e+06
Name: 5434, dtype: object
item.Street only: GORST ROAD
index: 5469
item: Street GRAHAM TERRACE
Avg_Price 2.2e+06
Name: 5469, dtype: object
item.Street only: GRAHAM TERRACE
index: 5553
item: Street GREAT TITCHFIELD STREET
Avg_Price 2.395e+06
Name: 5553, dtype: object
item.Street only: GREAT TITCHFIELD STREET
index: 5793
item: Street HALLAM STREET
Avg_Price 2.205e+06
Name: 5793, dtype: object
item.Street only: HALLAM STREET
index: 5801
item: Street HALSEY STREET
Avg_Price 2.387e+06
Name: 5801, dtype: object
item.Street only: HALSEY STREET
index: 5931
item: Street HARLEY GARDENS
Avg_Price 2.28e+06
Name: 5931, dtype: object
item.Street only: HARLEY GARDENS
index: 5944
item: Street HARMSWORTH WAY
Avg_Price 2.24188e+06
Name: 5944, dtype: object
item.Street only: HARMSWORTH WAY
index: 5985
item: Street HARWOOD ROAD
Avg_Price 2.2245e+06
Name: 5985, dtype: object
item.Street only: HARWOOD ROAD
index: 6004
item: Street HATCHAM ROAD
Avg_Price 2.44008e+06
Name: 6004, dtype: object
item.Street only: HATCHAM ROAD
index: 6100
item: Street HEATH DRIVE
Avg_Price 2.388e+06
Name: 6100, dtype: object
item.Street only: HEATH DRIVE
index: 6101
item: Street HEATH HURST ROAD
Avg_Price 2.2475e+06
Name: 6101, dtype: object
item.Street only: HEATH HURST ROAD
index: 6160
item: Street HENDERSON ROAD
Avg_Price 2.39e+06
Name: 6160, dtype: object
item.Street only: HENDERSON ROAD
index: 6298
item: Street HIGHBURY NEW PARK
Avg_Price 2.4975e+06
Name: 6298, dtype: object
item.Street only: HIGHBURY NEW PARK
index: 6346
item: Street HILL STREET
Avg_Price 2.5e+06
Name: 6346, dtype: object
item.Street only: HILL STREET
index: 6382
item: Street HILLWAY
Avg_Price 2.25e+06
Name: 6382, dtype: object
item.Street only: HILLWAY
index: 6421
item: Street HOLBORN CLOSE
Avg_Price 2.2e+06
Name: 6421, dtype: object
item.Street only: HOLBORN CLOSE
index: 6444
item: Street HOLLAND PARK ROAD
Avg_Price 2.35e+06
Name: 6444, dtype: object
item.Street only: HOLLAND PARK ROAD
index: 6687
item: Street HYDE VALE
Avg_Price 2.31517e+06
Name: 6687, dtype: object
item.Street only: HYDE VALE
index: 6767
item: Street IRVING ROAD
Avg_Price 2.5e+06
Name: 6767, dtype: object
item.Street only: IRVING ROAD
index: 6899
item: Street JUNCTION PLACE
Avg_Price 2.2e+06
Name: 6899, dtype: object
item.Street only: JUNCTION PLACE
index: 7055
item: Street KILDARE GARDENS
Avg_Price 2.3e+06
Name: 7055, dtype: object
item.Street only: KILDARE GARDENS
index: 7470
item: Street LEAMOUTH ROAD
Avg_Price 2.38315e+06
Name: 7470, dtype: object
item.Street only: LEAMOUTH ROAD
index: 7516
item: Street LEINSTER SQUARE
Avg_Price 2.2723e+06
Name: 7516, dtype: object
item.Street only: LEINSTER SQUARE
index: 7620
item: Street LILYVILLE ROAD
Avg_Price 2.29167e+06
Name: 7620, dtype: object
item.Street only: LILYVILLE ROAD
index: 7849
item: Street LOWER BELGRAVE STREET
Avg_Price 2.40375e+06
Name: 7849, dtype: object
item.Street only: LOWER BELGRAVE STREET
index: 7953
item: Street LYONS PLACE
Avg_Price 2.3667e+06
Name: 7953, dtype: object
item.Street only: LYONS PLACE
index: 8154
item: Street MARESFIELD GARDENS
Avg_Price 2.24208e+06
Name: 8154, dtype: object
item.Street only: MARESFIELD GARDENS
index: 8186
item: Street MARKHAM STREET
Avg_Price 2.3e+06
Name: 8186, dtype: object
item.Street only: MARKHAM STREET
index: 8511
item: Street MILKY WAY
Avg_Price 2.25718e+06
Name: 8511, dtype: object
item.Street only: MILKY WAY
index: 8671
item: Street MORELLA ROAD
Avg_Price 2.5e+06
Name: 8671, dtype: object
item.Street only: MORELLA ROAD
index: 8758
item: Street MOUNTVIEW CLOSE
Avg_Price 2.23333e+06
Name: 8758, dtype: object
item.Street only: MOUNTVIEW CLOSE
index: 9003
item: Street NIGHTINGALE SQUARE
Avg_Price 2.475e+06
Name: 9003, dtype: object
item.Street only: NIGHTINGALE SQUARE
index: 9047
item: Street NORRICE LEA
Avg_Price 2.355e+06
Name: 9047, dtype: object
item.Street only: NORRICE LEA
index: 9152
item: Street OAK HILL PARK MEWS
Avg_Price 2.4e+06
Name: 9152, dtype: object
item.Street only: OAK HILL PARK MEWS
index: 9252
item: Street OLD KENT ROAD
Avg_Price 2.24288e+06
Name: 9252, dtype: object
item.Street only: OLD KENT ROAD
index: 9301
item: Street ONSLOW CRESCENT
Avg_Price 2.4e+06
Name: 9301, dtype: object
item.Street only: ONSLOW CRESCENT
index: 9337
item: Street ORMONDE GATE
Avg_Price 2.35e+06
Name: 9337, dtype: object
item.Street only: ORMONDE GATE
index: 9418
item: Street PADDINGTON STREET
Avg_Price 2.39321e+06
Name: 9418, dtype: object
item.Street only: PADDINGTON STREET
index: 9433
item: Street PALACE COURT
Avg_Price 2.27367e+06
Name: 9433, dtype: object
item.Street only: PALACE COURT
index: 9512
item: Street PARKE ROAD
Avg_Price 2.25e+06
Name: 9512, dtype: object
item.Street only: PARKE ROAD
index: 9546
item: Street PARR'S WAY
Avg_Price 2.33501e+06
Name: 9546, dtype: object
item.Street only: PARR'S WAY
index: 9550
item: Street PARSONS GATE MEWS
Avg_Price 2.2e+06
Name: 9550, dtype: object
item.Street only: PARSONS GATE MEWS
index: 9584
item: Street PAVILION ROAD
Avg_Price 2.3e+06
Name: 9584, dtype: object
item.Street only: PAVILION ROAD
index: 9653
item: Street PEMBRIDGE CRESCENT
Avg_Price 2.27679e+06
Name: 9653, dtype: object
item.Street only: PEMBRIDGE CRESCENT
index: 9654
item: Street PEMBRIDGE MEWS
Avg_Price 2.2851e+06
Name: 9654, dtype: object
item.Street only: PEMBRIDGE MEWS
index: 9693
item: Street PENNINGTON STREET
Avg_Price 2.25195e+06
Name: 9693, dtype: object
item.Street only: PENNINGTON STREET
index: 9791
item: Street PICTON STREET
Avg_Price 2.35e+06
Name: 9791, dtype: object
item.Street only: PICTON STREET
index: 9807
item: Street PINE GROVE
Avg_Price 2.4e+06
Name: 9807, dtype: object
item.Street only: PINE GROVE
index: 9929
item: Street PORTLAND PLACE
Avg_Price 2.27586e+06
Name: 9929, dtype: object
item.Street only: PORTLAND PLACE
index: 10028
item: Street PRINCES SQUARE
Avg_Price 2.23089e+06
Name: 10028, dtype: object
item.Street only: PRINCES SQUARE
index: 10116
item: Street QUEEN ANNE STREET
Avg_Price 2.44167e+06
Name: 10116, dtype: object
item.Street only: QUEEN ANNE STREET
index: 10119
item: Street QUEEN ANNES GROVE
Avg_Price 2.36e+06
Name: 10119, dtype: object
item.Street only: QUEEN ANNES GROVE
index: 10350
item: Street REDCLIFFE MEWS
Avg_Price 2.41589e+06
Name: 10350, dtype: object
item.Street only: REDCLIFFE MEWS
index: 10628
item: Street ROSE SQUARE
Avg_Price 2.29167e+06
Name: 10628, dtype: object
item.Street only: ROSE SQUARE
index: 10829
item: Street RYECROFT STREET
Avg_Price 2.225e+06
Name: 10829, dtype: object
item.Street only: RYECROFT STREET
index: 10875
item: Street SALUSBURY ROAD
Avg_Price 2.407e+06
Name: 10875, dtype: object
item.Street only: SALUSBURY ROAD
index: 11317
item: Street SOUTH END ROW
Avg_Price 2.35e+06
Name: 11317, dtype: object
item.Street only: SOUTH END ROW
index: 11328
item: Street SOUTH PARADE
Avg_Price 2.425e+06
Name: 11328, dtype: object
item.Street only: SOUTH PARADE
index: 11430
item: Street SPRIMONT PLACE
Avg_Price 2.25e+06
Name: 11430, dtype: object
item.Street only: SPRIMONT PLACE
index: 11488
item: Street ST AUBYNS AVENUE
Avg_Price 2.48e+06
Name: 11488, dtype: object
item.Street only: ST AUBYNS AVENUE
index: 11554
item: Street ST JOHN STREET
Avg_Price 2.28894e+06
Name: 11554, dtype: object
item.Street only: ST JOHN STREET
index: 11677
item: Street ST STEPHENS GARDENS
Avg_Price 2.20317e+06
Name: 11677, dtype: object
item.Street only: ST STEPHENS GARDENS
index: 11843
item: Street STRADELLA ROAD
Avg_Price 2.34425e+06
Name: 11843, dtype: object
item.Street only: STRADELLA ROAD
index: 12065
item: Street SYDNEY STREET
Avg_Price 2.34882e+06
Name: 12065, dtype: object
item.Street only: SYDNEY STREET
index: 12138
item: Street TEDWORTH SQUARE
Avg_Price 2.2e+06
Name: 12138, dtype: object
item.Street only: TEDWORTH SQUARE
index: 12274
item: Street THE LITTLE BOLTONS
Avg_Price 2.25e+06
Name: 12274, dtype: object
item.Street only: THE LITTLE BOLTONS
index: 12277
item: Street THE MARLOWES
Avg_Price 2.5e+06
Name: 12277, dtype: object
item.Street only: THE MARLOWES
index: 12501
item: Street TOTTERIDGE COMMON
Avg_Price 2.2e+06
Name: 12501, dtype: object
item.Street only: TOTTERIDGE COMMON
index: 12740
item: Street UPPER MONTAGU STREET
Avg_Price 2.23143e+06
Name: 12740, dtype: object
item.Street only: UPPER MONTAGU STREET
index: 12922
item: Street VIVIAN WAY
Avg_Price 2.44e+06
Name: 12922, dtype: object
item.Street only: VIVIAN WAY
index: 12960
item: Street WALDRON MEWS
Avg_Price 2.21614e+06
Name: 12960, dtype: object
item.Street only: WALDRON MEWS
index: 13067
item: Street WARWICK LANE
Avg_Price 2.2985e+06
Name: 13067, dtype: object
item.Street only: WARWICK LANE
index: 13170
item: Street WELL ROAD
Avg_Price 2.445e+06
Name: 13170, dtype: object
item.Street only: WELL ROAD
index: 13237
item: Street WEST HILL PARK
Avg_Price 2.4375e+06
Name: 13237, dtype: object
item.Street only: WEST HILL PARK
index: 13238
item: Street WEST HILL ROAD
Avg_Price 2.43e+06
Name: 13238, dtype: object
item.Street only: WEST HILL ROAD
index: 13246
item: Street WEST SQUARE
Avg_Price 2.3e+06
Name: 13246, dtype: object
item.Street only: WEST SQUARE
index: 13248
item: Street WEST TEMPLE SHEEN
Avg_Price 2.46733e+06
Name: 13248, dtype: object
item.Street only: WEST TEMPLE SHEEN
index: 13386
item: Street WHITE HART LANE
Avg_Price 2.29838e+06
Name: 13386, dtype: object
item.Street only: WHITE HART LANE
index: 13418
item: Street WHITTAKER STREET
Avg_Price 2.25e+06
Name: 13418, dtype: object
item.Street only: WHITTAKER STREET
index: 13737
item: Street WOODYARD LANE
Avg_Price 2.27e+06
Name: 13737, dtype: object
item.Street only: WOODYARD LANE
index: 13795
item: Street WYATT DRIVE
Avg_Price 2.4e+06
Name: 13795, dtype: object
item.Street only: WYATT DRIVE
geolocator = Nominatim(user_agent="http")
# add in latitude and longitude column
df_price_limit['Latitude'] = df_price_limit['Street'].apply(geolocator.geocode).apply(lambda x: x.latitude)
df_price_limit['Longitude'] = df_price_limit['Street'].apply(geolocator.geocode).apply(lambda x: x.longitude)
print(df_price_limit.shape)
df_p = df_price_limit
df_p.head()
(144, 4)
Street | Avg_Price | Latitude | Longitude | |
---|---|---|---|---|
171 | ALBERT EMBANKMENT | 2.283829e+06 | 51.493213 | -0.121361 |
186 | ALBION SQUARE | 2.292500e+06 | -41.273758 | 173.289393 |
258 | ALLEYN PARK | 2.283095e+06 | 51.492742 | -0.371967 |
455 | ARGYLE STREET | 2.300000e+06 | 22.319036 | 114.167841 |
499 | ARTESIAN ROAD | 2.462500e+06 | 41.725304 | -88.205529 |
Map of London
address = 'London,UK'
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The Geograpical Co-ordinate of Seattle,Washington are {}, {}.'.format(latitude, longitude))
The Geograpical Co-ordinate of Seattle,Washington are 51.5073219, -0.1276474.
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, street in zip(df_p['Latitude'], df_p['Longitude'], df_p['Street']):
label = '{}'.format(street)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='blue',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(map_london)
map_london
Methodology
For the methodology sector, the main goal is to analysis the data and recommend the most valuable venues for the customers based on the surrounding infrastructure and amenities, which retrieved from the Foursquare API.
K-means clustering is used to analyze the venues, given the fact that it is efficient in terms of computational cost and easy to implement when studying larget data set.
Foursquare Credentials
CLIENT_ID = '' # my Foursquare ID
CLIENT_SECRET = '' # my Foursquare Secret
VERSION = '20200604'
LIMIT = 30
print('Your credentails:')
print('CLIENT_ID: '+CLIENT_ID)
print('CLIENT_SECRET: '+CLIENT_SECRET)
Nearby Venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
venues_list=[]
for name, lat, lng in zip(names, latitudes, longitudes):
print(name)
# create the API request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
lat,
lng,
radius,
LIMIT)
# make the GET request
results = requests.get(url).json()["response"]['groups'][0]['items']
# return only relevant information for each nearby venue
venues_list.append([(
name,
lat,
lng,
v['venue']['name'],
v['venue']['location']['lat'],
v['venue']['location']['lng'],
v['venue']['categories'][0]['name']) for v in results])
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Street',
'Street Latitude',
'Street Longitude',
'Venue',
'Venue Latitude',
'Venue Longitude',
'Venue Category']
return(nearby_venues)
# get the venues for each street in the dataframe
london_venues = getNearbyVenues(names=df_p['Street'],
latitudes=df_p['Latitude'],
longitudes=df_p['Longitude']
)
print(london_venues.shape)
london_venues.head()
(3688, 7)
Street | Street Latitude | Street Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|---|
0 | ALBERT EMBANKMENT | 51.493213 | -0.121361 | The Garden Cafe, Lambeth Palace | 51.495210 | -0.119962 | Café |
1 | ALBERT EMBANKMENT | 51.493213 | -0.121361 | Newport Street Gallery | 51.493466 | -0.117040 | Art Gallery |
2 | ALBERT EMBANKMENT | 51.493213 | -0.121361 | Plaza On The River Club And Residence | 51.491529 | -0.121371 | Hotel |
3 | ALBERT EMBANKMENT | 51.493213 | -0.121361 | Tamesis Dock | 51.491928 | -0.121934 | Bar |
4 | ALBERT EMBANKMENT | 51.493213 | -0.121361 | Park Plaza London Riverbank | 51.491478 | -0.122177 | Hotel |
# get the List of Unique Categories
print('There are {} uniques categories.'.format(len(london_venues['Venue Category'].unique())))
print(london_venues.shape)
london_venues.groupby('Street').count().head()
There are 318 uniques categories.
(3688, 7)
Street Latitude | Street Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|
Street | ||||||
ALBERT EMBANKMENT | 53 | 53 | 53 | 53 | 53 | 53 |
ALBION SQUARE | 26 | 26 | 26 | 26 | 26 | 26 |
ALLEYN PARK | 6 | 6 | 6 | 6 | 6 | 6 |
ARGYLE STREET | 87 | 87 | 87 | 87 | 87 | 87 |
ARTESIAN ROAD | 8 | 8 | 8 | 8 | 8 | 8 |
One Hot Encoding of Features
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")
# add street column back to dataframe
london_onehot['Street'] = london_venues['Street']
# move street column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]
london_onehot.head()
Street | ATM | Accessories Store | Adult Boutique | African Restaurant | Airport | Airport Terminal | American Restaurant | Antique Shop | Argentinian Restaurant | ... | Vietnamese Restaurant | Warehouse Store | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ALBERT EMBANKMENT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | ALBERT EMBANKMENT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | ALBERT EMBANKMENT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | ALBERT EMBANKMENT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | ALBERT EMBANKMENT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 319 columns
london_grouped = london_onehot.groupby('Street').mean().reset_index()
london_grouped.head()
Street | ATM | Accessories Store | Adult Boutique | African Restaurant | Airport | Airport Terminal | American Restaurant | Antique Shop | Argentinian Restaurant | ... | Vietnamese Restaurant | Warehouse Store | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ALBERT EMBANKMENT | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | ALBION SQUARE | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | ALLEYN PARK | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | ARGYLE STREET | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.011494 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | ARTESIAN ROAD | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 319 columns
# top 5 venues valued most
num_top_venues = 5
for hood in london_grouped['Street']:
print("----"+hood+"----")
temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
temp.columns = ['venue','freq']
temp = temp.iloc[1:]
temp['freq'] = temp['freq'].astype(float)
temp = temp.round({'freq': 2})
print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
print('\n')
----ALBERT EMBANKMENT----
venue freq
0 Restaurant 0.13
1 Hotel 0.11
2 Café 0.11
3 Pub 0.08
4 Pizza Place 0.04
----ALBION SQUARE----
venue freq
0 Café 0.19
1 Pub 0.08
2 Bar 0.08
3 Coffee Shop 0.08
4 Restaurant 0.08
...
----WYATT DRIVE----
venue freq
0 Donut Shop 0.29
1 Mexican Restaurant 0.14
2 Airport Terminal 0.14
3 Discount Store 0.14
4 Thrift / Vintage Store 0.14
Most Common Venues on the Street
# definde the function that returns the most common venue
def most_common_venues(row, num_top_venues):
row_categories = row.iloc[1:]
row_categories_sorted = row_categories.sort_values(ascending=False)
return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
try:
columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
except:
columns.append('{}th Most Common Venue'.format(ind+1))
# create dataframe
common_venues_sorted = pd.DataFrame(columns=columns)
common_venues_sorted['Street'] = london_grouped['Street']
for ind in np.arange(london_grouped.shape[0]):
common_venues_sorted.iloc[ind, 1:] = most_common_venues(london_grouped.iloc[ind, :], num_top_venues)
print(common_venues_sorted.shape)
common_venues_sorted.head()
(130, 11)
Street | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | ALBERT EMBANKMENT | Restaurant | Café | Hotel | Pub | Park | Art Gallery | Plaza | Pizza Place | Coffee Shop | Breakfast Spot |
1 | ALBION SQUARE | Café | Pub | Bar | Indian Restaurant | Restaurant | Coffee Shop | Park | Seafood Restaurant | Fish & Chips Shop | New American Restaurant |
2 | ALLEYN PARK | Pub | Indian Restaurant | Playground | Lawyer | Farmers Market | English Restaurant | Event Service | Event Space | Exhibit | Fabric Shop |
3 | ARGYLE STREET | Cha Chaan Teng | Noodle House | Dessert Shop | Hotel | Sporting Goods Shop | Hong Kong Restaurant | Vegetarian / Vegan Restaurant | Cantonese Restaurant | Bakery | Toy / Game Store |
4 | ARTESIAN ROAD | Italian Restaurant | Sushi Restaurant | Indian Restaurant | Jewelry Store | Auto Garage | Pizza Place | Pilates Studio | Pharmacy | Event Space | Exhibit |
K-Means Clustering
After inspecting the data we obtained, we then apply k-means clustering approach with 5 clusters to study the data.
1. Create clustering dataframe
k = 5
london_grouped_clustering = df_p.drop('Street', 1)
kmeans = KMeans(n_clusters=k, random_state=0).fit(london_grouped_clustering)
kmeans.labels_
array([3, 3, 3, 3, 1, 4, 1, 2, 1, 4, 4, 4, 2, 0, 0, 4, 0, 4, 4, 3, 2, 3,
1, 3, 2, 0, 3, 3, 1, 1, 4, 4, 0, 4, 1, 3, 4, 0, 1, 4, 3, 4, 1, 2,
3, 2, 4, 0, 1, 4, 2, 0, 1, 0, 3, 1, 4, 1, 4, 4, 3, 1, 0, 3, 0, 4,
0, 4, 3, 0, 0, 4, 4, 0, 4, 1, 1, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 4,
2, 0, 3, 3, 1, 0, 1, 2, 4, 0, 4, 2, 4, 3, 0, 2, 0, 3, 3, 3, 0, 2,
4, 3, 0, 4, 2, 4, 3, 0, 4, 2, 4, 0, 1, 3, 0, 2, 2, 0, 0, 1, 0, 0,
4, 0, 3, 4, 4, 4, 3, 1, 3, 0, 3, 4], dtype=int32)
# check if using the correct dataframe
london_grouped_clustering=df_p
london_grouped_clustering.head()
Street | Avg_Price | Latitude | Longitude | Cluster Labels | |
---|---|---|---|---|---|
171 | ALBERT EMBANKMENT | 2.283829e+06 | 51.493213 | -0.121361 | 3 |
186 | ALBION SQUARE | 2.292500e+06 | -41.273758 | 173.289393 | 3 |
258 | ALLEYN PARK | 2.283095e+06 | 51.492742 | -0.371967 | 3 |
455 | ARGYLE STREET | 2.300000e+06 | 22.319036 | 114.167841 | 3 |
499 | ARTESIAN ROAD | 2.462500e+06 | 41.725304 | -88.205529 | 1 |
common_venues_sorted.head()
Cluster Labels | Street | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | ALBERT EMBANKMENT | Restaurant | Café | Hotel | Pub | Park | Art Gallery | Plaza | Pizza Place | Coffee Shop | Breakfast Spot |
1 | 2 | ALBION SQUARE | Café | Pub | Bar | Indian Restaurant | Restaurant | Coffee Shop | Park | Seafood Restaurant | Fish & Chips Shop | New American Restaurant |
2 | 3 | ALLEYN PARK | Pub | Indian Restaurant | Playground | Lawyer | Farmers Market | English Restaurant | Event Service | Event Space | Exhibit | Fabric Shop |
3 | 2 | ARGYLE STREET | Cha Chaan Teng | Noodle House | Dessert Shop | Hotel | Sporting Goods Shop | Hong Kong Restaurant | Vegetarian / Vegan Restaurant | Cantonese Restaurant | Bakery | Toy / Game Store |
4 | 2 | ARTESIAN ROAD | Italian Restaurant | Sushi Restaurant | Indian Restaurant | Jewelry Store | Auto Garage | Pizza Place | Pilates Studio | Pharmacy | Event Space | Exhibit |
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
common_venues_sorted = common_venues_sorted.drop(columns = 'Cluster Labels')
london_grouped_clustering = london_grouped_clustering.join(common_venues_sorted.set_index('Street'), on='Street')
london_grouped_clustering.head()
Street | Avg_Price | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
171 | ALBERT EMBANKMENT | 2.283829e+06 | 51.493213 | -0.121361 | 3 | Restaurant | Café | Hotel | Pub | Park | Art Gallery | Plaza | Pizza Place | Coffee Shop | Breakfast Spot |
186 | ALBION SQUARE | 2.292500e+06 | -41.273758 | 173.289393 | 3 | Café | Pub | Bar | Indian Restaurant | Restaurant | Coffee Shop | Park | Seafood Restaurant | Fish & Chips Shop | New American Restaurant |
258 | ALLEYN PARK | 2.283095e+06 | 51.492742 | -0.371967 | 3 | Pub | Indian Restaurant | Playground | Lawyer | Farmers Market | English Restaurant | Event Service | Event Space | Exhibit | Fabric Shop |
455 | ARGYLE STREET | 2.300000e+06 | 22.319036 | 114.167841 | 3 | Cha Chaan Teng | Noodle House | Dessert Shop | Hotel | Sporting Goods Shop | Hong Kong Restaurant | Vegetarian / Vegan Restaurant | Cantonese Restaurant | Bakery | Toy / Game Store |
499 | ARTESIAN ROAD | 2.462500e+06 | 41.725304 | -88.205529 | 1 | Italian Restaurant | Sushi Restaurant | Indian Restaurant | Jewelry Store | Auto Garage | Pizza Place | Pilates Studio | Pharmacy | Event Space | Exhibit |
2. Create map of clustering
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'],
london_grouped_clustering['Longitude'],
london_grouped_clustering['Street'],
london_grouped_clustering['Cluster Labels']):
label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
folium.CircleMarker(
[lat, lon],
radius=5,
popup=label,
color=rainbow[cluster-1],
fill=True,
fill_color=rainbow[cluster-1],
fill_opacity=0.7).add_to(map_clusters)
map_clusters
Result
df1 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0,
london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df2 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1,
london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df3 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2,
london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df4 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3,
london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
df5 = london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4,
london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]]
# Cluster 0
df1.head()
Avg_Price | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
1439 | 2.200000e+06 | Pub | Fish & Chips Shop | Sandwich Place | Performing Arts Venue | Recording Studio | Park | Fish Market | Filipino Restaurant | Fast Food Restaurant | Farmers Market |
1508 | 2.225000e+06 | Pub | Sandwich Place | Discount Store | Hardware Store | Supermarket | Casino | Bar | Bus Station | Gym | River |
2046 | 2.218333e+06 | Grocery Store | Fried Chicken Joint | Bus Stop | Fish & Chips Shop | Fast Food Restaurant | Event Space | Exhibit | Fabric Shop | Falafel Restaurant | Farm |
2540 | 2.213750e+06 | Italian Restaurant | Pub | Café | Japanese Restaurant | Asian Restaurant | English Restaurant | Juice Bar | Cocktail Bar | Grocery Store | Bakery |
3230 | 2.240000e+06 | Chinese Restaurant | Café | Hotel | Metro Station | Food Court | Event Service | Event Space | Exhibit | Fabric Shop | Falafel Restaurant |
# Cluster 1
df2.head()
Avg_Price | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
499 | 2.462500e+06 | Italian Restaurant | Sushi Restaurant | Indian Restaurant | Jewelry Store | Auto Garage | Pizza Place | Pilates Studio | Pharmacy | Event Space | Exhibit |
680 | 2.475000e+06 | Pub | Construction & Landscaping | Bar | Performing Arts Venue | Park | Dance Studio | Deli / Bodega | Event Service | Event Space | Exhibit |
1021 | 2.500000e+06 | Performing Arts Venue | Restaurant | Zoo Exhibit | Fast Food Restaurant | Event Service | Event Space | Exhibit | Fabric Shop | Falafel Restaurant | Farm |
2223 | 2.500000e+06 | Italian Restaurant | Hotel | Light Rail Station | Dessert Shop | Japanese Restaurant | Café | Lebanese Restaurant | Park | Pizza Place | Deli / Bodega |
2611 | 2.470154e+06 | Pub | Lake | Fast Food Restaurant | Event Service | Event Space | Exhibit | Fabric Shop | Falafel Restaurant | Farm | Farmers Market |
# Cluster 2
df3.head()
Avg_Price | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
989 | 2.344600e+06 | Sandwich Place | Sports Club | Auto Garage | Locksmith | Zoo Exhibit | Farmers Market | Event Space | Exhibit | Fabric Shop | Falafel Restaurant |
1326 | 2.376941e+06 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2168 | 2.361750e+06 | Pub | Café | Coffee Shop | Bus Stop | Park | Platform | Thai Restaurant | Italian Restaurant | Bar | Middle Eastern Restaurant |
2395 | 2.323750e+06 | Café | Bar | Italian Restaurant | Pub | Coffee Shop | French Restaurant | Park | Caribbean Restaurant | Pizza Place | Spa |
3912 | 2.338185e+06 | Gym / Fitness Center | Convenience Store | Sports Club | Martial Arts School | Bus Stop | Rugby Pitch | Farm | Event Service | Event Space | Exhibit |
# Cluster 3
df4.head()
Avg_Price | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
171 | 2.283829e+06 | Restaurant | Café | Hotel | Pub | Park | Art Gallery | Plaza | Pizza Place | Coffee Shop | Breakfast Spot |
186 | 2.292500e+06 | Café | Pub | Bar | Indian Restaurant | Restaurant | Coffee Shop | Park | Seafood Restaurant | Fish & Chips Shop | New American Restaurant |
258 | 2.283095e+06 | Pub | Indian Restaurant | Playground | Lawyer | Farmers Market | English Restaurant | Event Service | Event Space | Exhibit | Fabric Shop |
455 | 2.300000e+06 | Cha Chaan Teng | Noodle House | Dessert Shop | Hotel | Sporting Goods Shop | Hong Kong Restaurant | Vegetarian / Vegan Restaurant | Cantonese Restaurant | Bakery | Toy / Game Store |
2125 | 2.305400e+06 | Pub | Ice Cream Shop | Bakery | Yoga Studio | Coffee Shop | Indian Restaurant | Hotel | Hostel | Park | Breakfast Spot |
# Cluster 4
df5.head()
Avg_Price | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
536 | 2425000.0 | Pub | Grocery Store | Bakery | Indian Restaurant | Mediterranean Restaurant | Coffee Shop | Ice Cream Shop | Middle Eastern Restaurant | Moroccan Restaurant | Fish & Chips Shop |
1179 | 2425000.0 | Theater | Coffee Shop | Bookstore | Clothing Store | Beer Bar | Bakery | Dessert Shop | Indian Restaurant | Boutique | Sushi Restaurant |
1180 | 2400000.0 | Chinese Restaurant | Park | Grocery Store | English Restaurant | Event Service | Event Space | Exhibit | Fabric Shop | Falafel Restaurant | Farm |
1220 | 2400000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1746 | 2400000.0 | Sandwich Place | Lounge | Health & Beauty Service | Park | Convenience Store | Food & Drink Shop | Flea Market | Empanada Restaurant | Food Court | English Restaurant |
Discussion
After studying the dataset, we may gain a better insight across the London house market based on the average street price and the amenities nearby. London Housing Market, though is said to be in a rut at the moment, is still vibrant and of potential.
We start by analysing the result from different angles.
Grouping by Clusters
According to the five clusters produced in the end, we could target housebuyers’ interests when suggesting the best venues. For example, Cluster 0 may target customers looking for various food places and economic life expenses, for they show the most common of restaurants, pub and cafe with a relatively low average price. Instead, customers, a fancy high-end community may make their choices among estates on the streets in Cluster 1, with art museums and exhibitions surrounded.
Future Market Study and Prediction
On the other hand, we could also make our prediction on the future market based on the level of infrastructures and amenities such as schools and hospitals. It is also interesting to notice that there are outstanding venues located at South-West London (Wimbledon, Baham) and North-West London (Kilburn, Hampstead), which may be due to the city expansion. This may favour potential investors into the housing market who want to purchase venues as an investment option.
Further implementations
Further Study can also be brought out based on detailed requirements provided by the housebuyers, such as sorting the venues in each cluster by average price or picking out the estates with the most common venues specified.
Conclusion
Although housing markets in London have witnessed several ups and downs, with volatile house prices in London tending to amplify changes in national house prices recently, there is still great potential lying in the market for investors and house buyers. Recently, as a result of the pandemic, increasing unemployment rate, wage cuts, business failures, and job uncertainty also brought massive impact to the housing market in London. However, it is relatively hard to provide an insight of the market from the patterns of previous cycles for clients, so in this report, we focused on the integration of data science and machine learning technics to assist clients making sensible and efficient choices. From the results provided, we could easily recommend profitable venues according to amenities and infrastructures surrounded.
We first retrieved data on London properties and the relative price from the HM Land Registry (http://landregistry.data.gov.uk/). Amenities and infrastructure data frame are built with the Foursquare API. Merging the two datasets, we were able to have more detailed information about the market and start our study. K-means clustering technique is used to cluster all the data into five different clusters. Finally, we could use the results to better study the London house market and make our choice of data based on clients’ requirement.
This approach is a visualizable and efficient way for housebuyers to gain an insight into the housing market, and the methodology can be applied to other places with proper data provided. It is also worth notice that the due to the robust of the housing market, making a wise decision is essential even with the help of algorithm technics and data analysis which is based on past market performance but not predicting future incidences.
Reference