Module 4 — Spatial Autocorrelation

PAF 516 | Community Analytics

M4 Overview & Learning Materials

Spatial Autocorrelation & Hot Spot Analysis

Module Overview and Objectives

In Modules 1–3, you built and mapped an economic hardship index. You probably noticed clusters — groups of nearby block groups with similarly high or low values. But are those clusters statistically significant, or could they have occurred by chance? This module gives you the tools to answer that question.

Spatial autocorrelation is the tendency for nearby locations to have similar values — a formalization of Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” This module teaches you to measure spatial autocorrelation globally (across the entire study area) and locally (at each individual location), enabling you to identify statistically significant hot spots, cold spots, and spatial outliers.


After completing this module, you will be able to:

  • Explain Tobler’s First Law of Geography and its relationship to spatial autocorrelation
  • Construct spatial weights matrices using queen contiguity, rook contiguity, and distance-based approaches
  • Calculate and interpret Global Moran’s I as a measure of overall spatial clustering
  • Calculate and interpret Local Moran’s I (LISA) to identify local clusters and outliers
  • Classify locations as hot spots (HH), cold spots (LL), or spatial outliers (HL, LH)
  • Create LISA cluster maps with appropriate symbology
  • Evaluate statistical significance using permutation-based pseudo p-values

Lecture

The lecture notes cover Tobler’s First Law, spatial weights matrices, Global Moran’s I, Local Moran’s I (LISA), and cluster map interpretation.

Download the lecture notes: Spatial Autocorrelation — Lecture Notes (PDF)

Section 1: Tobler’s First Law and Spatial Autocorrelation

Most standard statistical methods assume that observations are independent. But spatial data almost always violates this assumption — poverty in one census tract is correlated with poverty in neighboring tracts. This is not a nuisance to be corrected; it is a substantive phenomenon that reveals how social and economic processes operate across space.

Three Types of Spatial Patterns

  • Positive spatial autocorrelation (clustered): Similar values tend to be near each other. High-poverty tracts cluster together; affluent tracts cluster together. This is the most common pattern in social data.
  • Negative spatial autocorrelation (dispersed): Dissimilar values tend to be near each other (a checkerboard pattern). Rare in practice for social variables.
  • Spatial randomness: No relationship between location and value. The null hypothesis against which we test.

Why it matters for policy: If hardship is spatially clustered, then place-based interventions (targeting specific neighborhoods) make sense. If hardship is spatially random, then people-based interventions (targeting individuals regardless of location) may be more efficient.

Section 2: Spatial Weights Matrices

To measure spatial autocorrelation, we first need to define “neighbor”: which locations are considered near each other? This definition is encoded in a spatial weights matrix (W), where each cell w_ij represents the spatial relationship between locations i and j.

Common Neighbor Definitions

Type | Definition | R Function
Queen contiguity | Polygons that share any boundary point (edge or vertex) are neighbors | poly2nb(sf_obj, queen = TRUE)
Rook contiguity | Polygons that share an edge (not just a point) are neighbors | poly2nb(sf_obj, queen = FALSE)
Distance-based | All locations within a specified distance band (between d1 and d2) are neighbors | dnearneigh(coords, d1, d2)
K-nearest neighbors | The k closest locations are neighbors | knn2nb(knearneigh(coords, k))
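To see how the two contiguity rules differ, here is a minimal base R sketch on a toy 3 x 3 grid of square cells. The grid and the object names (cells, rook, queen) are made up for illustration; with real polygons you would use poly2nb() as listed in the table.

```r
# Toy 3 x 3 grid of square cells, numbered 1..9 (hypothetical example data)
cells <- expand.grid(col = 1:3, row = 1:3)

# Absolute row and column offsets between every pair of cells
d_row <- abs(outer(cells$row, cells$row, "-"))
d_col <- abs(outer(cells$col, cells$col, "-"))

# Rook: cells share an edge (exactly one step up/down or left/right)
rook <- (d_row + d_col == 1) * 1

# Queen: cells share an edge or a corner (at most one step in each direction)
queen <- (pmax(d_row, d_col) == 1) * 1

rowSums(rook)   # neighbor counts per cell; the center cell (5) has 4
rowSums(queen)  # neighbor counts per cell; the center cell (5) has 8
```

The center cell gains its four diagonal neighbors under the queen rule, which is why queen contiguity generally yields at least as many neighbors per polygon as rook contiguity.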

After defining neighbors, you convert the neighbor list to a weights list object using nb2listw(). The row-standardized version (style = "W") is most common: the weights in each observation’s row sum to 1, so the spatial lag of a variable is the mean of its neighbors’ values.
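Row-standardization can be sketched by hand in base R. The four locations and their values below are made up; in practice nb2listw(nb, style = "W") builds the weights and spdep's lag.listw() computes the lag.

```r
# Binary neighbor matrix for 4 locations along a line: 1-2-3-4 (hypothetical)
B <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Row-standardize: divide each row by its neighbor count so the row sums to 1
W <- B / rowSums(B)
rowSums(W)

x <- c(10, 20, 30, 40)        # made-up values of some variable
lag_x <- as.vector(W %*% x)   # spatial lag = mean of each location's neighbors
lag_x                         # 20 20 30 30
```

Location 2 has neighbors 1 and 3, so its lag is (10 + 30) / 2 = 20; location 1 has only neighbor 2, so its lag is simply 20.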

Choice matters: The neighbor definition affects your results. Queen contiguity is the standard default for polygon data, but always check that every observation has at least one neighbor. Polygons with no neighbors (islands) cause nb2listw() to stop with an error by default unless you set zero.policy = TRUE.
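A quick way to see why islands are a problem is to count neighbors before building weights. This is a base R sketch with a made-up matrix; for an nb object from poly2nb(), spdep's card() returns the same per-observation neighbor counts.

```r
# Binary neighbor matrix with one isolated location (row 4 has no neighbors)
B <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 0,
              0, 0, 0, 0), nrow = 4, byrow = TRUE)

n_neighbors <- rowSums(B)
islands <- which(n_neighbors == 0)
islands                      # location 4

# Naive row-standardization divides by zero for the island, giving NaN
W <- B / n_neighbors
anyNA(W[islands, ])          # TRUE
```

An island has no neighbors to average over, so its spatial lag is undefined. You either have to assign it neighbors (for example with a distance-based or k-nearest definition) or explicitly allow empty neighbor sets.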

Section 3: Global Moran’s I

Global Moran’s I is a single summary statistic that measures the degree of spatial autocorrelation across the entire study area. It answers: “Is there a statistically significant spatial pattern overall?”
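For reference, the statistic is commonly written as follows. The notation is introduced here rather than taken from the lecture notes: x_i is the value at location i, x-bar is the mean, n is the number of locations, w_ij is the weights matrix entry from Section 2, and S_0 is the sum of all weights.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
```

With row-standardized weights and no islands, every row of W sums to 1, so S_0 = n and the leading factor drops out.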

Interpretation

  • I > 0: Positive spatial autocorrelation — similar values cluster together
  • I ≈ 0: Spatial randomness (technically, the expected value under the null is -1/(n-1), close to 0 for large n)
  • I < 0: Negative spatial autocorrelation — dissimilar values are adjacent
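The sign of the statistic can be checked on two toy patterns. The sketch below computes Moran's I from its standard definition in base R on a made-up 4 x 4 grid with rook contiguity; the helper morans_i() is a hypothetical name, not an spdep function.

```r
# Moran's I from its definition, for a numeric vector x and weights matrix W
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# 4 x 4 grid, rook contiguity, row-standardized weights (toy data)
cells <- expand.grid(col = 1:4, row = 1:4)
B <- (abs(outer(cells$row, cells$row, "-")) +
      abs(outer(cells$col, cells$col, "-")) == 1) * 1
W <- B / rowSums(B)

clustered    <- ifelse(cells$col <= 2, 1, 0)   # high values in the left half
checkerboard <- (cells$row + cells$col) %% 2   # alternating 0/1 values

morans_i(clustered, W)      # positive: similar values sit next to each other
morans_i(checkerboard, W)   # -1 (up to rounding): every neighbor is the opposite value
```

The checkerboard is the extreme case of negative autocorrelation under rook contiguity, because no cell shares an edge with a cell of the same value.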

In R, the spdep package computes Global Moran’s I with moran.test(). The function returns the I statistic, expected value, variance, and a p-value. You can also use moran.mc() for a permutation-based test that does not rely on normality assumptions.
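The logic of a permutation test can be sketched in base R: shuffle the observed values across locations many times, recompute the statistic each time, and see how often the shuffled values match or exceed the observed one. This is an illustration of the idea on made-up data, not a reproduction of how moran.mc() is implemented; morans_i(), the seed, and nsim = 999 are all assumptions of this sketch.

```r
# Moran's I from its definition (z = deviations from the mean)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# 4 x 4 grid, rook contiguity, row-standardized weights (toy data)
cells <- expand.grid(col = 1:4, row = 1:4)
B <- (abs(outer(cells$row, cells$row, "-")) +
      abs(outer(cells$col, cells$col, "-")) == 1) * 1
W <- B / rowSums(B)

x <- ifelse(cells$col <= 2, 1, 0)   # made-up clustered pattern
obs_i <- morans_i(x, W)

set.seed(516)                       # arbitrary seed for reproducibility
nsim <- 999
perm_i <- replicate(nsim, morans_i(sample(x), W))

# Pseudo p-value: share of permutations at least as large as the observed value
p_value <- (sum(perm_i >= obs_i) + 1) / (nsim + 1)
p_value
```

Adding 1 to the numerator and denominator counts the observed arrangement as one of the possible permutations, so the pseudo p-value can never be exactly zero and its smallest possible value is 1 / (nsim + 1).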

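Limitation: Global Moran’s I tells you that clustering exists, but not where. Hot spots (high values near high values) and cold spots (low values near low values) both push the statistic upward, so study areas with very different local patterns can produce the same global value. This is why we need local indicators.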

Section 4: Local Moran’s I (LISA) and Cluster Maps

Local Indicators of Spatial Association (LISA) decompose the global statistic into a contribution from each individual location. Each location receives its own Local Moran’s I value and significance level. Significant locations fall into one of four categories; all others are labeled not significant:

Category | Interpretation | Typical Map Color
High-High (HH) | A high-value location surrounded by high-value neighbors (a hot spot) | Red
Low-Low (LL) | A low-value location surrounded by low-value neighbors (a cold spot) | Blue
High-Low (HL) | A high-value location surrounded by low-value neighbors (a spatial outlier) | Light red / pink
Low-High (LH) | A low-value location surrounded by high-value neighbors (a spatial outlier) | Light blue
Not significant | No statistically significant local spatial pattern | Gray
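For reference, one common scaling of the local statistic behind these categories is shown below. The notation is introduced here for illustration (x_i is the value at location i, x-bar the mean, w_ij the weight between i and j, m_2 the average squared deviation), and software may scale the denominator slightly differently.

```latex
I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j=1}^{n} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^2
```

I_i is positive when a location and its neighbors deviate from the mean in the same direction (HH or LL) and negative when they deviate in opposite directions (HL or LH). Under this scaling the local values sum to S_0 times Global Moran's I, where S_0 is the sum of all weights, so with row-standardized weights their average equals the global statistic.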

In R, use localmoran() from the spdep package. The output includes the local I value, expected value, variance, z-score, and p-value for each observation. You then classify each observation into the four quadrants of the Moran scatterplot based on the sign of the standardized value and the sign of its spatial lag.
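The quadrant step can be sketched by hand in base R. The grid and values below are made up, and the sketch assigns quadrants only; it does not test significance, so in a real analysis any observation that fails your significance threshold would be relabeled as not significant. Centered values are used in place of fully standardized ones, which leaves the signs unchanged.

```r
# 4 x 4 grid, rook contiguity, row-standardized weights (toy data)
cells <- expand.grid(col = 1:4, row = 1:4)
B <- (abs(outer(cells$row, cells$row, "-")) +
      abs(outer(cells$col, cells$col, "-")) == 1) * 1
W <- B / rowSums(B)

x <- ifelse(cells$col <= 2, 10, 2)   # made-up values: high in the left half
x[16] <- 10                          # one high value planted in the low half

z     <- x - mean(x)                 # deviations from the mean
lag_z <- as.vector(W %*% z)          # spatial lag of the deviations

# Local Moran's I, scaled by the average squared deviation
local_i <- z * lag_z / (sum(z^2) / length(x))

# Quadrant from the signs of the value and of its spatial lag
quadrant <- ifelse(z > 0 & lag_z > 0, "HH",
            ifelse(z < 0 & lag_z < 0, "LL",
            ifelse(z > 0 & lag_z < 0, "HL", "LH")))
table(quadrant)
quadrant[16]   # "HL": the planted high value sits among low neighbors
```

Cell 16 gets a negative local I because it is high while its neighbors are low. Its neighbor, cell 15, lands in the LH quadrant because two of its three neighbors are now high.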

The Moran Scatterplot

A Moran scatterplot places the standardized variable on the x-axis and its spatial lag (average of neighbors) on the y-axis. The four quadrants correspond directly to the HH, LL, HL, and LH categories. With row-standardized weights, the slope of the best-fit line through the scatterplot equals Global Moran’s I.
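The link between the scatterplot's slope and the global statistic can be checked numerically. This base R sketch uses a made-up 4 x 4 grid with row-standardized rook weights; the values in x are arbitrary.

```r
# 4 x 4 grid, rook contiguity, row-standardized weights (toy data)
cells <- expand.grid(col = 1:4, row = 1:4)
B <- (abs(outer(cells$row, cells$row, "-")) +
      abs(outer(cells$col, cells$col, "-")) == 1) * 1
W <- B / rowSums(B)

x <- c(12, 15, 9, 4, 14, 11, 7, 3, 10, 8, 5, 2, 9, 6, 4, 1)  # made-up values

z     <- as.vector(scale(x))     # standardized variable (x-axis)
lag_z <- as.vector(W %*% z)      # spatial lag (y-axis)

fit   <- lm(lag_z ~ z)           # best-fit line through the scatterplot
slope <- unname(coef(fit)["z"])

global_i <- sum(z * lag_z) / sum(z^2)   # Moran's I when every row of W sums to 1
all.equal(slope, global_i)              # TRUE
```

To draw the plot itself, plot(z, lag_z) followed by abline(fit) and abline(h = 0, v = 0, lty = 2) gives the four quadrants; spdep's moran.plot() produces a comparable figure directly from a variable and a weights list.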

Multiple Testing Caution

When testing hundreds of locations simultaneously, some will appear significant by chance alone. Consider applying a Bonferroni correction or using a stricter significance threshold (e.g., p < 0.01 instead of p < 0.05) to reduce false positives.
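Both options can be sketched with base R. The p-values below are invented for illustration; in the lab they would come from your local Moran's I results.

```r
# Hypothetical local p-values for 8 locations (made-up numbers)
p_raw <- c(0.001, 0.004, 0.012, 0.030, 0.049, 0.200, 0.450, 0.800)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

sum(p_raw  < 0.05)   # 5 locations pass the unadjusted 0.05 threshold
sum(p_bonf < 0.05)   # 2 remain after the Bonferroni correction
sum(p_raw  < 0.01)   # 2 pass the stricter unadjusted 0.01 threshold
```

Bonferroni becomes very conservative as the number of locations grows, because the effective threshold is 0.05 divided by the number of tests. That is one reason a fixed stricter threshold is sometimes used instead.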

Readings

  • Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(sup1), 234–240. — The origin of the “First Law of Geography”: everything is related to everything else, but near things are more related than distant things. The conceptual foundation for all spatial autocorrelation analysis.

  • Anselin, L. (1995). Local indicators of spatial association — LISA. Geographical Analysis, 27(2), 93–115. — The seminal paper introducing LISA statistics, which decompose global spatial autocorrelation into location-specific contributions. The theoretical basis for hot spot / cold spot classification used in Lab 4.

  • Bivand, R. S., & Wong, D. W. S. (2018). Comparing implementations of global and local indicators of spatial association. TEST, 27(3), 716–748. doi:10.1007/s11749-018-0599-x — A rigorous comparison of Moran’s I and LISA implementations across software platforms including R’s spdep, GeoDa, and ArcGIS.

  • Anselin, L. (2020). Global spatial autocorrelation and Local spatial autocorrelation. GeoDa Workbook. — Interactive tutorials covering Global Moran’s I and LISA with worked examples. Excellent companion to the lecture.

  • Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press. Available online: https://walker-data.com/census-r/ — Comprehensive guide to working with Census data in R using tidycensus. Covers spatial analysis workflows directly relevant to the data pipelines used throughout this course.

R Package Documentation

  • spdep package documentation — Spatial dependence: weighting schemes, statistics, and models. Reference for poly2nb(), nb2listw(), moran.test(), localmoran(), and all other spatial statistics functions used in this module.
  • spdep: Creating Neighbours (vignette) — Detailed guide to neighbor definitions including queen/rook contiguity, distance-based, and k-nearest approaches.
  • tidycensus package — Census data with geometry

Lab 4

The Lab 4 materials are on the course lab site.

  • Lab 4 Tutorial — Download the tutorial file, knit it to see the complete analysis, then run chunk by chunk to understand each step.
  • Lab 4 Assignment — Download the assignment file, rename it with your last name, complete the three questions, and submit to Canvas.

Yellowdig Discussion

Spatial autocorrelation tells us where hardship clusters, but not why. The jump from statistical pattern to policy explanation requires careful reasoning about causal mechanisms, historical context, and structural forces.

Discussion prompt: Consider the LISA cluster map you will produce in Lab 4, along with this module’s readings on spatial dependence and Tobler’s First Law.

  • Hot spots (HH clusters) identify neighborhoods where high hardship is spatially concentrated — surrounded by other high-hardship areas. Cold spots (LL clusters) identify the opposite. What structural, institutional, or historical processes might produce the spatial clustering of hardship you observe?
  • How should the distinction between hot spots, cold spots, and spatial outliers inform the design of policy interventions — should resources be allocated differently to each type?
  • What might a spatial outlier (e.g., a high-hardship block group surrounded by low-hardship neighbors) reveal about the limitations of place-based policy?

Key Terms

Term | Definition
Spatial Autocorrelation | The correlation of a variable with itself through space: nearby locations tend to have similar values
Tobler’s First Law | “Everything is related to everything else, but near things are more related than distant things”
Spatial Weights Matrix (W) | A matrix encoding the neighbor relationships between all spatial observations
Queen Contiguity | Two polygons are neighbors if they share any boundary point (edge or corner)
Global Moran’s I | A single statistic measuring overall spatial autocorrelation across the study area
Local Moran’s I (LISA) | A statistic computed for each location measuring its contribution to the global spatial pattern
Hot Spot (HH) | A high-value location surrounded by high-value neighbors (a cluster of high values)
Cold Spot (LL) | A low-value location surrounded by low-value neighbors (a cluster of low values)
Spatial Outlier (HL/LH) | A location whose value differs markedly from its neighbors
Moran Scatterplot | A plot of standardized values vs. their spatial lag; quadrants correspond to HH, LL, HL, LH