Module 4 — Spatial Autocorrelation
PAF 516 | Community Analytics
M4 Overview & Learning Materials
Spatial Autocorrelation & Hot Spot Analysis
Module Overview and Objectives
In Modules 1–3, you built and mapped an economic hardship index. You probably noticed clusters — groups of nearby block groups with similarly high or low values. But are those clusters statistically significant, or could they have occurred by chance? This module gives you the tools to answer that question.
Spatial autocorrelation is the tendency for nearby locations to have similar values — a formalization of Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” This module teaches you to measure spatial autocorrelation globally (across the entire study area) and locally (at each individual location), enabling you to identify statistically significant hot spots, cold spots, and spatial outliers.
After completing this module, you will be able to:
- Explain Tobler’s First Law of Geography and its relationship to spatial autocorrelation
- Construct spatial weights matrices using queen contiguity, rook contiguity, and distance-based approaches
- Calculate and interpret Global Moran’s I as a measure of overall spatial clustering
- Calculate and interpret Local Moran’s I (LISA) to identify local clusters and outliers
- Classify locations as hot spots (HH), cold spots (LL), or spatial outliers (HL, LH)
- Create LISA cluster maps with appropriate symbology
- Evaluate statistical significance using permutation-based pseudo p-values
Lecture
The lecture notes cover Tobler’s First Law, spatial weights matrices, Global Moran’s I, Local Moran’s I (LISA), and cluster map interpretation.
Download the lecture notes: Spatial Autocorrelation — Lecture Notes (PDF)
Section 1: Tobler’s First Law and Spatial Autocorrelation
Most standard statistical methods assume that observations are independent. But spatial data almost always violates this assumption — poverty in one census tract is correlated with poverty in neighboring tracts. This is not a nuisance to be corrected; it is a substantive phenomenon that reveals how social and economic processes operate across space.
Three Types of Spatial Patterns
- Positive spatial autocorrelation (clustered): Similar values tend to be near each other. High-poverty tracts cluster together; affluent tracts cluster together. This is the most common pattern in social data.
- Negative spatial autocorrelation (dispersed): Dissimilar values tend to be near each other (a checkerboard pattern). Rare in practice for social variables.
- Spatial randomness: No relationship between location and value. The null hypothesis against which we test.
Why it matters for policy: If hardship is spatially clustered, then place-based interventions (targeting specific neighborhoods) make sense. If hardship is spatially random, then people-based interventions (targeting individuals regardless of location) may be more efficient.
Section 2: Spatial Weights Matrices
To measure spatial autocorrelation, we first need to define “neighbor”: which locations are considered near each other? This definition is encoded in a spatial weights matrix (W), where each cell w_ij represents the spatial relationship between locations i and j.
Common Neighbor Definitions
| Type | Definition | R Function |
|---|---|---|
| Queen contiguity | Polygons that share any boundary point (edge or vertex) are neighbors | poly2nb(sf_obj, queen = TRUE) |
| Rook contiguity | Polygons that share an edge (not just a point) are neighbors | poly2nb(sf_obj, queen = FALSE) |
| Distance-based | All locations within a specified distance are neighbors | dnearneigh(coords, d1, d2) |
| K-nearest neighbors | The k closest locations are neighbors | knearneigh(coords, k) |
After defining neighbors, you convert the neighbor list to a weights list object using nb2listw(). The row-standardized version (style = "W") is most common: each location’s neighbor weights sum to 1, so the spatial lag of a variable is the mean of its neighbors’ values.
Choice matters: The neighbor definition affects your results. Queen contiguity is the standard default for polygon data, but always check that every observation has at least one neighbor. By default, nb2listw() stops with an error when an observation (an island polygon, for example) has no neighbors.
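A minimal sketch of the workflow above, assuming the sf and spdep packages are installed. The 4 x 4 grid of squares is a hypothetical stand-in for real block-group polygons, and the object names (grid, nb_queen, nb_rook, lw) are illustrative only.

```r
library(sf)     # polygon geometry
library(spdep)  # neighbor lists and spatial weights

# Hypothetical stand-in for block-group polygons: a 4 x 4 grid of unit squares.
bbox <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4)))
grid <- st_make_grid(bbox, n = c(4, 4))

# Queen: shared edge or corner. Rook: shared edge only.
nb_queen <- poly2nb(grid, queen = TRUE)
nb_rook  <- poly2nb(grid, queen = FALSE)

# Neighbor counts per polygon. An interior square has 8 queen neighbors
# but only 4 rook neighbors.
card(nb_queen)
card(nb_rook)

# Island check: any polygon with zero neighbors?
any(card(nb_queen) == 0)

# Row-standardized weights: each polygon's neighbor weights sum to 1.
lw <- nb2listw(nb_queen, style = "W")
sapply(lw$weights, sum)
```

With real data you would pass your own sf object to poly2nb() in place of the toy grid and run the same island check before building the weights.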
Section 3: Global Moran’s I
Global Moran’s I is a single summary statistic that measures the degree of spatial autocorrelation across the entire study area. It answers: “Is there a statistically significant spatial pattern overall?”
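For reference, the statistic is usually written as follows. Here n is the number of locations, x_i is the value at location i, x̄ is the mean of the variable, w_ij is the spatial weight between locations i and j, and S_0 is the sum of all weights. The notation is the standard textbook form and is not taken from the lecture notes.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}
```

With row-standardized weights and no islands, every row of W sums to 1, so S_0 = n and the leading factor n / S_0 drops out.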
Interpretation
- I > 0: Positive spatial autocorrelation — similar values cluster together
- I ≈ 0: Spatial randomness (technically, the expected value under the null is -1/(n-1), close to 0 for large n)
- I < 0: Negative spatial autocorrelation — dissimilar values are adjacent
In R, the spdep package computes Global Moran’s I with moran.test(). The function returns the I statistic, its expected value and variance under the null, and a p-value. You can also use moran.mc() for a permutation-based test, which builds the reference distribution by reshuffling the observed values across locations and so does not rely on normality assumptions.
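The sketch below runs both tests on two made-up patterns so you can see the sign of I change. It assumes spdep is installed. The 6 x 6 grid built with cell2nb() and the variables clustered and checkerboard are hypothetical illustrations, not course data.

```r
library(spdep)

# Hypothetical 6 x 6 grid with rook contiguity and row-standardized weights.
nb <- cell2nb(6, 6, type = "rook")
lw <- nb2listw(nb, style = "W")

# Row and column index of each cell. The grid is square and both patterns
# give the same I if rows and columns are swapped, so cell order is not an issue.
rows <- rep(1:6, times = 6)
cols <- rep(1:6, each = 6)

clustered    <- as.numeric(cols <= 3)           # one half high, one half low
checkerboard <- as.numeric((rows + cols) %% 2)  # every rook neighbor differs

# Analytical test: returns I, its expectation -1/(n-1), its variance, a p-value.
mt_clustered <- moran.test(clustered, lw)
mt_checker   <- moran.test(checkerboard, lw, alternative = "less")
mt_clustered$estimate   # I is strongly positive
mt_checker$estimate     # I equals -1, perfect dispersion

# Permutation test: reshuffle the values across cells 999 times.
set.seed(516)
mc_clustered <- moran.mc(clustered, lw, nsim = 999)
mc_clustered$statistic
mc_clustered$p.value
```

The pseudo p-value from moran.mc() depends on nsim: with 999 permutations the smallest value it can report is 1/1000.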
Limitation: Global Moran’s I tells you that clustering exists, but not where. Two study areas with very different arrangements of hot spots and cold spots can produce the same global statistic. This is why we need local indicators.
Section 4: Local Moran’s I (LISA) and Cluster Maps
Local Indicators of Spatial Association (LISA) decompose the global statistic into a contribution from each individual location. Each location receives its own Local Moran’s I value and significance level. Locations with a significant value are classified into one of four categories (HH, LL, HL, LH), and the rest are labeled not significant:
| Category | Interpretation | Typical Map Color |
|---|---|---|
| High-High (HH) | A high-value location surrounded by high-value neighbors — a hot spot | Red |
| Low-Low (LL) | A low-value location surrounded by low-value neighbors — a cold spot | Blue |
| High-Low (HL) | A high-value location surrounded by low-value neighbors — a spatial outlier | Light red / pink |
| Low-High (LH) | A low-value location surrounded by high-value neighbors — a spatial outlier | Light blue |
| Not significant | No statistically significant local spatial pattern | Gray |
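For reference, a common form of the local statistic (following Anselin, 1995) is shown below, using the same symbols as the global statistic: x_i is the value at location i, x̄ is the mean, w_ij is the spatial weight, and n is the number of locations. The scaling term m_2 is one conventional choice and other scalings appear in the literature.

```latex
I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j=1}^{n} w_{ij}\,(x_j - \bar{x}),
\qquad
m_2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \bar{x})^2
```

The first factor is the location's own deviation from the mean and the sum is the spatial lag of its neighbors' deviations. When both have the same sign (HH or LL), I_i is positive. When they differ (HL or LH), I_i is negative. Summing I_i over all locations gives Global Moran’s I multiplied by the sum of all weights.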
In R, use localmoran() from the spdep package. The output includes the local I value, expected value, variance, z-score, and p-value for each observation. You then classify each observation into the four quadrants of the Moran scatterplot based on the sign of the standardized value and the sign of its spatial lag.
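The classification step can be sketched as follows, assuming spdep is installed. The grid, the hardship values in x, and the 0.05 cutoff are hypothetical choices for illustration. The Lab 4 tutorial may structure this step differently.

```r
library(spdep)

# Hypothetical 6 x 6 grid with queen contiguity and row-standardized weights.
nb <- cell2nb(6, 6, type = "queen")
lw <- nb2listw(nb, style = "W")

# Made-up hardship scores: a 3 x 3 corner block of high values, low elsewhere.
rows <- rep(1:6, times = 6)
cols <- rep(1:6, each = 6)
x <- ifelse(rows <= 3 & cols <= 3, 50, 10)

z     <- as.numeric(scale(x))   # standardized value
lag_z <- lag.listw(lw, z)       # mean of neighbors' standardized values

lisa <- localmoran(x, lw)       # one row per location, five columns
p    <- lisa[, 5]               # fifth column is the p-value
                                # (its column name varies with the alternative)

# Quadrant from the signs of the value and its lag.
quadrant <- ifelse(z >= 0,
                   ifelse(lag_z >= 0, "High-High", "High-Low"),
                   ifelse(lag_z >= 0, "Low-High", "Low-Low"))

# Keep the quadrant label only where the local statistic is significant.
cluster <- ifelse(p < 0.05, quadrant, "Not significant")
table(cluster)
```

The corner cell of the high block comes out as High-High, while the opposite corner sits in the Low-Low quadrant but is not significant, which is why the map needs a gray category.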
The Moran Scatterplot
A Moran scatterplot places the standardized variable on the x-axis and its spatial lag (average of neighbors) on the y-axis. The four quadrants correspond directly to the HH, LL, HL, and LH categories. When the weights are row-standardized, the slope of the best-fit line through the scatterplot equals Global Moran’s I.
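You can check the slope property numerically. This is a small sketch on simulated values, assuming spdep is installed. The grid and the random variable x are hypothetical, and the plot is built with base R graphics.

```r
library(spdep)

# Hypothetical 6 x 6 grid with queen contiguity and row-standardized weights.
nb <- cell2nb(6, 6, type = "queen")
lw <- nb2listw(nb, style = "W")

set.seed(516)
x <- rnorm(36)                  # made-up values, one per cell

z     <- as.numeric(scale(x))   # standardized variable (x-axis)
lag_z <- lag.listw(lw, z)       # spatial lag (y-axis)

# Slope of the regression of the lag on the standardized value.
fit   <- lm(lag_z ~ z)
slope <- unname(coef(fit)[2])

# Global Moran's I for the same variable and weights.
global_i <- moran.test(x, lw)$estimate[[1]]
all.equal(slope, global_i)      # TRUE

# Scatterplot with the fitted line and quadrant guides.
plot(z, lag_z, xlab = "Standardized value", ylab = "Spatial lag")
abline(fit)
abline(h = 0, v = 0, lty = 2)
```

The equality relies on row standardization: with other weighting styles the slope and I differ by a scaling factor.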
Multiple Testing Caution
When testing hundreds of locations simultaneously, some will appear significant by chance alone. Consider applying a Bonferroni correction or using a stricter significance threshold (e.g., p < 0.01 instead of p < 0.05) to reduce false positives.
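Both options can be tried with base R. The p-values below are invented for illustration. In practice you would use the p-value column from your own local Moran output.

```r
# Hypothetical p-values for ten locations.
p_raw <- c(0.001, 0.004, 0.012, 0.030, 0.041, 0.049,
           0.120, 0.300, 0.550, 0.900)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

sum(p_raw  < 0.05)   # 6 locations significant without correction
sum(p_bonf < 0.05)   # 2 locations significant after Bonferroni
sum(p_raw  < 0.01)   # 2 locations significant at the stricter threshold
```

Bonferroni is conservative when the number of locations is large, so report which correction and threshold you used alongside the cluster map.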
Readings
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(sup1), 234–240. — The origin of the “First Law of Geography”: everything is related to everything else, but near things are more related than distant things. The conceptual foundation for all spatial autocorrelation analysis.
Anselin, L. (1995). Local indicators of spatial association — LISA. Geographical Analysis, 27(2), 93–115. — The seminal paper introducing LISA statistics, which decompose global spatial autocorrelation into location-specific contributions. The theoretical basis for hot spot / cold spot classification used in Lab 4.
Bivand, R. S., & Wong, D. W. S. (2018). Comparing implementations of global and local indicators of spatial association. TEST, 27(3), 716–748. doi:10.1007/s11749-018-0599-x — A rigorous comparison of Moran’s I and LISA implementations across software platforms including R’s spdep, GeoDa, and ArcGIS.
Anselin, L. (2020). Global spatial autocorrelation and Local spatial autocorrelation. GeoDa Workbook. — Interactive tutorials covering Global Moran’s I and LISA with worked examples. Excellent companion to the lecture.
Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press. Available online: https://walker-data.com/census-r/ — Comprehensive guide to working with Census data in R using tidycensus. Covers spatial analysis workflows directly relevant to the data pipelines used throughout this course.
R Package Documentation
- spdep package documentation — Spatial dependence: weighting schemes, statistics, and models. Reference for poly2nb(), nb2listw(), moran.test(), localmoran(), and all other spatial statistics functions used in this module.
- spdep: Creating Neighbours (vignette) — Detailed guide to neighbor definitions including queen/rook contiguity, distance-based, and k-nearest approaches.
- tidycensus package — Census data with geometry
Lab 4
The Lab 4 materials are on the course lab site.
- Lab 4 Tutorial — Download the tutorial file, knit it to see the complete analysis, then run chunk by chunk to understand each step.
- Lab 4 Assignment — Download the assignment file, rename it with your last name, complete the three questions, and submit to Canvas.
Yellowdig Discussion
Spatial autocorrelation tells us where hardship clusters, but not why. The jump from statistical pattern to policy explanation requires careful reasoning about causal mechanisms, historical context, and structural forces.
Discussion prompt: Consider the LISA cluster map you will produce in Lab 4, along with this module’s readings on spatial dependence and Tobler’s First Law.
- Hot spots (HH clusters) identify neighborhoods where high hardship is spatially concentrated — surrounded by other high-hardship areas. Cold spots (LL clusters) identify the opposite. What structural, institutional, or historical processes might produce the spatial clustering of hardship you observe?
- How should the distinction between hot spots, cold spots, and spatial outliers inform the design of policy interventions — should resources be allocated differently to each type?
- What might a spatial outlier (e.g., a high-hardship block group surrounded by low-hardship neighbors) reveal about the limitations of place-based policy?
Key Terms
| Term | Definition |
|---|---|
| Spatial Autocorrelation | The correlation of a variable with itself through space — nearby locations tend to have similar values |
| Tobler’s First Law | “Everything is related to everything else, but near things are more related than distant things” |
| Spatial Weights Matrix (W) | A matrix encoding the neighbor relationships between all spatial observations |
| Queen Contiguity | Two polygons are neighbors if they share any boundary point (edge or corner) |
| Global Moran’s I | A single statistic measuring overall spatial autocorrelation across the study area |
| Local Moran’s I (LISA) | A statistic computed for each location measuring its contribution to the global spatial pattern |
| Hot Spot (HH) | A high-value location surrounded by high-value neighbors — a cluster of high values |
| Cold Spot (LL) | A low-value location surrounded by low-value neighbors — a cluster of low values |
| Spatial Outlier (HL/LH) | A location whose value differs markedly from its neighbors |
| Moran Scatterplot | A plot of standardized values vs. their spatial lag; quadrants correspond to HH, LL, HL, LH |