Module 2 — Classification & Spatial Scale

PAF 516 | Community Analytics

M2 Overview & Learning Materials

Module Overview and Objectives

In Module 1, you built a composite economic hardship index and mapped it with a basic choropleth. This module asks two deceptively important questions: does the story your map tells change depending on how you classify the data? And does it change depending on the geographic scale you use? The answer to both is yes — and the differences can be dramatic. A map using equal intervals may hide inequality that a quantile map reveals. A county-level map may mask the very neighborhood-level variation that matters most for targeted policy.

Beyond classification and scale, this module introduces bivariate choropleth maps that display two variables simultaneously, interactive web maps with mapgl that let users explore data on demand, and cartographic design principles that separate professional-quality maps from default output. You will also confront the Modifiable Areal Unit Problem (MAUP) — the fundamental challenge that spatial analysis results depend on the size and shape of the geographic units used. You will move from “making a map” to designing a visual argument.

After completing this module, you will be able to:

Apply and compare four classification methods: quantile, Jenks natural breaks, equal interval, and standard deviation
Explain how classification method choice affects map interpretation and policy conclusions
Create a bivariate choropleth map using the biscale package to visualize two variables simultaneously
Build an interactive mapgl map with tooltips, hover effects, and GPU-accelerated rendering
Apply cartographic design principles: visual hierarchy, figure-ground, appropriate color ramps
Use ggspatial to add scale bars, north arrows, and annotation to static maps
Recognize and account for the Modifiable Areal Unit Problem (MAUP) — understand how geographic scale affects analytical conclusions and policy recommendations

Lecture

The lecture notes provide detailed slide-by-slide annotations, with the main points expanded upon in the sections below.

Download the lecture notes: Variable Classification & Spatial Scale — Lecture Notes (PDF)

Section 1: Classification Methods and Why They Matter

Every choropleth map requires a decision: how do we divide a continuous variable into discrete color bins? This is not a neutral technical choice — it is an editorial decision that shapes what patterns viewers perceive.

Four Common Classification Methods

Method	How It Works	Best For	Watch Out For
Equal Interval	Divides the data range into bins of equal width	Data with uniform distributions	Skewed data concentrates most observations in one or two bins
Quantile	Places an equal number of observations in each bin	Ensuring visual variation across the map	Can place very different values in the same bin; may exaggerate small differences
Jenks Natural Breaks	Minimizes within-class variance and maximizes between-class variance	Data with natural clusters or gaps	Breaks are data-specific — not comparable across maps with different data
Standard Deviation	Bins centered on the mean, each one standard deviation wide	Showing how far observations deviate from the average	Assumes roughly normal distribution; outliers dominate the visual

The classInt package in R implements all four methods through a single function, classIntervals(), which takes the variable, the number of classes (n), and a style argument: "equal", "quantile", "jenks", or "sd".
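To see how much the choice matters, the sketch below classifies one simulated, right-skewed variable with each of the four styles and counts how many observations land in each bin. The hardship vector is made-up data for illustration, not the course dataset, and the choice of five classes is an assumption.

```r
library(classInt)

set.seed(516)
# Hypothetical right-skewed "hardship" scores for 200 tracts (simulated data)
hardship <- rlnorm(200, meanlog = 0, sdlog = 0.8)

styles <- c("equal", "quantile", "jenks", "sd")

# Classify the same vector four ways.
# n is a request: style = "sd" picks its own breaks and may return
# a different number of classes.
fits <- lapply(styles, function(s) classIntervals(hardship, n = 5, style = s))
names(fits) <- styles

# Break points chosen by each method
lapply(fits, function(ci) round(ci$brks, 2))

# Number of tracts that fall in each bin, per method
counts <- lapply(fits, function(ci) table(findCols(ci)))
counts
```

With skewed data like this, the equal-interval counts pile up in the lowest bin while the quantile counts stay balanced, which is the trade-off described in the table above.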

Critical insight: When you present a choropleth map in a policy setting, you should be able to justify your classification choice. “I used the default” is not a justification.

Section 2: Bivariate Choropleth Mapping

A standard choropleth shows one variable. But policy questions often involve relationships between two variables: Is poverty concentrated where educational attainment is low? Do areas with high hardship also have high unemployment?

A bivariate choropleth maps two variables simultaneously using a two-dimensional color legend. Each variable is classified into categories (typically 3x3), and the combinations produce a grid of 9 colors. For example, one axis might represent economic hardship index values and the other might represent the percentage of residents without a high school diploma.

The biscale Package

The biscale package streamlines bivariate mapping in R:

bi_class() — classifies two variables into a combined bivariate class
bi_scale_fill() — applies the appropriate 2D color palette to ggplot
bi_legend() — generates the 3x3 legend

The legend is created separately and composed with the map using cowplot or patchwork.
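A minimal sketch of that workflow follows. It uses the North Carolina county sample file that ships with the sf package as stand-in data, and the two rate variables, the "GrPink" palette, and the legend position are illustrative assumptions rather than the Lab 2 setup.

```r
library(sf)
library(ggplot2)
library(biscale)
library(cowplot)

# Stand-in data: North Carolina counties bundled with sf (not the course data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Two illustrative rate variables standing in for hardship and education
nc$sids_rate  <- 1000 * nc$SID74 / nc$BIR74
nc$nwbir_rate <- 100 * nc$NWBIR74 / nc$BIR74

# Step 1: classify both variables into thirds; adds a bi_class column such as "1-3"
nc_bi <- bi_class(nc, x = sids_rate, y = nwbir_rate, style = "quantile", dim = 3)

# Step 2: fill polygons by the combined class using a 3x3 palette
map <- ggplot(nc_bi) +
  geom_sf(aes(fill = bi_class), color = "white", linewidth = 0.1,
          show.legend = FALSE) +
  bi_scale_fill(pal = "GrPink", dim = 3) +
  bi_theme()

# Step 3: build the 3x3 legend as its own plot
legend <- bi_legend(pal = "GrPink", dim = 3,
                    xlab = "Higher SIDS rate",
                    ylab = "Higher non-white birth share",
                    size = 8)

# Step 4: compose map and legend on one canvas with cowplot
final <- ggdraw() +
  draw_plot(map, x = 0, y = 0, width = 1, height = 1) +
  draw_plot(legend, x = 0.05, y = 0.05, width = 0.25, height = 0.25)
final
```

The legend coordinates are fractions of the canvas, so they usually need a little trial and error for each study area.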

Design consideration: Bivariate choropleths are powerful but cognitively demanding. Always include a clear legend and limit the classification to 3x3 (not 4x4 or 5x5) to keep the map interpretable.

Section 3: Interactive Maps with mapgl

Static maps are essential for reports and publications, but interactive maps let users explore data at their own pace — zooming in on neighborhoods, hovering over polygons for details, and experiencing smooth GPU-rendered performance.

The mapgl package renders interactive maps via MapLibre GL JS. Spatial data is passed directly as GeoJSON — no external tile server required. Key features include:

Basemap styles: Choose from OpenFreeMap styles (positron, bright, dark) via openfreemap_style()
GeoJSON source: Pass sf objects directly with add_source(type = "geojson", data = sf_object)
Fill layers: Color polygons by variable values using add_fill_layer() with interpolate()
Tooltips and hover: Display data values on hover using tooltip and hover_options
GPU acceleration: Handles block-group-level data (thousands of polygons) smoothly
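The features above can be combined in a few lines. This is a sketch under stated assumptions: it again uses the North Carolina sample file from sf as stand-in data, the color stops and layer id are arbitrary, and it passes the sf object straight to the source argument of add_fill_layer() as a shortcut instead of registering a separate source first. Check the mapgl documentation for the exact arguments your installed version supports.

```r
library(sf)
library(mapgl)

# Stand-in data: North Carolina counties bundled with sf, reprojected to WGS84
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 4326)
nc$sids_rate <- round(1000 * nc$SID74 / nc$BIR74, 2)

m <- maplibre(style = openfreemap_style("positron"), bounds = nc) |>
  add_fill_layer(
    id = "counties",                      # hypothetical layer name
    source = nc,                          # sf object passed directly
    fill_color = interpolate(
      column = "sids_rate",
      values = range(nc$sids_rate),       # lowest and highest observed rate
      stops = c("#f7fbff", "#08306b"),    # light-to-dark sequential ramp
      na_color = "lightgrey"
    ),
    fill_opacity = 0.8,
    tooltip = "sids_rate",                # column shown on hover
    hover_options = list(fill_color = "yellow", fill_opacity = 1)
  )
m
```

Note that interpolate() produces a continuous ramp between the two stops rather than discrete classes, so this map does not apply any of the classification methods from Section 1.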

Interactive maps produced by mapgl are HTML widgets — they can be embedded directly in Quarto documents and dashboards, making them ideal for the policy briefs and final project you will produce later in the course.

Section 4: Cartographic Design Principles

Good cartography is not decoration — it is communication. A well-designed map directs attention to the data story; a poorly designed map obscures it.

Core Principles

Visual hierarchy: The most important information (the thematic data) should be the most visually prominent. Basemap labels and borders should recede
Figure-ground: The study area should stand out from the surrounding context. Use a subtle background or outline to frame the area of interest
Color choice: Use sequential palettes for ordered data that runs from low to high, diverging palettes for data with a meaningful midpoint, and qualitative palettes for categorical data. Colorblind-safe palettes (e.g., viridis) are essential for accessibility
Map elements: Every thematic map should include a title, legend, scale bar, north arrow, and data source citation

The ggspatial package adds professional cartographic elements to ggplot maps: annotation_scale() for scale bars and annotation_north_arrow() for orientation.
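As an illustration, the sketch below adds both elements to a simple choropleth, along with a title, a sequential colorblind-safe palette, and a source caption. The North Carolina sample data, the mapped variable, and the corner placements are assumptions chosen for the example.

```r
library(sf)
library(ggplot2)
library(ggspatial)

# Stand-in data: North Carolina counties bundled with sf (not the course data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$sids_rate <- 1000 * nc$SID74 / nc$BIR74

p <- ggplot(nc) +
  # Thin white borders keep county outlines from competing with the fill
  geom_sf(aes(fill = sids_rate), color = "white", linewidth = 0.1) +
  # Sequential, colorblind-safe ramp for a low-to-high rate
  scale_fill_viridis_c(name = "SIDS per 1,000 births") +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl", width_hint = 0.3) +
  # North arrow in the top-right corner, pointing to true north
  annotation_north_arrow(location = "tr", which_north = "true",
                         style = north_arrow_fancy_orienteering) +
  labs(title = "SIDS rate by county, North Carolina, 1974",
       caption = "Source: nc.shp sample data shipped with the sf package") +
  theme_void()
p
```

theme_void() removes the graticule and axis text so the thematic layer stays at the top of the visual hierarchy.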

Section 5: The Modifiable Areal Unit Problem (MAUP)

The Modifiable Areal Unit Problem (MAUP) is one of the most fundamental challenges in spatial analysis. It refers to the fact that the results of any analysis using aggregated spatial data depend on how the boundaries of those spatial units are drawn. The same underlying phenomenon can look dramatically different depending on whether you map it at the county, tract, or block group level.

Why MAUP Matters for Policy

Consider a county-level map of economic hardship across the United States. A county like Maricopa County, Arizona — home to Phoenix and over 4 million people — receives a single hardship score. But that single score masks enormous internal variation. Some block groups within Maricopa have economic hardship index values well above the national average, while others rank among the lowest in the state. A state official relying solely on the county map would see Maricopa as a moderate-hardship county and might conclude that it does not need targeted intervention — missing the neighborhoods within the county where hardship is severe.

This is the core danger of MAUP: aggregation smooths away the very variation that matters most for targeted policy. A county that appears “average” may contain both affluent suburbs and deeply distressed neighborhoods. The larger the unit of analysis, the more heterogeneity gets hidden.
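A small simulation makes the smoothing effect concrete. The numbers below are invented for illustration: 20 hypothetical counties whose averages differ only slightly, each containing 50 block groups that differ a great deal.

```r
set.seed(516)

n_counties <- 20
bg_per_county <- 50

# Hypothetical block-group hardship scores: county averages sit close together,
# but block groups within each county vary widely
county_id <- rep(seq_len(n_counties), each = bg_per_county)
county_effect <- rnorm(n_counties, mean = 50, sd = 3)
hardship <- county_effect[county_id] +
  rnorm(n_counties * bg_per_county, mean = 0, sd = 15)

# Aggregate block groups up to one score per county
county_mean <- tapply(hardship, county_id, mean)

# Spread of scores at each scale
sd_block_group <- sd(hardship)
sd_county <- sd(county_mean)

# Share of block groups more than 15 points above their own county's score
share_hidden <- mean(hardship - county_mean[county_id] > 15)

round(c(sd_block_group = sd_block_group,
        sd_county = sd_county,
        share_hidden = share_hidden), 2)
```

The county-level spread comes out far smaller than the block-group spread, and a noticeable share of block groups sit well above a county score that looks unremarkable on its own.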

Implications for Practice

Use the most granular geography available when you expect important within-unit variation. Census tracts and block groups reveal neighborhood-level patterns that county-level data cannot
Compare across scales to understand what aggregation hides. If the pattern changes substantially when you move from counties to tracts to block groups, MAUP is at work
Be explicit about scale choices in reports and policy briefs. Always state the geographic unit of analysis and acknowledge what within-unit variation it may conceal
Interactive maps help when they include layers at more than one geography: users can zoom from broad regional patterns down to neighborhood detail, seeing MAUP in action rather than being locked into a single scale

Readings

Brewer, C. A., & Pickle, L. (2002). Evaluation of methods for classifying epidemiological data on choropleth maps in series. Annals of the Association of American Geographers, 92(4), 662–681. — A widely cited empirical comparison of seven classification methods for choropleth maps. Finds that quantile and minimum boundary error classifications supported the most accurate map reading, with Jenks natural breaks and a hybrid equal-interval scheme forming a second, less accurate group.
Brewer, C. A. ColorBrewer 2.0. — Interactive tool for choosing cartographically sound color schemes. Provides sequential, diverging, and qualitative palettes tested for colorblind safety, print friendliness, and screen legibility.
Openshaw, S. & Taylor, P.J. (1979). A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (Ed.), Statistical Applications in the Spatial Sciences (pp. 127–144). London: Pion. — The paper that coined the term “MAUP.” Using Iowa county data, Openshaw and Taylor showed that aggregating 99 counties into different 6-zone configurations produced correlations ranging from –0.97 to +0.99 for the same underlying variables.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. CATMOG No. 38. Geo Books. — The most-cited treatment of MAUP (~5,200 citations). Distinguishes the scale effect (results change when units are aggregated to larger sizes) from the zoning effect (results change when units of the same size are reconfigured). Essential reading for anyone working with aggregated spatial data.
Fotheringham, A.S. & Wong, D.W.S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23(7), 1025–1044. doi:10.1068/a231025 — Extended MAUP from bivariate correlations into multiple regression and logit models, showing that parameter estimates varied by a factor of up to nine across geographic units. Demonstrated that MAUP is not just a mapping problem but a fundamental analytical challenge.
Nelson, J.K. & Brewer, C.A. (2017). Evaluating data stability in aggregation structures across spatial scales. Cartography and Geographic Information Science, 44(1), 35–50. doi:10.1080/15230406.2015.1093431 — The modern cartographic treatment of MAUP, showing that tract-level data preserves spatial relationships that county-level analysis destroys. Directly relevant to the multi-scale comparisons in Lab 2.
Stevens, J. (2015). Bivariate Choropleth Maps: A How-to Guide. — Practical introduction to bivariate choropleth design with emphasis on color selection and legend construction. The design principles in this guide directly inform the biscale package workflow used in Lab 2.
Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press. Chapter 6: Mapping Census Data with R. — Covers choropleth mapping with R including classification methods, color palettes, and interactive mapping. Directly relevant to the tidycensus + ggplot2 workflow used throughout this course.
Axis Maps. Cartography Guide. — Concise, practical reference covering color theory, classification, labeling, and layout for thematic maps. Excellent quick reference for cartographic design decisions.

R Package Documentation

classInt package documentation — Classification interval algorithms
biscale package documentation — Bivariate choropleth tools
mapgl package documentation — Interactive GPU-rendered maps
ggspatial package documentation — Scale bars, north arrows, and cartographic annotation

Lab 2

The Lab 2 materials are on the course lab site.

Lab 2 Tutorial — Download the tutorial file, knit it to see the complete analysis, then run chunk by chunk to understand each step.
Lab 2 Assignment — Download the assignment file, rename it with your last name, complete the three questions, and submit to Canvas.

Yellowdig Discussion

Find a published choropleth map from a news article, government report, nonprofit dashboard, or academic paper. Share the map (or a link to it) in your post, then write a critical cartographic review. Your review should address how the mapmaker’s design choices — classification method, color scheme, scale, and layout — shape the story the map tells, and whether different choices would lead to different policy interpretations. Connect your analysis to at least one concept from this module’s readings (e.g., Brewer & Pickle’s findings on classification accuracy, Stevens’ bivariate design principles, or the Axis Maps guidance on visual hierarchy). Consider: if this map were presented to a city council or included in a federal grant application, would its design strengthen or undermine the argument?

Key Terms

Term	Definition
Classification Method	An algorithm for dividing continuous data into discrete bins for choropleth mapping
Quantile Classification	Places an equal number of observations in each class; maximizes visual variation
Jenks Natural Breaks	Optimizes class boundaries to minimize within-class variance, which maximizes the Goodness of Variance Fit (GVF)
Equal Interval	Divides the data range into classes of equal width
Standard Deviation	Centers bins on the mean; each bin spans one standard deviation
Bivariate Choropleth	A map that encodes two variables simultaneously using a two-dimensional color grid
Small Multiples	A series of similar maps or charts displayed together to facilitate comparison
Visual Hierarchy	The arrangement of map elements so the most important data draws the most attention
Sequential Palette	A color ramp from light to dark representing low-to-high values
Diverging Palette	A color ramp with two hues diverging from a neutral midpoint (e.g., blue-white-red)
Modifiable Areal Unit Problem (MAUP)	The sensitivity of spatial analysis results to the size and shape of the geographic units used; aggregation to larger units can mask important within-unit variation