Module 2 — Classification & Spatial Scale
PAF 516 | Community Analytics
M2 Overview & Learning Materials
Module Overview and Objectives
In Module 1, you built a composite economic hardship index and mapped it with a basic choropleth. This module asks two deceptively simple questions: does the story your map tells change depending on how you classify the data? And does it change depending on the geographic scale you use? The answer to both is yes, and the differences can be dramatic. A map using equal intervals may hide inequality that a quantile map reveals. A county-level map may mask the very neighborhood-level variation that matters most for targeted policy.
Beyond classification and scale, this module introduces bivariate choropleth maps that display two variables simultaneously, interactive web maps with mapgl that let users explore data on demand, and cartographic design principles that separate professional-quality maps from default output. You will also confront the Modifiable Areal Unit Problem (MAUP) — the fundamental challenge that spatial analysis results depend on the size and shape of the geographic units used. You will move from “making a map” to designing a visual argument.
After completing this module, you will be able to:
- Apply and compare four classification methods: quantile, Jenks natural breaks, equal interval, and standard deviation
- Explain how classification method choice affects map interpretation and policy conclusions
- Create a bivariate choropleth map using the biscale package to visualize two variables simultaneously
- Build an interactive mapgl map with tooltips, hover effects, and GPU-rendered vector tiles
- Apply cartographic design principles: visual hierarchy, figure-ground, appropriate color ramps
- Use ggspatial to add scale bars, north arrows, and annotation to static maps
- Recognize and account for the Modifiable Areal Unit Problem (MAUP): understand how geographic scale affects analytical conclusions and policy recommendations
Lecture
The lecture notes provide detailed slide-by-slide annotations, with the main points expanded upon in the sections below.
Download the lecture notes: Variable Classification & Spatial Scale — Lecture Notes (PDF)
Section 1: Classification Methods and Why They Matter
Every choropleth map requires a decision: how do we divide a continuous variable into discrete color bins? This is not a neutral technical choice — it is an editorial decision that shapes what patterns viewers perceive.
Four Common Classification Methods
| Method | How It Works | Best For | Watch Out For |
|---|---|---|---|
| Equal Interval | Divides the data range into bins of equal width | Data with uniform distributions | Skewed data concentrates most observations in one or two bins |
| Quantile | Places an equal number of observations in each bin | Ensuring visual variation across the map | Can place very different values in the same bin; can also split nearly identical values into different bins, exaggerating small differences |
| Jenks Natural Breaks | Minimizes within-class variance and maximizes between-class variance | Data with natural clusters or gaps | Breaks are data-specific — not comparable across maps with different data |
| Standard Deviation | Bins centered on the mean, each one standard deviation wide | Showing how far observations deviate from the average | Assumes roughly normal distribution; outliers dominate the visual |
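To make the equal-interval warning in the table concrete, here is a minimal base-R sketch on simulated right-skewed data. The values are invented for illustration and are not course data. It counts how many observations fall into each of five bins under equal-interval and quantile breaks.

```r
# Simulated right-skewed values standing in for an income or hardship variable
# (hypothetical data, for illustration only)
set.seed(516)
x <- rlnorm(500, meanlog = 3, sdlog = 0.8)

n_classes <- 5

# Equal interval: split the data range into bins of equal width
equal_breaks <- seq(min(x), max(x), length.out = n_classes + 1)

# Quantile: split at the 0%, 20%, ..., 100% quantiles
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = n_classes + 1),
                            names = FALSE)

# Count the observations that land in each bin under each scheme
equal_counts <- table(cut(x, equal_breaks, include.lowest = TRUE))
quantile_counts <- table(cut(x, quantile_breaks, include.lowest = TRUE))

print(equal_counts)     # skew piles most observations into the lowest bin
print(quantile_counts)  # every bin holds the same number of observations
```

On a map, the equal-interval version would paint most areas in a single color, while the quantile version would use all five colors evenly.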
The classInt package in R provides functions for all four methods. The key function is classIntervals(), which accepts a style argument: "equal", "quantile", "jenks", or "sd".
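As a rough sketch of that call, the code below runs classIntervals() once per style on simulated data and collects the break points. The vector name hardship and its values are assumptions made up for illustration, and the classInt package must be installed.

```r
library(classInt)

# Hypothetical right-skewed index values (not course data)
set.seed(516)
hardship <- rlnorm(500, meanlog = 3, sdlog = 0.8)

styles <- c("equal", "quantile", "jenks", "sd")

# Request five classes under each style and keep only the break points
breaks <- lapply(styles, function(s) {
  classIntervals(hardship, n = 5, style = s)$brks
})
names(breaks) <- styles

print(breaks)
```

Comparing the four sets of breaks side by side is a quick way to see how much the bin boundaries move before any map is drawn. Under the "sd" style the requested n acts only as a hint, so the number of classes returned may differ from five.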
Critical insight: When you present a choropleth map in a policy setting, you should be able to justify your classification choice. “I used the default” is not a justification.
Section 2: Bivariate Choropleth Mapping
A standard choropleth shows one variable. But policy questions often involve relationships between two variables: Is poverty concentrated where educational attainment is low? Do areas with high hardship also have high unemployment?
A bivariate choropleth maps two variables simultaneously using a two-dimensional color legend. Each variable is classified into categories (typically 3x3), and the combinations produce a grid of 9 colors. For example, one axis might represent economic hardship index values and the other might represent the percentage of residents without a high school diploma.
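The 3x3 logic can be sketched in base R before bringing in any mapping package. The data frame tracts, its column names, and the helper tercile() are hypothetical and exist only for this illustration: each variable is cut into terciles, and the two class numbers are pasted into a combined label such as "3-1".

```r
# Hypothetical tract-level values (invented for illustration)
set.seed(516)
tracts <- data.frame(
  hardship  = runif(12, min = 0, max = 100),
  pct_no_hs = runif(12, min = 0, max = 40)
)

# Helper: assign each value to tercile 1 (low), 2 (middle), or 3 (high)
tercile <- function(v) {
  cut(v, breaks = quantile(v, probs = c(0, 1/3, 2/3, 1)),
      labels = FALSE, include.lowest = TRUE)
}

# Combine the two tercile codes into one bivariate class label
tracts$bi_class <- paste(tercile(tracts$hardship),
                         tercile(tracts$pct_no_hs), sep = "-")

# Each label corresponds to one cell of the 3x3 color grid
print(table(tracts$bi_class))
```

A label of "3-3" marks a tract that is high on both variables, and "1-3" marks one that is low on the first and high on the second.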
The biscale Package
The biscale package streamlines bivariate mapping in R:
- bi_class(): classifies two variables into a combined bivariate class
- bi_scale_fill(): applies the appropriate 2D color palette to a ggplot
- bi_legend(): generates the 3x3 legend
The legend is created separately and composed with the map using cowplot or patchwork.
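A minimal sketch of that workflow follows, assuming the biscale, sf, ggplot2, and cowplot packages are installed. It uses stl_race_income, the sample St. Louis dataset bundled with biscale, rather than course data. The palette name, axis labels, and legend position are illustrative choices, not requirements of the lab.

```r
library(ggplot2)
library(sf)
library(biscale)
library(cowplot)

# Classify two variables into a combined 3x3 bivariate class
stl_classed <- bi_class(stl_race_income, x = pctWhite, y = medInc,
                        style = "quantile", dim = 3)

# Draw the map, filling each tract by its bivariate class
map <- ggplot() +
  geom_sf(data = stl_classed, aes(fill = bi_class),
          color = "white", linewidth = 0.1, show.legend = FALSE) +
  bi_scale_fill(pal = "GrPink", dim = 3) +
  bi_theme()

# Build the 3x3 legend as a separate plot object
legend <- bi_legend(pal = "GrPink", dim = 3,
                    xlab = "Higher % White", ylab = "Higher Income",
                    size = 8)

# Compose the map and legend on one canvas (positions are fractions of the canvas)
final_map <- ggdraw() +
  draw_plot(map, x = 0, y = 0, width = 1, height = 1) +
  draw_plot(legend, x = 0.2, y = 0.65, width = 0.2, height = 0.2)
```

Printing final_map renders the combined figure. The legend coordinates usually need adjusting for each study area so the legend does not cover data.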
Design consideration: Bivariate choropleths are powerful but cognitively demanding. Always include a clear legend and limit the classification to 3x3 (not 4x4 or 5x5) to keep the map interpretable.
Section 3: Interactive Maps with mapgl
Static maps are essential for reports and publications, but interactive maps let users explore data at their own pace — zooming in on neighborhoods, hovering over polygons for details, and experiencing smooth GPU-rendered performance.
The mapgl package renders interactive maps via MapLibre GL JS. Spatial data is passed directly as GeoJSON — no external tile server required. Key features include:
- Basemap styles: Choose from OpenFreeMap styles (positron, bright, dark) via openfreemap_style()
- GeoJSON source: Pass sf objects directly with add_source(type = "geojson", data = sf_object)
- Fill layers: Color polygons by variable values using add_fill_layer() with interpolate()
- Tooltips and hover: Display data values on hover using tooltip and hover_options
- GPU acceleration: Handles block-group-level data (thousands of polygons) smoothly
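The features listed above can be combined roughly as follows. This is a sketch under several assumptions: the mapgl and sf packages are installed, the installed mapgl version provides openfreemap_style() as described in this module, and the North Carolina sample shapefile that ships with sf stands in for course data. For brevity it passes the sf object straight to the source argument of add_fill_layer() instead of registering it first with add_source().

```r
library(sf)
library(mapgl)

# Sample polygons bundled with sf; reprojected to WGS84 for web mapping
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 4326)

m <- maplibre(style = openfreemap_style("positron"), bounds = nc) |>
  add_fill_layer(
    id = "births",
    source = nc,
    # Interpolate fill color between the minimum and maximum of BIR74
    fill_color = interpolate(
      column = "BIR74",
      values = range(nc$BIR74),
      stops = c("#f7fbff", "#08306b")
    ),
    fill_opacity = 0.7,
    tooltip = "NAME",                       # column shown on hover
    hover_options = list(fill_color = "yellow", fill_opacity = 1)
  )

# Print m, or place it in a Quarto chunk, to render the widget
```

The layer id, the two color stops, and the hover color are arbitrary illustrative choices.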
Interactive maps produced by mapgl are HTML widgets — they can be embedded directly in Quarto documents and dashboards, making them ideal for the policy briefs and final project you will produce later in the course.
Section 4: Cartographic Design Principles
Good cartography is not decoration — it is communication. A well-designed map directs attention to the data story; a poorly designed map obscures it.
Core Principles
- Visual hierarchy: The most important information (the thematic data) should be the most visually prominent. Basemap labels and borders should recede
- Figure-ground: The study area should stand out from the surrounding context. Use a subtle background or outline to frame the area of interest
- Color choice: Use sequential palettes for ordinal data, diverging palettes for data with a meaningful midpoint, and qualitative palettes for categorical data. Colorblind-safe palettes (e.g., viridis) are essential for accessibility
- Map elements: Every thematic map should include a title, legend, scale bar, north arrow, and data source citation
The ggspatial package adds professional cartographic elements to ggplot maps: annotation_scale() for scale bars and annotation_north_arrow() for orientation.
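A minimal sketch of a static map with these elements, assuming ggplot2, sf, and ggspatial are installed. The North Carolina sample shapefile bundled with sf stands in for course data, and the title, caption, and corner placements are illustrative.

```r
library(ggplot2)
library(sf)
library(ggspatial)

# Sample county polygons bundled with sf (not course data)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), color = "white", linewidth = 0.1) +
  # Sequential viridis ramp for a numeric variable
  scale_fill_viridis_c(name = "Births, 1974") +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl", width_hint = 0.3) +
  # North arrow in the top-right corner
  annotation_north_arrow(location = "tr",
                         style = north_arrow_fancy_orienteering) +
  labs(title = "Births by County, North Carolina",
       caption = "Source: nc.shp sample data from the sf package") +
  theme_void()
```

Printing p draws the map with a title, legend, scale bar, north arrow, and source note, which covers the map elements listed above.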
Section 5: The Modifiable Areal Unit Problem (MAUP)
The Modifiable Areal Unit Problem (MAUP) is one of the most fundamental challenges in spatial analysis. It refers to the fact that the results of any analysis using aggregated spatial data depend on how the boundaries of those spatial units are drawn. The same underlying phenomenon can look dramatically different depending on whether you map it at the county, tract, or block group level.
Why MAUP Matters for Policy
Consider a county-level map of economic hardship across the United States. A county like Maricopa County, Arizona — home to Phoenix and over 4 million people — receives a single hardship score. But that single score masks enormous internal variation. Some block groups within Maricopa have economic hardship index values well above the national average, while others rank among the lowest in the state. A state official relying solely on the county map would see Maricopa as a moderate-hardship county and might conclude that it does not need targeted intervention — missing the neighborhoods within the county where hardship is severe.
This is the core danger of MAUP: aggregation smooths away the very variation that matters most for targeted policy. A county that appears “average” may contain both affluent suburbs and deeply distressed neighborhoods. The larger the unit of analysis, the more heterogeneity gets hidden.
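The smoothing effect can be illustrated with a small base-R simulation. Everything here is hypothetical: ten invented counties, fifty invented block groups each, and made-up hardship scores. The point is only to compare the spread of county averages with the spread of the block-group values they summarize.

```r
set.seed(516)

# Hypothetical region: 10 counties, each containing 50 block groups
n_counties <- 10
n_bg <- 50
county <- rep(seq_len(n_counties), each = n_bg)

# County averages differ modestly; block groups vary widely around them
county_mean <- rnorm(n_counties, mean = 50, sd = 5)
hardship <- county_mean[county] + rnorm(n_counties * n_bg, mean = 0, sd = 20)

# County-level score: the mean of each county's block groups
county_scores <- tapply(hardship, county, mean)

# Spread visible on a county map versus a block-group map
print(range(county_scores))
print(range(hardship))

# Spread hidden inside each single county score
print(tapply(hardship, county, function(v) diff(range(v))))
```

The county scores occupy a much narrower band than the block-group values, so a county map built from them would show a fairly uniform region even though individual block groups differ widely.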
Implications for Practice
- Use the most granular geography available when you expect important within-unit variation. Census tracts and block groups reveal neighborhood-level patterns that county-level data cannot
- Compare across scales to understand what aggregation hides. If the pattern changes substantially when you move from counties to tracts to block groups, MAUP is at work
- Be explicit about scale choices in reports and policy briefs. Always state the geographic unit of analysis and acknowledge what within-unit variation it may conceal
- Interactive maps help because users can zoom from broad regional patterns down to neighborhood detail, seeing MAUP in action rather than being locked into a single scale
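Comparing across scales can be sketched the same way for a statistic rather than a map. In this hypothetical base-R simulation, two invented variables share a county-level component but have independent block-group noise. The correlation is computed once on block groups and once on county means.

```r
set.seed(516)

# Hypothetical region: 30 counties with 40 block groups each
n_counties <- 30
n_bg <- 40
county <- rep(seq_len(n_counties), each = n_bg)

# A county-level component shared by both variables
shared <- rnorm(n_counties)

# Block-group values: shared county component plus independent local noise
poverty      <- shared[county] + rnorm(n_counties * n_bg, sd = 3)
unemployment <- shared[county] + rnorm(n_counties * n_bg, sd = 3)

# Correlation at the block-group scale
cor_bg <- cor(poverty, unemployment)

# Correlation after aggregating both variables to county means
cor_county <- cor(tapply(poverty, county, mean),
                  tapply(unemployment, county, mean))

print(c(block_group = cor_bg, county = cor_county))
```

Averaging cancels much of the local noise, so the county-level correlation comes out stronger than the block-group correlation even though both are computed from the same simulated data. A conclusion drawn at one scale does not automatically hold at another.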
Readings
Brewer, C. A., & Pickle, L. (2002). Evaluation of methods for classifying epidemiological data on choropleth maps in series. Annals of the Association of American Geographers, 92(4), 662–681. An influential empirical comparison of classification methods for choropleth maps. Reports that quantile and minimum boundary error classifications were best suited to general map-reading tasks, with natural breaks (Jenks) and a hybrid equal-interval scheme forming a less accurate second group.
Brewer, C. A. ColorBrewer 2.0. — Interactive tool for choosing cartographically sound color schemes. Provides sequential, diverging, and qualitative palettes tested for colorblind safety, print friendliness, and screen legibility.
Openshaw, S. & Taylor, P.J. (1979). A million or so correlation coefficients: three experiments on the modifiable areal unit problem. Statistical Applications in the Spatial Sciences, pp. 127–144. — The paper that coined the term “MAUP.” Using Iowa county data, Openshaw and Taylor showed that aggregating 99 counties into different 6-zone configurations produced correlations ranging from –0.97 to +0.99 for the same underlying variables.
Openshaw, S. (1984). The Modifiable Areal Unit Problem. CATMOG No. 38. Geo Books. — The most-cited treatment of MAUP (~5,200 citations). Distinguishes the scale effect (results change when units are aggregated to larger sizes) from the zoning effect (results change when units of the same size are reconfigured). Essential reading for anyone working with aggregated spatial data.
Fotheringham, A.S. & Wong, D.W.S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A, 23(7), 1025–1044. doi:10.1068/a231025 — Extended MAUP from bivariate correlations into multiple regression and logit models, showing that parameter estimates varied by a factor of up to nine across geographic units. Demonstrated that MAUP is not just a mapping problem but a fundamental analytical challenge.
Nelson, J.K. & Brewer, C.A. (2017). Evaluating data stability in aggregation structures across spatial scales. Cartography and Geographic Information Science, 44(1), 35–50. doi:10.1080/15230406.2015.1093431 — The modern cartographic treatment of MAUP, showing that tract-level data preserves spatial relationships that county-level analysis destroys. Directly relevant to the multi-scale comparisons in Lab 2.
Stevens, J. (2015). Bivariate Choropleth Maps: A How-to Guide. Practical introduction to bivariate choropleth design with emphasis on color selection and legend construction. The design principles in this guide directly inform the biscale package workflow used in Lab 2.
Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press. Chapter 6: Mapping Census Data with R. Covers choropleth mapping with R including classification methods, color palettes, and interactive mapping. Directly relevant to the tidycensus + ggplot2 workflow used throughout this course.
Axis Maps. Cartography Guide. — Concise, practical reference covering color theory, classification, labeling, and layout for thematic maps. Excellent quick reference for cartographic design decisions.
R Package Documentation
- classInt package documentation — Classification interval algorithms
- biscale package documentation — Bivariate choropleth tools
- mapgl package documentation — Interactive GPU-rendered maps
- ggspatial package documentation — Scale bars, north arrows, and cartographic annotation
Lab 2
The Lab 2 materials are on the course lab site.
- Lab 2 Tutorial — Download the tutorial file, knit it to see the complete analysis, then run chunk by chunk to understand each step.
- Lab 2 Assignment — Download the assignment file, rename it with your last name, complete the three questions, and submit to Canvas.
Yellowdig Discussion
Find a published choropleth map from a news article, government report, nonprofit dashboard, or academic paper. Share the map (or a link to it) in your post, then write a critical cartographic review. Your review should address how the mapmaker’s design choices — classification method, color scheme, scale, and layout — shape the story the map tells, and whether different choices would lead to different policy interpretations. Connect your analysis to at least one concept from this module’s readings (e.g., Brewer & Pickle’s findings on classification accuracy, Stevens’ bivariate design principles, or the Axis Maps guidance on visual hierarchy). Consider: if this map were presented to a city council or included in a federal grant application, would its design strengthen or undermine the argument?
Key Terms
| Term | Definition |
|---|---|
| Classification Method | An algorithm for dividing continuous data into discrete bins for choropleth mapping |
| Quantile Classification | Places an equal number of observations in each class; maximizes visual variation |
| Jenks Natural Breaks | Optimizes class boundaries to minimize within-class variance, which maximizes the Goodness of Variance Fit (GVF) |
| Equal Interval | Divides the data range into classes of equal width |
| Standard Deviation | Centers bins on the mean; each bin spans one standard deviation |
| Bivariate Choropleth | A map that encodes two variables simultaneously using a two-dimensional color grid |
| Small Multiples | A series of similar maps or charts displayed together to facilitate comparison |
| Visual Hierarchy | The arrangement of map elements so the most important data draws the most attention |
| Sequential Palette | A color ramp from light to dark representing low-to-high values |
| Diverging Palette | A color ramp with two hues diverging from a neutral midpoint (e.g., blue-white-red) |
| Modifiable Areal Unit Problem (MAUP) | The sensitivity of spatial analysis results to the size and shape of the geographic units used; aggregation to larger units can mask important within-unit variation |