How We Build Our Data

The systems behind 2,616 profiles, 182 field observations, and every recommendation on this site.

What We're Building

HortGuide is a structured knowledge base, not a collection of articles. Every plant, disease, pest, abiotic disorder, and beneficial organism has a profile built from a defined schema. The schema dictates what fields exist, what data types they accept, and what sources are required. Articles interpret that data for Western Washington conditions, but the profiles themselves are region-agnostic facts that could serve any climate zone.

This matters because structured data is queryable, comparable, and auditable. When a profile says a plant is susceptible to fire blight, that relationship is stored as structured data, not buried in a paragraph. When a GDD threshold changes because of new field observations, the change propagates to every page that references it.
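As an illustration of what "stored as structured data" can mean in practice, here is a minimal Python sketch of a susceptibility relationship that can be queried directly. The class names, field names, and source IDs are hypothetical and are not HortGuide's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Susceptibility:
    """One plant-to-disease link, with the sources that support it."""
    disease_id: str
    source_ids: list


@dataclass
class PlantProfile:
    """Hypothetical, heavily reduced plant profile."""
    slug: str
    scientific_name: str
    susceptible_to: list = field(default_factory=list)


def plants_susceptible_to(profiles, disease_id):
    """Return the slugs of every profile linked to the given disease."""
    return [
        p.slug
        for p in profiles
        if any(s.disease_id == disease_id for s in p.susceptible_to)
    ]


# Illustrative data only; the source ID is made up.
profiles = [
    PlantProfile("apple", "Malus domestica",
                 [Susceptibility("fire-blight", ["example-handbook"])]),
    PlantProfile("western-red-cedar", "Thuja plicata"),
]
print(plants_susceptible_to(profiles, "fire-blight"))
```

Because the link is a record rather than a sentence, the same data can answer "which plants are susceptible to this disease?" and "what is this plant susceptible to?" without re-reading any prose.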

1,910

Plant profiles

390

Disease profiles

254

Pest profiles

Abiotic disorders

Beneficial organisms

107

Published guides

Where the Data Comes From

The knowledge library that feeds these profiles draws from institutional databases, extension publications, research literature, and original field data. Every source is registered, and every fact in a profile traces back to at least one of them.

University extension databases

WSU HortSense (976 fact sheets), OSU Landscape Plants (2,027 entries), PNW Plant Disease Management Handbook, PNW Insect Management Handbook, UC IPM pest notes. These provide the backbone of pest/disease associations, cultivar susceptibility, and regional management recommendations.

Federal databases

USDA Plants Database (93,156 entries nationally; 15,306 for Washington via NRCS) for taxonomy, hardiness zones, and soil preferences. USA National Phenology Network (NPN) for citizen-science phenological observations, licensed under CC BY 4.0.

Research literature

GDD models from Herms 2004 (OSU Secrest Arboretum), the UMD IPMnet Pest Predictive Calendar (Gill & Klick), NC State, Cornell, UC Davis, and RHS publications. ISA Arborist News archive (101 issues, 2003-2024) for arboriculture science.

Weather data

Open-Meteo historical and forecast APIs. Daily temperature, precipitation, soil temperature, and derived metrics for 7 stations across the Puget Sound lowlands, with records from 2020 to present.

Original field observations

182 phenological observations collected in the Puget Sound lowlands since January 2026, each with date, GPS, GDD₃₂, and photographic documentation. These are the ground-truth data that validate and refine everything else. See the observation log.

Source Attribution

Every fact in a profile came from somewhere. We disclose every source, even when the contribution was small. This is how HortGuide operates as an interpretation layer: we synthesize authoritative sources, add regional field data, and are transparent about which is which.

Every piece of data has one of three origin types, and we label them accordingly:

Sourced

Data from a published source: extension databases, handbooks, research papers, taxonomic databases. Cited by name with a link to the original where available.

Observed

Original field data collected by HortGuide: phenological observations, GDD measurements, photographic documentation. Always includes the city, date, and station context.

Interpreted

Regional synthesis, cross-source reconciliation, or professional judgment. When we adapt out-of-region research for Western Washington conditions, or reconcile conflicting sources, we label the result as interpretation and cite the underlying sources.

When sources disagree, we show the disagreement rather than silently picking a winner. A disease profile might note that one handbook lists four host species while another lists seven. The reader sees both, with citations, and can judge for themselves.
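The three origin labels and the "show the disagreement" rule could be represented roughly as follows. This is a sketch under assumptions: the `Origin`, `Claim`, and helper names are invented for illustration, and the handbook names are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    SOURCED = "sourced"
    OBSERVED = "observed"
    INTERPRETED = "interpreted"


@dataclass(frozen=True)
class Claim:
    """One value for one profile field, with its origin and citation."""
    field_name: str
    value: object  # must be hashable in this sketch
    origin: Origin
    citation: str


def claims_for(claims, field_name):
    """Every claim recorded for a field, in the order it was stored."""
    return [c for c in claims if c.field_name == field_name]


def has_disagreement(claims, field_name):
    """True when the recorded claims for a field do not all agree."""
    return len({c.value for c in claims_for(claims, field_name)}) > 1


# Placeholder citations; the counts mirror the four-versus-seven example.
claims = [
    Claim("host_species_count", 4, Origin.SOURCED, "Handbook A"),
    Claim("host_species_count", 7, Origin.SOURCED, "Handbook B"),
]
if has_disagreement(claims, "host_species_count"):
    for c in claims_for(claims, "host_species_count"):
        print(f"{c.value} ({c.origin.value}: {c.citation})")
```

Keeping every claim, instead of overwriting one value with another, is what lets a page display both numbers with their citations.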

Data Maturity Tiers

Not all profiles are equal. Some were bulk-imported from databases and haven't been touched since. Others have been cross-referenced against multiple sources, expert-reviewed, and validated with field observations. Rather than hiding incomplete profiles or pretending they're all the same quality, we label each one honestly.

Tier 1: Baseline

Bulk-imported from an authoritative database (USDA, OSU, WSU). Accurate identity and basic characteristics, but no human review, no regional notes, no field observations. A library card catalog entry.

Tier 2: Structured

Cross-referenced and synthesized from multiple sources. Data has been organized, conflicts identified, and the profile reads coherently. An informed reader could use this for general reference.

Tier 3: Expert-Reviewed

Reviewed by a credentialed arborist. Data verified against professional knowledge, regional context added, management recommendations checked for local applicability. This tier is set manually after human review and cannot be earned by algorithm.

Tier 4: Field-Verified

Expert-reviewed plus validated with original field observations from the Puget Sound lowlands. GDD thresholds confirmed against local phenological data. Photo-documented. The highest confidence level, earned through seasons of observation.

Tier 1 to Tier 2 promotion is automatic when a profile crosses quality thresholds (multi-source data, 40%+ field completeness). Tiers 3 and 4 require manual review. No algorithm can substitute for "I checked this in the field."
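A rough sketch of the automatic Tier 1 to Tier 2 check might look like this. It assumes that "multi-source" means at least two distinct sources and that "field completeness" is the share of schema fields with a non-empty value; both definitions, and all names below, are illustrative guesses and not the actual rules.

```python
EMPTY_VALUES = (None, "", [], {})


def field_completeness(profile, schema_fields):
    """Share of schema fields that hold a non-empty value (assumed definition)."""
    filled = sum(1 for name in schema_fields if profile.get(name) not in EMPTY_VALUES)
    return filled / len(schema_fields)


def auto_tier(profile, schema_fields, source_ids, current_tier=1):
    """Promote Tier 1 to Tier 2 when both thresholds are met.

    Tiers 3 and 4 are never assigned here: they require manual review.
    """
    if current_tier != 1:
        return current_tier
    multi_source = len(set(source_ids)) >= 2  # assumed meaning of "multi-source"
    if multi_source and field_completeness(profile, schema_fields) >= 0.40:
        return 2
    return 1


schema_fields = ["scientific_name", "hardiness_zone", "soil", "hosts", "notes"]
profile = {"scientific_name": "Ribes sanguineum", "hardiness_zone": "6-9", "soil": ""}
print(auto_tier(profile, schema_fields, ["usda-plants", "osu-landscape"]))
```

The function only ever returns 1 or 2 for a Tier 1 input, which keeps the manual tiers out of reach of the algorithm.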

Growing Degree Day Methodology

Growing degree days (GDD) measure the accumulated heat that drives plant and insect development. Calendar dates shift from year to year with the weather; GDD gives a more consistent way to predict when a plant will bloom, when a pest will emerge, or when a disease risk window opens.

Each day, we calculate how much the average temperature exceeds a base threshold. Those daily values accumulate from January 1. When the total reaches a species-specific threshold, that phenological event is expected. A lilac that blooms at 450 GDD₃₂ will bloom at 450 GDD₃₂ whether that falls in early April (warm year) or late April (cool year).
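The accumulation described above can be sketched in a few lines of Python. This assumes the "average temperature" is the mean of the daily high and low (the common simple-averaging method) and that days below the base contribute zero; the function names are illustrative.

```python
def daily_gdd(tmax_f, tmin_f, base_f=32.0):
    """Degree days for one day: mean temperature above the base, never negative."""
    mean_f = (tmax_f + tmin_f) / 2.0  # assumption: average = (high + low) / 2
    return max(0.0, mean_f - base_f)


def accumulate_gdd(days, base_f=32.0):
    """Running GDD total for a list of (tmax_f, tmin_f) pairs starting January 1."""
    total = 0.0
    running = []
    for tmax_f, tmin_f in days:
        total += daily_gdd(tmax_f, tmin_f, base_f)
        running.append(total)
    return running


def first_day_reaching(running, threshold):
    """Index of the first day whose running total meets the threshold, else None."""
    for index, value in enumerate(running):
        if value >= threshold:
            return index
    return None


# Three made-up days: mild, freezing, warm.
running = accumulate_gdd([(50, 40), (30, 20), (60, 44)])
print(running)                          # [13.0, 13.0, 33.0]
print(first_day_reaching(running, 30))  # 2
```

The freezing day adds nothing to the total, which is why a cold spell delays a GDD-based prediction while a calendar date would not move.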

Why base 32°F?

HortGuide uses GDD base 32°F (GDD₃₂) as the standard for all display. Base 32 captures the slow heat accumulation during Puget Sound's mild winter months, when many woody plants are already responding to temperature. The traditional base 50°F misses this early-season signal. For full details, see our GDD explainer guide.

How we derive GDD thresholds

Our thresholds come from three complementary data streams, each with different strengths:

Multi-year regional calibration

For each species with a known bloom date in this region, we look up the cumulative GDD₃₂ on that date across six years of Puget Sound weather data (2020-2025). This gives a local median, range, and variance. It's our most directly relevant data, but depends on accurate bloom-date estimates for each species.

NPN citizen science observations

The USA National Phenology Network collects phenological observations from trained volunteers across the country. We filter for Washington and Oregon observations, match them to our weather station network, and compute station-local GDD₃₂ thresholds. NPN data provides ground-truth validation: real people recording real bloom dates at real locations.

Published research

For pests and diseases especially, we draw on published GDD models from university extension programs. These are expert-calibrated but often from different climates, so we convert them to GDD₃₂ and flag the regional gap.

When multiple sources exist for the same species, we cross-validate. If NPN observations diverge significantly from the regional calibration, we flag it, because that divergence usually means our bloom-date estimate needs adjustment rather than that the GDD model is wrong.
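The multi-year regional calibration step can be illustrated with a short sketch: given the cumulative GDD₃₂ on the estimated bloom date in each year, summarize the local median, range, and variance. The yearly values below are invented for the example, and the function name is hypothetical.

```python
from statistics import median, variance


def calibrate(gdd_on_bloom_date_by_year):
    """Summarize cumulative GDD32 on the bloom date across several years."""
    values = sorted(gdd_on_bloom_date_by_year.values())
    return {
        "median": median(values),
        "min": values[0],
        "max": values[-1],
        "variance": variance(values),  # sample variance; needs at least 2 years
    }


# Made-up cumulative GDD32 values for one species, 2020 through 2025.
by_year = {2020: 400.0, 2021: 450.0, 2022: 430.0,
           2023: 470.0, 2024: 440.0, 2025: 460.0}
summary = calibrate(by_year)
print(summary["median"], summary["min"], summary["max"])
```

With only six years per species, the median is less sensitive to one unusual season than the mean would be, which is one reason to report it alongside the range.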

GDD Confidence Tiers

Every GDD threshold on this site carries a confidence tier. This is our honest assessment of how much you should trust the number for timing decisions in the Puget Sound lowlands.

Local estimate (14 plants)

Multi-year regional calibration validated by NPN observations (the two diverge by no more than 20%). This threshold has been checked against real bloom observations in Western Washington. Timing should be reliable within the range shown.

Regional estimate (107 plants)

Multi-year regional calibration based on six years of weather data, but without independent NPN validation. The threshold is derived from regional bloom-date estimates and local weather, but hasn't been confirmed against ground-truth observations. Use with moderate confidence; the range may be wider than shown.

Extrapolated (5 plants)

Regional calibration with significant divergence from NPN observations (over 50%), or threshold converted from out-of-region research without local validation. Treat as a rough guide. Actual timing in your area may differ substantially.

Currently 126 of 245 plants with GDD data have been assigned confidence tiers. Tiers are recalculated annually as new field observations and weather data accumulate.
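The tier rules above can be expressed as a small decision function. This is a sketch with two loudly stated assumptions: divergence is taken as the absolute difference relative to the regional calibration, and the band between 20% and 50%, which the definitions above do not cover, is returned as `None` instead of being guessed.

```python
def confidence_tier(regional_gdd, npn_gdd=None, out_of_region_only=False):
    """Map a threshold's evidence to a confidence tier label.

    regional_gdd: threshold from the multi-year regional calibration.
    npn_gdd: threshold derived from NPN observations, if any.
    out_of_region_only: True when the number was converted from
        out-of-region research with no local validation.
    """
    if out_of_region_only:
        return "Extrapolated"
    if npn_gdd is None:
        return "Regional estimate"
    # Assumption: divergence is measured relative to the regional value.
    divergence = abs(npn_gdd - regional_gdd) / regional_gdd
    if divergence <= 0.20:
        return "Local estimate"
    if divergence > 0.50:
        return "Extrapolated"
    return None  # 20-50% band is not defined in the text; left unassigned here


print(confidence_tier(450.0, 480.0))
print(confidence_tier(450.0))
print(confidence_tier(450.0, 700.0))
```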

The 7-Station Weather Network

Phenology varies across the Puget Sound lowlands. Seattle's urban heat island runs warmer than Kent's Green River valley. Bellingham is 2-3 weeks behind on GDD accumulation. Sequim, in the Olympic rain shadow, has an entirely different moisture regime.

To capture these differences, we track weather data from 7 stations spanning the region: Kent, Seattle, Tacoma, Olympia, Bellingham, Sequim, and Issaquah. Each station has daily records from 2020 to the present, updated every day via the Open-Meteo historical and forecast APIs. The comparison between stations is the product: it lets you adjust timing recommendations to your specific location.

For plant species with NPN observations near multiple stations, we compute station-local GDD₃₂ thresholds by cross-referencing observation dates against each station's own weather history. This means the bloom threshold for red flowering currant might be different for Bellingham than for Kent, reflecting real microclimate differences rather than a one-size-fits-all number.
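A minimal sketch of the station-local calculation: group observations by station, look up each station's own cumulative GDD₃₂ on the observation date, and take the median per station. The data shapes, function name, and numbers are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def station_local_thresholds(observations, cumulative_gdd):
    """Median cumulative GDD32 on the observation dates, per station.

    observations: list of (station, date) pairs for one species and event.
    cumulative_gdd: dict mapping (station, date) to that station's
        cumulative GDD32 on that date.
    """
    by_station = defaultdict(list)
    for station, day in observations:
        by_station[station].append(cumulative_gdd[(station, day)])
    return {station: median(values) for station, values in by_station.items()}


# Made-up observation dates and GDD32 values.
observations = [
    ("Kent", date(2024, 3, 10)),
    ("Kent", date(2025, 3, 14)),
    ("Bellingham", date(2024, 3, 28)),
]
cumulative_gdd = {
    ("Kent", date(2024, 3, 10)): 600.0,
    ("Kent", date(2025, 3, 14)): 640.0,
    ("Bellingham", date(2024, 3, 28)): 580.0,
}
print(station_local_thresholds(observations, cumulative_gdd))
```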

How it works daily

Every morning at 6 AM Pacific, an automated pipeline fetches the previous day's weather for all 7 stations, commits it to the data archive, and triggers a site rebuild. The rebuild pulls a live 16-day forecast from Open-Meteo, recalculates GDD accumulations, and updates every profile page with current conditions. The weather dashboard shows current station comparisons, GDD accumulation curves, and soil temperature trends.
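As a sketch of the fetch step, the snippet below builds a request URL for one station and one day against Open-Meteo's historical archive endpoint. Which endpoint and variables the real pipeline requests is not stated here, so the endpoint choice, the variable list, and the function name are assumptions; check the parameter names against the current Open-Meteo documentation before relying on them.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint: Open-Meteo's historical weather archive.
ARCHIVE_ENDPOINT = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, day):
    """URL requesting one day of daily highs, lows, and precipitation in °F."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": day.isoformat(),
        "end_date": day.isoformat(),
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "temperature_unit": "fahrenheit",
        "timezone": "America/Los_Angeles",
    }
    return f"{ARCHIVE_ENDPOINT}?{urlencode(params)}"


# Approximate coordinates for Kent, used only as an example.
print(build_archive_url(47.38, -122.23, date(2025, 3, 1)))
```

Building the URL separately from sending it keeps the request easy to inspect and test without network access.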

Field Observations

Every phenological observation recorded on this site feeds directly back into the data. Since January 2026, we've logged 182 observations across the Puget Sound lowlands, each photographed and tied to a specific GDD₃₂ value at the nearest weather station.

The workflow is photo-first: EXIF data provides the date and GPS coordinates. From there, the nearest weather station is identified, the station's GDD₃₂ for that date is looked up, and the observation is logged with full metadata. If a profile for that species doesn't exist yet, one is created from a template. The observation's GDD₃₂ value is written into the profile's phenological calendar, so every observation immediately improves the data that drives the site's timing predictions.
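The "nearest weather station" step can be sketched with a great-circle distance. The coordinates below are approximate city centers used as stand-ins, not the actual station locations, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate city-center coordinates (latitude, longitude); stand-ins only.
STATIONS = {
    "Kent": (47.38, -122.23),
    "Seattle": (47.61, -122.33),
    "Tacoma": (47.25, -122.44),
    "Olympia": (47.04, -122.90),
    "Bellingham": (48.75, -122.48),
    "Sequim": (48.08, -123.10),
    "Issaquah": (47.53, -122.03),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def nearest_station(lat, lon, stations=STATIONS):
    """Name of the station closest to a photo's GPS position."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))


print(nearest_station(47.60, -122.33))
```

Once the station is known, the observation's GDD₃₂ is a lookup of that station's running total on the EXIF date.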

Over time, this creates a feedback loop. A species might start as "Extrapolated" with a threshold borrowed from Ohio research. After local observations, it moves to "Regional estimate." With multiple years of confirmed observations, it earns "Local estimate." The system is designed to get more accurate over time, not just more populated.

Browse the full observation log at /observations/.

The Improvement Loop

HortGuide is designed to get better with every piece of content we publish. When we write a guide about fire blight, the research doesn't just produce an article. It generates structured patches for every plant, disease, and pest profile that the guide references, adds new terms to the glossary, and updates the phenological task calendar. The profiles improve, which makes the next guide's research faster and more accurate, which produces better patches. The loop compounds.

Automated scoring tracks profile completeness across the entire knowledge base. A provenance ledger records every field-level change and its source. A stale-guide detector flags published guides whose underlying profile data has changed significantly since publication. An audit system runs quality checks across all profiles, surfacing empty required fields, missing sources, and profiles that have never been guide-touched.
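One way a stale-guide check could work on top of a provenance ledger is sketched below. What counts as "changed significantly" is not defined above, so this example simply counts field-level changes made after publication and compares them to an arbitrary threshold; the classes, the threshold, and the sample entries are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LedgerEntry:
    """One field-level change to a profile, with its source."""
    profile_id: str
    field_name: str
    changed_on: date
    source_id: str


@dataclass(frozen=True)
class Guide:
    slug: str
    published_on: date
    profile_ids: frozenset


def stale_guides(guides, ledger, min_changes=3):
    """Slugs of guides whose referenced profiles changed enough since publication."""
    stale = []
    for guide in guides:
        changes = sum(
            1
            for entry in ledger
            if entry.profile_id in guide.profile_ids
            and entry.changed_on > guide.published_on
        )
        if changes >= min_changes:  # assumed meaning of "changed significantly"
            stale.append(guide.slug)
    return stale


guides = [Guide("fire-blight", date(2026, 2, 1), frozenset({"apple", "pear"}))]
ledger = [
    LedgerEntry("apple", "gdd_threshold", date(2026, 3, 1), "field-obs"),
    LedgerEntry("pear", "hosts", date(2026, 3, 5), "example-handbook"),
    LedgerEntry("apple", "notes", date(2026, 1, 15), "example-handbook"),
]
print(stale_guides(guides, ledger, min_changes=2))
```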

The result is a knowledge graph that self-corrects. Every guide, every observation, every enrichment session leaves the system measurably better than it was before.

Known Limitations

GDD models are useful approximations, not perfect predictors. Microclimate variation means that even within a single zip code, south-facing slopes, urban heat islands, and cold air drainage pockets can shift timing by a week or more. Our station network captures regional differences but can't account for your specific site.

Photoperiod and chilling requirements interact with temperature in ways GDD alone doesn't capture. Some species need a minimum number of cold hours before responding to warmth. We note these cofactors where known, but many species haven't been studied in detail.

Most profiles in the knowledge base are at Tier 1 or Tier 2 maturity. They contain accurate data from authoritative sources, but they haven't been individually reviewed by a human or verified with local field observations. We're transparent about this because it matters for the confidence you should place in any given recommendation.

NPN citizen science data carries inherent observer variability. Observers may record events a few days after they actually occur, slightly inflating GDD values. We use IQR filtering to remove outliers, but some noise remains.
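For readers unfamiliar with IQR filtering, here is a minimal sketch. It uses the conventional 1.5 × IQR fences; the multiplier actually applied to the NPN data is not stated above, so treat `k=1.5` as an assumption.

```python
from statistics import quantiles


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR], keeping input order."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up GDD32 values for one species; 900 represents a late-recorded bloom.
observed = [400, 410, 420, 430, 440, 450, 900]
print(iqr_filter(observed))
```

The filter removes extreme values but leaves the smaller, systematic lag from late reporting untouched, which is the residual noise mentioned above.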

Questions about our methodology? Get in touch. To see these systems in action, check current weather data, browse plant profiles, or read field observations.