
Methodology

We compare what employers are asking for against validated occupational standards — task by task, employer by employer. This page explains the approach, the pipeline, and how we measure whether the results are trustworthy.

Canonical Comparison, Not Keyword Counting

Most labor market analysis extracts keywords from postings and counts frequencies. That tells you what's mentioned — but not what's missing, not what's new, and not whether any of it represents real occupational change. We take a different approach: every duty is compared against canonical task statements from the O*NET occupational database — the product of decades of federally funded job analysis research where subject matter experts who actually do the work generate task descriptions, which are then rated for relevance and importance by large samples of job incumbents. These aren't keyword lists. They're the validated, consensus description of what an occupation involves.

Comparing against that baseline reveals three things keyword counting can't: which tasks employers confirm (evidence, not assumption), which tasks are absent from postings despite being core to the occupation (you can't measure what's missing without a baseline), and which requirements are emerging — things employers are asking for that O*NET doesn't capture yet.
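
In code, the three-way comparison reduces to simple set logic. Here's a minimal sketch (the task IDs, duty texts, and match structure are illustrative placeholders, not the production schema):

    # Illustrative three-way comparison of extracted duties against an O*NET baseline.
    # All task IDs and duty texts below are made up.
    baseline_tasks = {
        "T1": "Educate patients about disease prevention.",
        "T2": "Maintain patient records.",
        "T3": "Coordinate community outreach events.",
    }

    # Duties extracted from postings, with the baseline task each one matched (None = no match).
    extracted_duties = [
        {"duty": "Teach patients how to prevent chronic disease", "matched_task": "T1"},
        {"duty": "Document visits in the electronic health record", "matched_task": "T2"},
        {"duty": "Produce short-form video for patient outreach", "matched_task": None},
    ]

    confirmed = {d["matched_task"] for d in extracted_duties if d["matched_task"]}
    missing = set(baseline_tasks) - confirmed        # core tasks no posting mentioned
    emerging = [d["duty"] for d in extracted_duties if d["matched_task"] is None]

    print("Confirmed:", sorted(confirmed))   # evidence, not assumption
    print("Missing:  ", sorted(missing))     # invisible to keyword counting
    print("Emerging: ", emerging)            # candidates for new requirements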

Keyword Analysis

"What words appear in job postings?" Treats all mentions equally. No reference point for what should be there. Can only see what's present. Output: frequency counts.

Canonical Comparison

"Given what this occupation requires, what are employers emphasizing, omitting, or adding?" Reads for meaning against a validated baseline. Measures change, not just frequency. Output: occupational change signals.

How the Analysis Works

1. Posting Collection & Screening
Real postings from your region via Google Jobs. Staffing reposts, confidential listings, and thin postings filtered before analysis.
SerpAPI · Quality screen
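
A rough sketch of this step, assuming the google-search-results client for SerpAPI's Google Jobs engine. The response fields (jobs_results, company_name, description) follow SerpAPI's documented output; the screening heuristics and cutoffs below are placeholders, not the production rules:

    # Collect postings via SerpAPI's Google Jobs engine, then apply a simple quality screen.
    from serpapi import GoogleSearch

    STAFFING_HINTS = ("staffing", "recruiting", "talent")  # placeholder heuristics
    MIN_DESCRIPTION_CHARS = 400                            # assumed "thin posting" cutoff

    def collect_postings(query, location, api_key):
        search = GoogleSearch({
            "engine": "google_jobs",
            "q": query,
            "location": location,
            "api_key": api_key,
        })
        postings = search.get_dict().get("jobs_results", [])

        def keep(p):
            company = (p.get("company_name") or "").lower()
            description = p.get("description") or ""
            if any(hint in company for hint in STAFFING_HINTS):
                return False  # likely staffing repost
            if "confidential" in company:
                return False  # confidential listing
            if len(description) < MIN_DESCRIPTION_CHARS:
                return False  # too thin to analyze
            return True

        return [p for p in postings if keep(p)]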

2. Duty Extraction
Each posting split into sentences. An LLM classifier separates actual duties from noise — requirements, benefits, boilerplate. Typically 70–85% filtered out.
Local parser · LLM classifier
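
A sketch of the extraction step. The production parser and classifier aren't specified on this page, so a naive regex splitter and the OpenAI chat API stand in; the prompt and model name are assumptions:

    # Split a posting into sentences, then keep only the ones an LLM labels as duties.
    import re
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = (
        "Is the following job-posting sentence an actual duty (something the worker does), "
        "rather than a requirement, benefit, or boilerplate? Answer DUTY or NOISE.\n\n"
        "Sentence: {s}"
    )

    def extract_duties(posting_text):
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", posting_text) if s.strip()]
        duties = []
        for s in sentences:
            reply = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model choice
                messages=[{"role": "user", "content": PROMPT.format(s=s)}],
            )
            if reply.choices[0].message.content.strip().upper().startswith("DUTY"):
                duties.append(s)
        return duties  # typically a small fraction of the original sentences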

3. Semantic Task Matching
Each duty encoded by a model pre-trained on 3.2M job posting sentences, then compared against O*NET task statements. Captures meaning, not keywords — "educate patients about disease prevention" matches the health education task.
JobBERT · O*NET 30.1
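
A sketch of the matching step using the sentence-transformers library. The JobBERT checkpoint path, the similarity threshold, and the O*NET task list are placeholders; the actual model configuration and cutoff aren't specified on this page:

    # Encode duties and O*NET task statements, then match by cosine similarity.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("path/to/jobbert-checkpoint")  # stand-in for the real JobBERT model
    SIMILARITY_THRESHOLD = 0.60                                # assumed cutoff, not the production value

    def match_duties(duties, onet_tasks):
        duty_vecs = model.encode(duties, convert_to_tensor=True, normalize_embeddings=True)
        task_vecs = model.encode(onet_tasks, convert_to_tensor=True, normalize_embeddings=True)
        sims = util.cos_sim(duty_vecs, task_vecs)  # duties x tasks similarity matrix

        matches = []
        for i, duty in enumerate(duties):
            j = int(sims[i].argmax())
            score = float(sims[i][j])
            task = onet_tasks[j] if score >= SIMILARITY_THRESHOLD else None
            matches.append((duty, task, score))    # (duty, best task or None, similarity)
        return matches

This is what lets a duty like "educate patients about disease prevention" pair with the health education task even when the two share almost no words.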

4. AI Judge Verification
A second AI independently confirms or rejects each match. Eliminates false positives where sentences sound similar but describe different work. Results split into two streams:
Matched Tasks: Confirmed against O*NET. Counted by employer frequency.
Emerging Requirements: Unmatched duties clustered by meaning. 2+ employers = signal.
LLM judge
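
A sketch of the verification split. The judge prompt and model are assumptions, and the clustering of unmatched duties is omitted for brevity:

    # An independent judge accepts or rejects each candidate match, producing the two streams.
    from openai import OpenAI

    client = OpenAI()

    JUDGE_PROMPT = (
        'Duty from a job posting: "{duty}"\n'
        'Candidate O*NET task: "{task}"\n'
        "Do these describe the same work? Answer YES or NO."
    )

    def verify(matches):
        confirmed, unmatched = [], []
        for duty, task, _score in matches:        # output of the matching step above
            if task is None:
                unmatched.append(duty)            # no candidate at all -> emerging pool
                continue
            reply = client.chat.completions.create(
                model="gpt-4o-mini",              # placeholder judge model
                messages=[{"role": "user", "content": JUDGE_PROMPT.format(duty=duty, task=task)}],
            )
            if reply.choices[0].message.content.strip().upper().startswith("YES"):
                confirmed.append((duty, task))    # matched task stream
            else:
                unmatched.append(duty)            # sounded similar, different work -> emerging pool
        return confirmed, unmatched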

5. Aggregation & Batch Assessment
Results combined across all postings. Batch checked for representativeness before results are citable.
Saturation check · Batch verdict
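
A sketch of the aggregation logic: count distinct employers per confirmed task, and keep an emerging cluster only when at least two employers contribute to it. The data shapes are illustrative:

    # Aggregate confirmed tasks by employer frequency and filter emerging clusters.
    from collections import defaultdict

    def task_employer_counts(postings):
        # postings: [{"employer": "...", "confirmed_tasks": ["T1", "T2", ...]}, ...]
        employers_per_task = defaultdict(set)
        for p in postings:
            for task in p["confirmed_tasks"]:
                employers_per_task[task].add(p["employer"])
        return {task: len(emps) for task, emps in employers_per_task.items()}

    def emerging_signals(clusters, min_employers=2):
        # clusters: {"cluster label": {"Employer A", "Employer B", ...}, ...}
        return [label for label, emps in clusters.items() if len(emps) >= min_employers]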

6. Evidence & Program Alignment
One analysis feeds six report types. When faculty confirm coverage of each task, reports include alignment scores and gap identification.
6 reports · Alignment
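
One plausible reading of the alignment math, sketched below: the score is the share of employer-confirmed tasks that faculty mark as covered, and the gaps are the confirmed tasks they don't. The formula is an assumption, not the documented scoring method:

    # Compute a coverage-based alignment score and a ranked gap list.
    def program_alignment(confirmed_tasks, faculty_coverage):
        # confirmed_tasks: {task_id: employer_count}; faculty_coverage: {task_id: True/False}
        covered = [t for t in confirmed_tasks if faculty_coverage.get(t)]
        gaps = [t for t in confirmed_tasks if not faculty_coverage.get(t)]
        score = len(covered) / len(confirmed_tasks) if confirmed_tasks else 0.0
        return score, sorted(gaps, key=lambda t: -confirmed_tasks[t])  # biggest employer demand first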

When to Trust the Results

Before you cite findings in a compliance report or use them for curriculum decisions, the system checks whether your sample represents regional employer demand. Three criteria, one verdict.

10+ quality postings
10+ unique employers
0 new tasks in the last 3 postings

Representative — all three met. Findings are defensible.
Almost — one missing. System tells you exactly what to add.
Not yet — two or more missing. Keep collecting.
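
The verdict logic reduces to three boolean checks. A minimal sketch (the thresholds mirror the criteria above; the function itself is illustrative):

    # Three criteria, one verdict.
    def batch_verdict(quality_postings, unique_employers, new_tasks_last_3):
        met = sum([
            quality_postings >= 10,
            unique_employers >= 10,
            new_tasks_last_3 == 0,
        ])
        if met == 3:
            return "Representative"  # findings are defensible
        if met == 2:
            return "Almost"          # one criterion missing
        return "Not yet"             # keep collecting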

Task saturation adapts code saturation from qualitative research — the point at which no new themes emerge from additional data. Systematic reviews place this at 9–17 interviews for homogeneous populations. Job postings within a single SOC code are highly homogeneous, so saturation typically occurs within 8–12 postings.

Hennink & Kaiser (2022). "Sample sizes for saturation in qualitative research." Quality & Quantity, 56, 1895–1907. · Guest, Bunce & Johnson (2006). "How many interviews are enough?" Field Methods, 18(1), 59–82.

When the Crosswalk Falls Short

The federal CIP-SOC Crosswalk maps some programs to residual "All Other" occupation codes (SOC codes ending in 9, such as 11-9199 or 21-1019). These are legitimate BLS categories, but O*NET does not maintain task statements, skills, or work activities for them — which means the standard task-to-posting comparison isn't possible.

This isn't rare. Codes like "Managers, All Other," "Counselors, All Other," and "Community and Social Service Specialists, All Other" are common crosswalk destinations for mainstream community college programs.

When we detect this, we do two things:

1. Surface related occupations. We identify specific occupations in the same SOC minor group that have full O*NET profiles, rank them by regional employment, and present them as alternatives (sketched after this list). These are clearly labeled as related suggestions — the user makes the final selection.

2. Extract employer requirements directly. If an "All Other" code is analyzed, technologies, certifications, and employer requirements are still extracted from the postings themselves. This posting-driven evidence can document regional demand even when the canonical occupational profile doesn't exist.
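
A sketch of the residual-code branch. The detection rule, data shapes, and field names are simplifications for illustration, not the production implementation:

    # Detect an "All Other" code, then rank related occupations from the same SOC minor group.
    def is_residual(soc_code, title):
        return soc_code.replace("-", "").endswith("9") and "all other" in title.lower()

    def related_occupations(soc_code, occupations):
        # occupations: [{"soc": "11-9111", "title": "...", "regional_employment": 1234,
        #                "has_onet_profile": True}, ...]
        minor_group = soc_code[:4]  # e.g. "11-9" for 11-9199
        candidates = [
            o for o in occupations
            if o["soc"].startswith(minor_group)
            and o["has_onet_profile"]
            and o["soc"] != soc_code
        ]
        return sorted(candidates, key=lambda o: -o["regional_employment"])

The ranked list is only ever presented as related suggestions; the user still makes the final selection.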

No federal guidance addresses how institutions should handle task analysis for residual occupation codes. This is a known interoperability gap between the CIP-SOC Crosswalk and the O*NET database.

Ready to see what employers in your region actually emphasize?

Explore Your Labor Market →