IPEDS & College Scorecard Data Validation Study

Exhaustive source-file comparison of all higher education data
Date: February 14, 2026
Scope: 141,999 data points · 5,141 institutions
Result: 100.0% verified
All 141,999 data points were compared against federal source files, with 0 discrepancies. Both IPEDS program completions (121,120 records) and College Scorecard institutional outcomes (20,879 data points) match their respective source files exactly.
Total Data Points Verified: 141,999
Institutions Checked: 5,141
Discrepancies Found: 0

Data Sources Validated

Dataset | Source File | Data Points | Match Rate
IPEDS Completions | C2024_A.csv | 121,120 | 100.0%
College Scorecard | Most-Recent-Cohorts-Institution.csv | 20,879 | 100.0%

Part 1 — IPEDS Program Completions

Every program-level completions record in the Semantic Insight backend was compared against the NCES Completions survey file C2024_A, the authoritative federal source for postsecondary program completions data. This file contains 307,707 rows covering awards conferred from July 1, 2023 through June 30, 2024, across 6,429 institutions. The file was downloaded from nces.ed.gov.

NCES C2024_A (307,707 rows) → Comparison Engine (UNITID × CIP code) → 121,120 Verified (0 discrepancies)

Program Records Compared: 121,120
Institutions Verified: 4,918
Unique CIP Codes: 1,534

Methodology

1. IPEDS Completions: Exhaustive Source Comparison

The NCES source file reports at the (UNITID, CIPCODE, AWLEVEL, MAJORNUM) grain. Completions were aggregated across award levels to the (UNITID, 6-digit CIPCODE) grain, producing both a grand total over all major numbers and a MAJORNUM=1 subtotal. Every row in ipeds_programs_6digit.json (keyed by UNITID + 6-digit CIP code) was then matched against the aggregated NCES values. Four fields per record were compared: UNITID, CIPCODE, CTOTALT (total completions), and AWLEVEL (award levels present).
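The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual comparison engine: the function name `aggregate_completions` and the output keys are hypothetical, and the only column names assumed are the ones named in the text (UNITID, CIPCODE, AWLEVEL, MAJORNUM, CTOTALT).

```python
from collections import defaultdict


def aggregate_completions(rows):
    """Collapse NCES rows to one entry per (UNITID, CIPCODE).

    `rows` is any iterable of dicts with the keys UNITID, CIPCODE,
    AWLEVEL, MAJORNUM and CTOTALT (for example, from csv.DictReader).
    The function name and output keys are hypothetical.
    """
    totals = defaultdict(
        lambda: {"grand_total": 0, "m1_total": 0, "award_levels": set()}
    )
    for row in rows:
        key = (str(row["UNITID"]).strip(), str(row["CIPCODE"]).strip())
        count = int(row["CTOTALT"])
        entry = totals[key]
        entry["grand_total"] += count          # all major numbers
        if int(row["MAJORNUM"]) == 1:
            entry["m1_total"] += count         # first-major students only
        entry["award_levels"].add(str(row["AWLEVEL"]).strip())
    return dict(totals)


# Illustrative rows only: the split across award levels is made up;
# just the 1,003 (MAJORNUM=1) and 1,005 (grand total) figures come
# from the Old Dominion University sample in this report.
example_rows = [
    {"UNITID": "232982", "CIPCODE": "52.0101", "AWLEVEL": "5", "MAJORNUM": "1", "CTOTALT": "900"},
    {"UNITID": "232982", "CIPCODE": "52.0101", "AWLEVEL": "7", "MAJORNUM": "1", "CTOTALT": "103"},
    {"UNITID": "232982", "CIPCODE": "52.0101", "AWLEVEL": "5", "MAJORNUM": "2", "CTOTALT": "2"},
]
aggregated = aggregate_completions(example_rows)
```

Keeping both totals per key makes it possible to tell afterwards whether a backend value matches the grand total or only the first-major subtotal.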

2. Results: 121,120 of 121,120 Records Verified

Of the 121,120 backend records, 106,076 (87.6%) match the NCES grand totals exactly. The remaining 15,044 (12.4%) differ because the backend uses MAJORNUM=1 only (first-major students), excluding second-major counts from double majors. In every one of these cases, the backend value exactly equals the NCES MAJORNUM=1 subtotal. There are zero unexplained discrepancies.

Completions Verification Breakdown

Category | Records | Percentage
Exact match (all completions) | 106,076 | 87.6%
Match on MAJORNUM=1 (first-major only) | 15,044 | 12.4%
Unexplained discrepancies | 0 | 0.0%
Total | 121,120 | 100.0%
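The three categories in the breakdown can be expressed as a small classification rule. The sketch below is an assumption about how such a check might look, with hypothetical function and label names; it is not taken from the study's code. The sample tuples use values from the Old Dominion University program table in this report.

```python
from collections import Counter


def classify_record(backend_total, nces_grand_total, nces_m1_total):
    """Return the breakdown category for one (UNITID, CIP) record."""
    if backend_total == nces_grand_total:
        return "exact_match"
    if backend_total == nces_m1_total:
        return "majornum1_match"
    return "unexplained"


def tally(records):
    """Count categories over (backend, NCES grand total, NCES M1) tuples."""
    return Counter(classify_record(*record) for record in records)


# (backend, NCES total, NCES M1) for three ODU programs from the report.
odu_sample = [
    (1003, 1005, 1003),  # 52.0101 Business/Commerce, General
    (824, 824, 824),     # 11.0101 Computer & Info Sciences, General
    (406, 409, 406),     # 42.0101 Psychology, General
]
print(tally(odu_sample))
```

A record where the grand total and the MAJORNUM=1 subtotal are identical falls into the exact-match category, which is consistent with how the breakdown counts programs that have no second-major completions.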

Part 2 — College Scorecard

Five institutional outcome fields were compared against the U.S. Department of Education College Scorecard "Most Recent Institution-Level Data" file (6,429 institutions, last updated November 17, 2025). The file was downloaded from collegescorecard.ed.gov/data.

College Scorecard (6,429 institutions) → Field-level Match (5 fields × UNITID) → 20,879 Verified (0 discrepancies)

Field Mapping & Results

Backend Field | Scorecard Column | Description | Verified
completion_rate | C150_4 / C150_L4 | Graduation rate (150% normal time) | 4,313
median_debt | GRAD_DEBT_MDN | Median debt at graduation | 3,824
employment_rate | COUNT_WNE_P10 / COUNT_NWNE_P10 | Employment rate, 10yr post-entry | 4,004
enrollment | UGDS | Undergraduate enrollment | 4,734
median_earnings | MD_EARN_WNE_P10 | Median earnings, 10yr post-entry | 4,004
Total data points verified | | | 20,879

Employment rate methodology: The College Scorecard does not publish a single "employment rate" field. The backend calculates it from COUNT_WNE_P10 (working, not enrolled, 10yr post-entry) and COUNT_NWNE_P10 (not working, not enrolled): employment_rate = WNE ÷ (WNE + NWNE) × 100. Verified to produce an exact match for all 4,004 institutions where both source values are available.
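
The formula above is simple enough to check by hand. As a minimal sketch (the function name is hypothetical, and returning None when either count is unavailable is an assumption based on the note that only institutions with both source values are included):

```python
def employment_rate(count_wne, count_nwne):
    """employment_rate = WNE / (WNE + NWNE) * 100.

    count_wne  : COUNT_WNE_P10  (working, not enrolled)
    count_nwne : COUNT_NWNE_P10 (not working, not enrolled)
    Returns None when either input is missing or the denominator is zero.
    """
    if count_wne is None or count_nwne is None:
        return None
    denominator = count_wne + count_nwne
    if denominator == 0:
        return None
    return count_wne / denominator * 100


# Old Dominion University: the report gives 6,004 / 6,916, so the
# not-working count is 6,916 - 6,004 = 912.
odu_rate = employment_rate(6004, 912)
print(round(odu_rate, 1))
```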

Sample Verification Detail — Old Dominion University

Both datasets were validated exhaustively by automated comparison. The following is a spot-check of Old Dominion University (UNITID 232982), also verified manually against IPEDS Institution Profile and College Scorecard on February 14, 2026.

Institutional Outcomes (Scorecard)

Field | Backend | Scorecard Source
Graduation Rate | 44.4% | 44.35% (C150_4 = 0.4435)
Median Debt | $24,000 | $24,000
Employment Rate | 86.8% | 86.8% (6,004 ÷ 6,916)
Enrollment | 17,521 | 17,521
Median Earnings (10yr) | $54,914 | $54,914

Program Completions (IPEDS) — Top 15

CIP | Program | Backend | NCES M1 | NCES Total
52.0101 | Business/Commerce, General | 1,003 | 1,003 | 1,005
11.0101 | Computer & Info Sciences, General | 824 | 824 | 824
11.0103 | Information Technology | 480 | 480 | 480
42.0101 | Psychology, General | 406 | 406 | 409
11.0802 | Data Modeling/Warehousing | 346 | 346 | 346
26.0101 | Biology/Biological Sciences, General | 329 | 329 | 335
43.0107 | Criminal Justice/Police Science | 328 | 328 | 331
13.1001 | Special Education, General | 285 | 285 | 285
13.0301 | Curriculum and Instruction | 261 | 261 | 261
51.3801 | Registered Nursing | 260 | 260 | 260
22.0101 | Law | 237 | 237 | 237
52.0201 | Business Admin & Management | 223 | 223 | 224
44.0701 | Social Work | 183 | 183 | 183
51.2208 | Community Health & Preventive Med | 157 | 157 | 157
45.0601 | Economics, General | 155 | 155 | 157

All 180 ODU programs were verified. Backend = Semantic Insight value. NCES M1 = MAJORNUM=1 subtotal (should match the backend value). NCES Total = grand total including double majors.

How to Verify

Any data point in a Semantic Insight institution analysis can be independently verified:

Program completions: Download the Completions dataset from nces.ed.gov/ipeds/datacenter/DataFiles.aspx → Year 2024 → Survey "Completions" → Data file "C2024_A". Open the CSV and filter by UNITID (shown on the institution page as "Verify on IPEDS") and CIPCODE.
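
For readers who prefer a script to a spreadsheet filter, the lookup can be sketched as below. This assumes the downloaded CSV has header columns named UNITID, CIPCODE, MAJORNUM and CTOTALT, as referenced in this report; the function name is hypothetical, and the in-memory sample stands in for the real file.

```python
import csv
import io


def completions_for(csv_file, unitid, cipcode, majornum=1):
    """Sum CTOTALT for one institution and CIP code.

    csv_file : an open text file (or file-like object) with a header row
    majornum : 1 for first-major counts (the backend's convention),
               or None to sum every major number (NCES grand total)
    """
    total = 0
    for row in csv.DictReader(csv_file):
        if row["UNITID"].strip() != str(unitid):
            continue
        if row["CIPCODE"].strip() != cipcode:
            continue
        if majornum is not None and int(row["MAJORNUM"]) != majornum:
            continue
        total += int(row["CTOTALT"])
    return total


# Tiny stand-in for C2024_A.csv; the row layout is illustrative.
sample_csv = io.StringIO(
    "UNITID,CIPCODE,MAJORNUM,AWLEVEL,CTOTALT\n"
    "232982,52.0101,1,5,1003\n"
    "232982,52.0101,2,5,2\n"
    "232982,11.0101,1,5,824\n"
)
first_major = completions_for(sample_csv, 232982, "52.0101")
```

With the real download, the same function could be called on `open("C2024_A.csv", newline="", encoding="utf-8-sig")`; the encoding is an assumption that guards against a byte-order mark in the header.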

Institutional outcomes (graduation rate, debt, employment, enrollment, earnings): Download "Most Recent Institution-Level Data" from collegescorecard.ed.gov/data. Filter by UNITID. Graduation rate is in column C150_4 for four-year institutions or C150_L4 for less-than-four-year institutions (multiply by 100), median debt in GRAD_DEBT_MDN, undergraduate enrollment in UGDS, and earnings in MD_EARN_WNE_P10. Employment rate is COUNT_WNE_P10 ÷ (COUNT_WNE_P10 + COUNT_NWNE_P10) × 100.
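
The column mapping can be sketched as a single function over one Scorecard row. Several details here are assumptions rather than documented backend behavior: the function names are hypothetical, C150_L4 is used only when C150_4 is unavailable, percentages are rounded half-up to one decimal (inferred from 44.35% appearing as 44.4% in the ODU sample), and any non-numeric cell is treated as missing.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def to_decimal(raw):
    """Return a Decimal, or None for blank or non-numeric cells
    such as "NA" or "PrivacySuppressed"."""
    try:
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def percent(fraction):
    """Convert a 0-1 fraction to a percentage with one decimal place."""
    return (fraction * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def scorecard_outcomes(row):
    """Map one Scorecard row (a dict of strings) to the five backend fields."""
    grad = to_decimal(row.get("C150_4"))
    if grad is None:  # assumed fallback for less-than-four-year institutions
        grad = to_decimal(row.get("C150_L4"))
    wne = to_decimal(row.get("COUNT_WNE_P10"))
    nwne = to_decimal(row.get("COUNT_NWNE_P10"))
    employment = None
    if wne is not None and nwne is not None and wne + nwne > 0:
        employment = percent(wne / (wne + nwne))
    return {
        "completion_rate": percent(grad) if grad is not None else None,
        "median_debt": to_decimal(row.get("GRAD_DEBT_MDN")),
        "employment_rate": employment,
        "enrollment": to_decimal(row.get("UGDS")),
        "median_earnings": to_decimal(row.get("MD_EARN_WNE_P10")),
    }


# Old Dominion University values from this report; 912 is derived
# from the 6,916 denominator minus the 6,004 working count.
odu_row = {
    "C150_4": "0.4435", "GRAD_DEBT_MDN": "24000",
    "COUNT_WNE_P10": "6004", "COUNT_NWNE_P10": "912",
    "UGDS": "17521", "MD_EARN_WNE_P10": "54914",
}
odu = scorecard_outcomes(odu_row)
```

Decimal arithmetic is used so that 0.4435 × 100 rounds to 44.4 deterministically instead of depending on binary floating-point representation.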

Notes & Known Limitations

MAJORNUM=1 filter. The backend counts first-major students only. For programs where students commonly double-major, the backend total will be slightly lower than the NCES grand total. This affects 15,044 of 121,120 records (12.4%); the median difference is 1 completer. This is standard methodology matching the College Scorecard.

Award-level aggregation. Completions are summed across all award levels (certificates, bachelor's, master's, doctoral) into a single per-program total. The individual levels present are retained in metadata.

Derived metrics. The roi_ratio field (3,683 institutions) and employment_rate field (4,004 institutions) are calculated from verified source values using deterministic formulas, so verified inputs always produce the same, reproducible outputs.

Null/suppressed values. The College Scorecard publishes "NA" or "PrivacySuppressed" when sample sizes are too small. The backend correctly stores these as null. Of the 5,141 backend institutions, 150 are system offices or very new branch campuses not independently present in the Scorecard file.

Data Currency & Refresh Schedule

Dataset | Current Version | Publication Cycle | Next Expected Release
IPEDS Completions | 2023-24 (C2024_A) | Annual (fall) | ~Fall 2026 (2024-25 data)
College Scorecard | Nov 2025 release | ~Twice per year | ~Spring 2026

When NCES or the Department of Education publishes updated data, we re-run this validation process against the new source files before deploying updates.