Finding Federal Organizations: The Challenge of Incomplete Data Sources
When building applications that work with federal procurement and assistance data, one of the first challenges you'll encounter is finding the right organization identifiers. The federal government maintains multiple overlapping but incomplete data sources, each with its own strengths and gaps. In this post, we'll explore why SAM's Federal Hierarchy is authoritative but incomplete, how USAspending fills some gaps but misses others, and how Tango's unified approach helps you find organizations reliably.
The Authoritative Source: SAM Federal Hierarchy
The Federal Hierarchy from SAM.gov is the official, authoritative source for federal organization structure. It provides a comprehensive tree of departments, agencies, and offices with unique orgKey identifiers and maintains the canonical parent-child relationships.
However, the Federal Hierarchy has a critical limitation: it's missing entire organizations that appear in transaction data.
For example, USAspending's office file contains thousands of organizations that don't exist in Federal Hierarchy. These include:
- Office AC6091 - "W462 USA AERONAUTICAL SVCS AGG" (a Department of Defense office that appears in USAspending financial assistance data but is missing from Federal Hierarchy)
- Many DOD offices - Hundreds of Department of Defense offices with codes like W81YDE, W914J4, and W90LDH that appear in USAspending transaction data but aren't in Federal Hierarchy's structure
- Subtier agencies - Organizations like "Center for Nutrition Policy and Promotion" (subtier code 12F3) and "Federal Library and Information Center Committee" (subtier code 0363) that exist in USAspending's subtier file but may be missing from Federal Hierarchy
When you're working with contract data from FPDS or financial assistance from USAspending, you'll encounter codes like:
- CGAC codes (like 069 for Department of Transportation)
- FPDS codes (4-character agency identifiers used in contract transactions)
- Subtier codes (like 12F3 for Center for Nutrition Policy and Promotion)
- Office codes (like 15JCRM for the Criminal Division)
The Federal Hierarchy may have the organization's name and structure, but it often lacks these operational codes that appear in actual transaction data.
USAspending: Filling Some Gaps
USAspending's reference data complements the Federal Hierarchy and is divided into three levels: top-tier agencies, subtier agencies, and offices.
USAspending also contains data that are missing from Federal Hierarchy, such as abbreviations like DOT, DHS, and USDA, as well as full names and mission statements.
However, USAspending has a critical gap: its organization reference files don't cover every identifier that appears in FPDS contract data. FPDS uses its own set of organization identifiers that don't always map cleanly to USAspending's structure. When you're working with contract transactions, you'll encounter organizations that exist in neither Federal Hierarchy nor USAspending:
- Legacy FPDS offices - Historical contract offices that were used in FPDS transactions but have since been reorganized or decommissioned. These appear in FPDS contract data with office codes that don't match any organization in Federal Hierarchy or USAspending's reference files.
- Contract-specific organization IDs - FPDS transaction data includes fpds_org_id values that are specific to contract processing workflows. These identifiers may reference organizations that were valid at the time of the contract but no longer exist in current organization reference data.
- Historical agency structures - FPDS contains contract transactions from years past that reference agency codes and department IDs that have changed over time. The organization that awarded a contract in 2010 might have a different code structure today, and the old codes may not appear in either Federal Hierarchy or USAspending's current reference files.
These FPDS identifiers appear in millions of contract transactions but aren't present in USAspending's organization files. Tango addresses this by maintaining legacy organization data from historical FPDS sources, ensuring that even old contract transactions can be properly linked to their awarding organizations.
Tango's Unified Approach
Tango consolidates all these sources into a single Organization model with a priority-based field provenance system. Here's how it works:
Data Source Priority
Tango loads organizations from multiple sources in priority order:
- Federal Hierarchy (Top Priority) - The authoritative structure
- USAspending (Next Priority) - Fills in missing codes and details
- Legacy models (Historical FPDS) - Backfills from historical data
Higher-priority sources won't be overwritten by lower-priority ones, ensuring that Federal Hierarchy's authoritative structure is preserved while USAspending fills in the operational codes.
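The priority rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tango's implementation: the Organization class, its apply method, and the numeric ranks are hypothetical, the source names mirror the field_provenance values shown later in this post, and the field values are illustrative. A field is overwritten only by a source of equal or higher priority, so the final values do not depend on load order.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical ranks: a lower number means a higher priority.
SOURCE_PRIORITY = {"federal_hierarchy": 0, "usaspending": 1, "legacy": 2}


@dataclass
class Organization:
    fields: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)

    def apply(self, source: str, data: dict, when: datetime) -> None:
        """Merge one source's values, recording provenance per field."""
        rank = SOURCE_PRIORITY[source]
        for name, value in data.items():
            if value is None:
                continue  # a missing value never clears an existing one
            owner = self.provenance.get(name)
            if owner is not None and SOURCE_PRIORITY[owner["source"]] < rank:
                continue  # a higher-priority source already set this field
            self.fields[name] = value
            self.provenance[name] = {"source": source, "modified_at": when.isoformat()}


# Illustrative values; sources are applied out of priority order on purpose.
when = datetime(2024, 12, 1, 10, 0, tzinfo=timezone.utc)
org = Organization()
org.apply("usaspending", {"name": "CRIMINAL DIVISION", "cgac": "015"}, when)
org.apply("federal_hierarchy", {"name": "Criminal Division", "cgac": None}, when)
org.apply("legacy", {"name": "CRIM DIV", "code": "15JCRM"}, when)
print(org.fields)
# {'name': 'Criminal Division', 'cgac': '015', 'code': '15JCRM'}
```

Federal Hierarchy wins the name even though it loaded second, USAspending keeps the CGAC code that Federal Hierarchy lacked, and the legacy source can only fill the field nobody else supplied.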
Finding Organizations in Tango
Tango provides several ways to find organizations, each optimized for different use cases:
1. Search by Name or Alias
The /api/organizations/ endpoint supports a search query parameter that uses a multi-stage search strategy:
# Search by abbreviation, acronym, or name (spaces URL-encoded as %20)
GET /api/organizations/?search=FEMA
GET /api/organizations/?search=Department%20of%20Transportation
GET /api/organizations/?search=Treasury%20OIG # Context-aware search
Or programmatically using the search library:
from agencies.lib.search import search_organizations
# Finds organizations by abbreviation, acronym, or name
results = search_organizations("FEMA")
results = search_organizations("Department of Transportation")
results = search_organizations("Treasury OIG") # Context-aware search
The search handles:
- Exact alias matches - Catches abbreviations like "CIO", "OIG", "FEMA"
- Trigram similarity - Handles typos like "FMEA" → "FEMA"
- Full-text search - Finds organizations by keywords in names
- Context-aware queries - "Treasury OIG" finds the OIG within Treasury
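To make the trigram stage concrete, here is a small Python sketch of trigram similarity in the style of PostgreSQL's pg_trgm extension: each lowercased word is padded with two leading spaces and one trailing space, similarity is shared trigrams divided by all distinct trigrams, and 0.3 is pg_trgm's default threshold. It assumes, without confirming, that Tango's trigram stage behaves similarly; the function names and candidate list are illustrative, and the example uses a misspelled department name.

```python
import re


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigrams: lowercase, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_matches(query: str, names: list[str], threshold: float = 0.3) -> list[str]:
    """Names scoring at or above the threshold, best match first."""
    scored = sorted(((similarity(query, n), n) for n in names), reverse=True)
    return [name for score, name in scored if score >= threshold]


names = ["Department of Transportation", "Department of the Treasury", "Department of Justice"]
print(best_matches("Departmnt of Transportaton", names))
# ['Department of Transportation', 'Department of the Treasury']
```

The shared words "Department of" push the Treasury entry just past the threshold (0.325, versus roughly 0.70 for Transportation), which is why results are ranked by score rather than only filtered.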
2. Filter by Code
You can filter organizations by any of the codes they might have:
# Find by CGAC code
GET /api/organizations/?cgac=069
# Find by FPDS code
GET /api/organizations/?fpds_code=2100
# Find by office code
GET /api/organizations/?code=15JCRM
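When calling these filters from code, it is safer to let a URL-encoding helper build the query string so that multi-word searches and codes are escaped correctly. A sketch using Python's standard urllib.parse.urlencode; the base URL and the organizations_url helper are hypothetical, while the filter names (cgac, fpds_code, code, search) come from the examples above.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the address of the Tango API you use.
BASE_URL = "https://tango.example.com/api/organizations/"


def organizations_url(**filters) -> str:
    """Build an organizations query URL, dropping unset filters.
    urlencode escapes spaces (as '+') and reserved characters."""
    params = {k: v for k, v in filters.items() if v is not None}
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


print(organizations_url(fpds_code="2100"))
# https://tango.example.com/api/organizations/?fpds_code=2100
print(organizations_url(search="Department of Transportation", cgac=None))
# https://tango.example.com/api/organizations/?search=Department+of+Transportation
```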
3. Lookup by fh_key
The Federal Hierarchy's orgKey (mapped to fh_key in Tango) is used for direct lookups.
Note: Tango uses two identifiers:
- key (UUID) - The primary key for the Organization model, stable across reloads
- fh_key (BigInteger) - The Federal Hierarchy's orgKey identifier, used for API lookups and cross-referencing with SAM.gov
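Because key is a UUID and fh_key is an integer, code that receives either form can tell them apart by shape before choosing a lookup. The OrgIdentifier type and parse_org_identifier helper below are hypothetical, a sketch that assumes stored keys are standard UUID strings and that fh_key fits in a signed 64-bit BigInteger; the sample values come from the JSON example in the next section.

```python
import uuid
from dataclasses import dataclass
from typing import Union

BIGINT_MAX = 2**63 - 1  # largest value a signed 64-bit BigInteger column holds


@dataclass(frozen=True)
class OrgIdentifier:
    kind: str                     # "key" (UUID) or "fh_key" (Federal Hierarchy orgKey)
    value: Union[uuid.UUID, int]


def parse_org_identifier(raw: str) -> OrgIdentifier:
    """Classify an identifier string by shape: digits that fit in a BigInteger
    are treated as an fh_key, anything else must parse as a UUID key."""
    raw = raw.strip()
    if raw.isascii() and raw.isdigit() and int(raw) <= BIGINT_MAX:
        return OrgIdentifier("fh_key", int(raw))
    try:
        return OrgIdentifier("key", uuid.UUID(raw))
    except ValueError:
        raise ValueError(f"neither a UUID key nor a numeric fh_key: {raw!r}") from None


print(parse_org_identifier("123456"))
# OrgIdentifier(kind='fh_key', value=123456)
print(parse_org_identifier("a1b2c3d4-e5f6-7890-abcd-ef1234567890").kind)
# key
```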
4. Hierarchy Navigation
Each organization includes parent relationships and flattened hierarchy paths:
{
"key": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"fh_key": 123456,
"name": "Federal Emergency Management Agency",
"short_name": "FEMA",
"parent_fh_key": 789012,
"l1_name": "Department of Homeland Security",
"l1_short_name": "DHS",
"full_parent_path_name": "Department of Homeland Security > Federal Emergency Management Agency"
}
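The flattened fields can be derived by walking parent_fh_key links up to the top-level department. A minimal Python sketch, assuming an in-memory index keyed by fh_key; the ORGS data reuses the values from the JSON above, and the ancestry and full_parent_path_name functions are hypothetical names.

```python
from typing import Optional

# Hypothetical index keyed by fh_key, using the values from the JSON above.
ORGS = {
    789012: {"name": "Department of Homeland Security", "parent_fh_key": None},
    123456: {"name": "Federal Emergency Management Agency", "parent_fh_key": 789012},
}


def ancestry(fh_key: int, orgs: dict) -> list[str]:
    """Names from the top-level department down to the given organization."""
    names: list[str] = []
    seen: set[int] = set()
    current: Optional[int] = fh_key
    while current is not None:
        if current in seen:  # guard against malformed parent links
            raise ValueError(f"cycle in hierarchy at fh_key {current}")
        seen.add(current)
        org = orgs[current]
        names.append(org["name"])
        current = org["parent_fh_key"]
    return names[::-1]


def full_parent_path_name(fh_key: int, orgs: dict) -> str:
    return " > ".join(ancestry(fh_key, orgs))


print(full_parent_path_name(123456, ORGS))
# Department of Homeland Security > Federal Emergency Management Agency
print(ancestry(123456, ORGS)[0])  # the l1_name: Department of Homeland Security
```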
Field Provenance
Every organization tracks which source provided each field and when it was last updated:
{
"field_provenance": {
"name": {
"source": "federal_hierarchy",
"modified_at": "2024-12-01T10:00:00Z"
},
"cgac": {
"source": "usaspending",
"modified_at": "2024-12-01T10:00:00Z"
},
"fpds_code": {
"source": "usaspending",
"modified_at": "2024-12-01T10:00:00Z"
},
"code": {
"source": "legacy",
"modified_at": "2024-12-01T10:00:00Z"
}
}
}
This transparency helps you assess the reliability of each field and make informed decisions about which identifiers to use.
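In application code, one simple way to act on provenance is to group fields by the source that supplied them, for example to give legacy-sourced values extra scrutiny. The fields_by_source helper is hypothetical; it runs against the field_provenance object shown above, transcribed as a Python dict.

```python
from collections import defaultdict

# The field_provenance object from the example above, as a Python dict.
org = {
    "field_provenance": {
        "name": {"source": "federal_hierarchy", "modified_at": "2024-12-01T10:00:00Z"},
        "cgac": {"source": "usaspending", "modified_at": "2024-12-01T10:00:00Z"},
        "fpds_code": {"source": "usaspending", "modified_at": "2024-12-01T10:00:00Z"},
        "code": {"source": "legacy", "modified_at": "2024-12-01T10:00:00Z"},
    }
}


def fields_by_source(record: dict) -> dict[str, list[str]]:
    """Group field names by the source that last supplied them."""
    grouped = defaultdict(list)
    for field_name, info in record.get("field_provenance", {}).items():
        grouped[info["source"]].append(field_name)
    return dict(grouped)


by_source = fields_by_source(org)
print(by_source)
# {'federal_hierarchy': ['name'], 'usaspending': ['cgac', 'fpds_code'], 'legacy': ['code']}
print(by_source.get("legacy", []))  # fields you may want to double-check: ['code']
```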
Practical Examples
Example 1: Finding an Organization from a Contract Transaction
When processing an FPDS contract transaction, you might see:
agencyID: 2100
departmentID: 097
In Tango, you can find the organization by its FPDS code or by name:
# By FPDS code
GET /api/organizations/?fpds_code=2100
# Or search by name if you know it
GET /api/organizations/?search=Department%20of%20the%20Army
Example 2: Finding an Office from USAspending Data
USAspending financial assistance data might reference:
awarding_sub_agency_code: 1501
awarding_office_code: 15JCRM
You can find the office by its office code:
GET /api/organizations/?code=15JCRM
The response will include the full hierarchy, so you can see it's part of the Department of Justice (CGAC 015).
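The two examples above follow the same pattern: use the most specific code the transaction carries. A hypothetical sketch of that choice; the transaction field names come from the examples, while mapping awarding_sub_agency_code to fpds_code and departmentID to cgac are assumptions to verify against your own data.

```python
# Checked in order, most specific first. Field names come from the FPDS and
# USAspending examples; mappings marked "assumption" are unverified.
FIELD_TO_FILTER = [
    ("awarding_office_code", "code"),
    ("awarding_sub_agency_code", "fpds_code"),  # assumption
    ("agencyID", "fpds_code"),
    ("departmentID", "cgac"),                   # assumption
]


def most_specific_filter(txn: dict) -> tuple[str, str]:
    """Return the (filter, value) pair for the most specific code present."""
    for field_name, filter_name in FIELD_TO_FILTER:
        value = str(txn.get(field_name) or "").strip()
        if value:
            return filter_name, value
    raise LookupError("no recognized organization code in transaction")


fpds_txn = {"agencyID": "2100", "departmentID": "097"}
usaspending_txn = {"awarding_sub_agency_code": "1501", "awarding_office_code": "15JCRM"}
print(most_specific_filter(fpds_txn))         # ('fpds_code', '2100')
print(most_specific_filter(usaspending_txn))  # ('code', '15JCRM')
```

The returned pair maps directly onto a query such as ?fpds_code=2100 or ?code=15JCRM.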
Example 3: Context-Aware Search
If you know you're looking for "Treasury OIG" but aren't sure of the exact code, search for it directly:
GET /api/organizations/?search=Treasury%20OIG
Tango's contextual search will find the OIG (Office of Inspector General) within the Treasury Department, even if there are multiple OIGs across different departments.
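One way such scoping could work is to treat query words that match an organization's own short name as the target and require every remaining word to match its top-level parent. The sketch below assumes that approach; it is not Tango's search implementation, and the context_search function and ORGS records are hypothetical, shaped like the hierarchy fields shown earlier (short_name, l1_name, l1_short_name).

```python
# Hypothetical records shaped like the hierarchy fields shown earlier.
ORGS = [
    {"name": "Office of Inspector General", "short_name": "OIG",
     "l1_name": "Department of the Treasury", "l1_short_name": None},  # abbreviation omitted
    {"name": "Office of Inspector General", "short_name": "OIG",
     "l1_name": "Department of Homeland Security", "l1_short_name": "DHS"},
    {"name": "Office of Inspector General", "short_name": "OIG",
     "l1_name": "Department of Transportation", "l1_short_name": "DOT"},
]


def context_search(query: str, orgs: list[dict]) -> list[dict]:
    """Words matching an organization's short name pick the target; every
    other word must appear in its top-level parent's name or short name.
    This sketch only matches the short name as a single whole word."""
    words = query.lower().split()
    results = []
    for org in orgs:
        alias = org["short_name"].lower()
        parent_words = set(org["l1_name"].lower().split())
        if org.get("l1_short_name"):
            parent_words.add(org["l1_short_name"].lower())
        target = [w for w in words if w == alias]
        context = [w for w in words if w != alias]
        if target and all(w in parent_words for w in context):
            results.append(org)
    return results


for org in context_search("Treasury OIG", ORGS):
    print(org["l1_name"])  # Department of the Treasury
print(len(context_search("OIG", ORGS)))  # 3: no context, so every OIG matches
```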
Best Practices
- Use key (UUID) for API references - The UUID key is the primary identifier for storing references to organizations in your application
- Use fh_key for Federal Hierarchy lookups - When cross-referencing with SAM.gov or other federal data sources that use the Federal Hierarchy, use fh_key
- Use code lookups for transaction matching - When matching transactions, use the specific code type (CGAC, FPDS code, office code) that appears in your data
- Leverage search for user-facing features - The search query parameter handles abbreviations, typos, and context better than exact code matching
- Check field_provenance for data quality - Understand which source provided each field to assess reliability
Conclusion
Federal organization data is fragmented across multiple sources, each with its own strengths and gaps. SAM's Federal Hierarchy provides the authoritative structure but lacks operational codes. USAspending fills in many codes but doesn't cover every identifier used in FPDS contract data. Tango unifies these sources with a priority-based system that preserves authoritative data while filling in the gaps, giving you a single, reliable way to find and reference federal organizations.
Whether you're matching contract transactions, processing financial assistance awards, or building user-facing search features, Tango's unified Organization model and flexible search capabilities help you find the right organization identifiers, regardless of which source your data comes from.