Finding Federal Organizations: The Challenge of Incomplete Data Sources

When building applications that work with federal procurement and assistance data, one of the first challenges you'll encounter is finding the right organization identifiers. The federal government maintains multiple overlapping but incomplete data sources, each with their own strengths and gaps. In this post, we'll explore why SAM's Federal Hierarchy is authoritative but incomplete, how USAspending fills some gaps but misses others, and how Tango's unified approach helps you find organizations reliably.

The Authoritative Source: SAM Federal Hierarchy

The Federal Hierarchy from SAM.gov is the official, authoritative source for federal organization structure. It provides a tree of departments, agencies, and offices, each with a unique orgKey identifier, and maintains the canonical parent-child relationships.

However, the Federal Hierarchy has a critical limitation: it's missing entire organizations that appear in transaction data.

For example, USAspending's office file contains thousands of organizations that don't exist in Federal Hierarchy. These include:

  • Office AC6091 - "W462 USA AERONAUTICAL SVCS AGG" (Department of Defense office that appears in USAspending financial assistance data but is missing from Federal Hierarchy)
  • Many DOD offices - Hundreds of Department of Defense offices with codes like W81YDE, W914J4, W90LDH that appear in USAspending transaction data but aren't in Federal Hierarchy's structure
  • Subtier agencies - Organizations like "Center for Nutrition Policy and Promotion" (subtier code 12F3) and "Federal Library and Information Center Committee" (subtier code 0363) that exist in USAspending's subtier file but may be missing from Federal Hierarchy

When you're working with contract data from FPDS or financial assistance from USAspending, you'll encounter codes like:

  • CGAC codes (like 069 for Department of Transportation)
  • FPDS codes (4-digit agency identifiers used in contract transactions)
  • Subtier codes (like 12F3 for Center for Nutrition Policy and Promotion)
  • Office codes (like 15JCRM for the Criminal Division)

The Federal Hierarchy may have the organization's name and structure, but it often lacks these operational codes that appear in actual transaction data.

USAspending: Filling Some Gaps

USAspending's reference data complements the Federal Hierarchy and is organized into three levels: top-tier agencies, subtier agencies, and offices.

USAspending contains data that is missing from Federal Hierarchy, including abbreviations such as DOT, DHS, and USDA, as well as full names and mission statements.

However, USAspending has a critical gap: it doesn't include all FPDS contract data. FPDS uses its own set of organization identifiers that don't always map cleanly to USAspending's structure. When you're working with contract transactions, you'll encounter organizations that exist in neither Federal Hierarchy nor USAspending:

  • Legacy FPDS offices - Historical contract offices that were used in FPDS transactions but have since been reorganized or decommissioned. These appear in FPDS contract data with office codes that don't match any organization in Federal Hierarchy or USAspending's reference files.
  • Contract-specific organization IDs - FPDS transaction data includes fpds_org_id values that are specific to contract processing workflows. These identifiers may reference organizations that were valid at the time of the contract but no longer exist in current organization reference data.
  • Historical agency structures - FPDS contains contract transactions from years past that reference agency codes and department IDs that have changed over time. The organization that awarded a contract in 2010 might have a different code structure today, and the old codes may not appear in either Federal Hierarchy or USAspending's current reference files.

These FPDS identifiers appear in millions of contract transactions but aren't present in USAspending's organization files. Tango addresses this by maintaining legacy organization data from historical FPDS sources, ensuring that even old contract transactions can be properly linked to their awarding organizations.

Tango's Unified Approach

Tango consolidates all these sources into a single Organization model with a priority-based field provenance system. Here's how it works:

Data Source Priority

Tango loads organizations from multiple sources in priority order:

  1. Federal Hierarchy (Top Priority) - The authoritative structure
  2. USAspending (Next Priority) - Fills in missing codes and details
  3. Legacy models (Historical FPDS) - Backfills from historical data

Higher-priority sources won't be overwritten by lower-priority ones, ensuring that Federal Hierarchy's authoritative structure is preserved while USAspending fills in the operational codes.
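
The priority rule above can be sketched in a few lines of Python. This is a minimal illustration, not Tango's actual loader: the source names are taken from the field_provenance example later in this post, while merge_field, SOURCE_PRIORITY, and the record layout are hypothetical.

```python
from datetime import datetime, timezone

# Assumed source names, highest priority first (mirrors the list above).
SOURCE_PRIORITY = ["federal_hierarchy", "usaspending", "legacy"]


def merge_field(record, provenance, field, value, source, now=None):
    """Set `field` unless a higher-priority source already provided it.

    Returns True if the value was written, False if it was skipped.
    """
    if value is None:
        return False
    current = provenance.get(field)
    if current is not None:
        existing_rank = SOURCE_PRIORITY.index(current["source"])
        if existing_rank < SOURCE_PRIORITY.index(source):
            return False  # existing value came from a higher-priority source
    record[field] = value
    provenance[field] = {
        "source": source,
        "modified_at": (now or datetime.now(timezone.utc)).isoformat(),
    }
    return True


record, provenance = {}, {}
merge_field(record, provenance, "name", "Department of Transportation", "federal_hierarchy")
merge_field(record, provenance, "cgac", "069", "usaspending")
# A lower-priority source cannot overwrite the Federal Hierarchy name.
merge_field(record, provenance, "name", "TRANSPORTATION, DEPARTMENT OF", "legacy")
print(record["name"])  # Department of Transportation
```

Recording the source next to each value is what makes the rule enforceable per field: a record can keep its Federal Hierarchy name while taking its CGAC code from USAspending.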

Finding Organizations in Tango

Tango provides several ways to find organizations, each optimized for different use cases:

1. Search by Name or Alias

The /api/organizations/ endpoint supports a search query parameter that uses a multi-stage search strategy:

# Search by abbreviation, acronym, or name
GET /api/organizations/?search=FEMA
GET /api/organizations/?search=Department of Transportation
GET /api/organizations/?search=Treasury OIG  # Context-aware search

Or programmatically using the search library:

from agencies.lib.search import search_organizations

# Finds organizations by abbreviation, acronym, or name
results = search_organizations("FEMA")
results = search_organizations("Department of Transportation")
results = search_organizations("Treasury OIG")  # Context-aware search

The search handles:

  • Exact alias matches - Catches abbreviations like "CIO", "OIG", "FEMA"
  • Trigram similarity - Handles typos like "FMEA" → "FEMA"
  • Full-text search - Finds organizations by keywords in names
  • Context-aware queries - "Treasury OIG" finds the OIG within Treasury
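
Search terms such as "Treasury OIG" contain spaces, so when you build request URLs in code it is safer to encode the query string than to concatenate it by hand. The sketch below uses only the Python standard library; the host name is a placeholder, and the helper organization_search_url is hypothetical. The path and the search parameter come from the examples above.

```python
from urllib.parse import urlencode

# Placeholder host; substitute the base URL of your Tango deployment.
BASE_URL = "https://tango.example.com"


def organization_search_url(term: str) -> str:
    """Build an organization search URL with the term safely encoded."""
    query = urlencode({"search": term})
    return f"{BASE_URL}/api/organizations/?{query}"


print(organization_search_url("Treasury OIG"))
# https://tango.example.com/api/organizations/?search=Treasury+OIG
```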

2. Filter by Code

You can filter organizations by any of the codes they might have:

# Find by CGAC code
GET /api/organizations/?cgac=069

# Find by FPDS code
GET /api/organizations/?fpds_code=2100

# Find by office code
GET /api/organizations/?code=15JCRM

3. Lookup by fh_key

The Federal Hierarchy's orgKey (mapped to fh_key in Tango) is used for direct lookups:

GET /api/organizations/{fh_key}/

Note: Tango uses two identifiers:

  • key (UUID) - The primary key for the Organization model, stable across reloads
  • fh_key (BigInteger) - The Federal Hierarchy's orgKey identifier, used for API lookups and cross-referencing with SAM.gov

4. Hierarchy Navigation

Each organization includes parent relationships and flattened hierarchy paths:

{
  "key": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "fh_key": 123456,
  "name": "Federal Emergency Management Agency",
  "short_name": "FEMA",
  "parent_fh_key": 789012,
  "l1_name": "Department of Homeland Security",
  "l1_short_name": "DHS",
  "full_parent_path_name": "Department of Homeland Security > Federal Emergency Management Agency"
}
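
To show how a flattened path relates to the parent links, here is a small sketch that rebuilds a value like full_parent_path_name by walking parent_fh_key references. It is an illustration only: the in-memory dictionary, the parent_path helper, and the assumption that fh_key 789012 is the Department of Homeland Security are not taken from Tango itself.

```python
def parent_path(orgs, fh_key, separator=" > "):
    """Walk parent_fh_key links up to the root and join names top-down."""
    names = []
    seen = set()
    current = fh_key
    while current is not None and current in orgs:
        if current in seen:
            raise ValueError(f"cycle detected at fh_key {current}")
        seen.add(current)
        org = orgs[current]
        names.append(org["name"])
        current = org.get("parent_fh_key")
    # Names were collected leaf-first, so reverse them for display.
    return separator.join(reversed(names))


# Illustrative records keyed by fh_key.
orgs = {
    789012: {"name": "Department of Homeland Security", "parent_fh_key": None},
    123456: {"name": "Federal Emergency Management Agency", "parent_fh_key": 789012},
}
print(parent_path(orgs, 123456))
# Department of Homeland Security > Federal Emergency Management Agency
```

Because the API response already includes the flattened fields, you would normally read them directly; the walk is only needed if you store the parent links yourself.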

Field Provenance

Every organization tracks which source provided each field and when it was last updated:

{
  "field_provenance": {
    "name": {
      "source": "federal_hierarchy",
      "modified_at": "2024-12-01T10:00:00Z"
    },
    "cgac": {
      "source": "usaspending",
      "modified_at": "2024-12-01T10:00:00Z"
    },
    "fpds_code": {
      "source": "usaspending",
      "modified_at": "2024-12-01T10:00:00Z"
    },
    "code": {
      "source": "legacy",
      "modified_at": "2024-12-01T10:00:00Z"
    }
  }
}

This transparency lets you assess the reliability of each field and make informed decisions about which identifiers to use.
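
One way to use this in client code is to group an organization's fields by source, for example to flag records whose codes come only from legacy data. The helper below is a hypothetical sketch that assumes the response shape shown above.

```python
def fields_by_source(organization):
    """Group field names by the source recorded in field_provenance."""
    grouped = {}
    for field, info in organization.get("field_provenance", {}).items():
        grouped.setdefault(info["source"], []).append(field)
    return {source: sorted(fields) for source, fields in grouped.items()}


# Same shape as the example response above.
organization = {
    "field_provenance": {
        "name": {"source": "federal_hierarchy", "modified_at": "2024-12-01T10:00:00Z"},
        "cgac": {"source": "usaspending", "modified_at": "2024-12-01T10:00:00Z"},
        "fpds_code": {"source": "usaspending", "modified_at": "2024-12-01T10:00:00Z"},
        "code": {"source": "legacy", "modified_at": "2024-12-01T10:00:00Z"},
    }
}
print(fields_by_source(organization))
# {'federal_hierarchy': ['name'], 'usaspending': ['cgac', 'fpds_code'], 'legacy': ['code']}
```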

Practical Examples

Example 1: Finding an Organization from a Contract Transaction

When processing an FPDS contract transaction, you might see:

  • agencyID: 2100
  • departmentID: 097

In Tango, you can find the organization by its FPDS code (the agencyID value), or by name if you know it:

# By FPDS code
GET /api/organizations/?fpds_code=2100

# Or search by name if you know it
GET /api/organizations/?search=Department of the Army

Example 2: Finding an Office from USAspending Data

USAspending financial assistance data might reference:

  • awarding_sub_agency_code: 1501
  • awarding_office_code: 15JCRM

You can find the office:

GET /api/organizations/?code=15JCRM

The response will include the full hierarchy, so you can see it's part of the Department of Justice (CGAC 015).
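
Both examples come down to picking the right filter for the fields present in a transaction. A minimal sketch of that choice follows. The mapping is an assumption based on the two examples above (agencyID to fpds_code, awarding_office_code to code), and the names FIELD_TO_FILTER and organization_filter are hypothetical.

```python
# Assumed mapping from transaction field names to Tango query filters,
# ordered from most specific (office) to least specific (agency).
FIELD_TO_FILTER = [
    ("awarding_office_code", "code"),
    ("agencyID", "fpds_code"),
]


def organization_filter(transaction):
    """Return the query filter for the most specific code in a transaction."""
    for field, filter_name in FIELD_TO_FILTER:
        value = transaction.get(field)
        if value:
            return {filter_name: str(value)}
    return None  # no recognized code; fall back to a name search


contract = {"agencyID": "2100", "departmentID": "097"}
assistance = {"awarding_sub_agency_code": "1501", "awarding_office_code": "15JCRM"}
print(organization_filter(contract))    # {'fpds_code': '2100'}
print(organization_filter(assistance))  # {'code': '15JCRM'}
```

Codes are kept as strings throughout so that leading zeros, as in 097 or 069, are not lost.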

If you know you're looking for "Treasury OIG" but aren't sure of the exact code:

GET /api/organizations/?search=Treasury OIG

Tango's contextual search will find the OIG (Office of Inspector General) within the Treasury Department, even if there are multiple OIGs across different departments.

Best Practices

  1. Use key (UUID) for stored references - The UUID key is the primary identifier and is stable across reloads, so use it when storing references to organizations in your application
  2. Use fh_key for Federal Hierarchy lookups - When cross-referencing with SAM.gov or other federal data sources that use the Federal Hierarchy, use fh_key
  3. Use code lookups for transaction matching - When matching transactions, use the specific code type (CGAC, FPDS code, office code) that appears in your data
  4. Leverage search for user-facing features - The search query parameter handles abbreviations, typos, and context better than exact code matching
  5. Check field_provenance for data quality - Understand which source provided each field to assess reliability

Conclusion

Federal organization data is fragmented across multiple sources, each with its own strengths and gaps. SAM's Federal Hierarchy provides the authoritative structure but often lacks operational codes and omits some organizations entirely. USAspending fills in many codes but doesn't include all FPDS contract identifiers. Tango unifies these sources with a priority-based system that preserves authoritative data while filling in the gaps, giving you a single, reliable way to find and reference federal organizations.

Whether you're matching contract transactions, processing financial assistance awards, or building user-facing search features, Tango's unified Organization model and flexible search capabilities help you find the right organization identifiers, regardless of which source your data comes from.