
About the Data

Where our data comes from, how we process it, and why you can trust it.

  • Healthcare facilities: 380,000+
  • Price records: 588M+
  • Major insurance carriers: 5
  • Hospital chargemasters indexed: 7,401
  • Plan-level rate records: 3.76B
  • States covered: 50 + DC

Where Our Data Comes From

CarePrices.ai aggregates healthcare pricing data from two federally mandated transparency programs. These are not estimates, surveys, or crowdsourced data points. Every price on our platform comes from an official government-mandated disclosure.

Source 1: Hospital Price Transparency Files (CMS Mandate)

Since January 1, 2021, the Centers for Medicare & Medicaid Services (CMS) has required every Medicare-participating hospital to publish machine-readable files containing its standard charges. These include:

  • Gross charges (chargemaster list prices)
  • Discounted cash/self-pay prices
  • Payer-specific negotiated rates (what each insurance company pays)
  • De-identified minimum and maximum negotiated rates

We have indexed files from 7,401 hospitals. Together, these files contain 11.4 billion pricing rows and cover thousands of CPT (Current Procedural Terminology) and DRG (Diagnosis Related Group) codes for procedures ranging from basic blood tests to complex surgeries.

Source 2: Insurance Carrier Machine-Readable Files (TiC Rule)

Since July 2022, the Transparency in Coverage (TiC) rule has required health insurers to publish machine-readable files (MRFs) disclosing their negotiated rates with healthcare providers. We ingest and process files from five major national carriers:

  • Aetna -- 68.6 million aggregated rate records; 2.1 billion plan-level records
  • Blue Cross Blue Shield -- 72 million aggregated records spanning 42 regional BCBS plans; 337 million plan-level records
  • Cigna -- 28 million aggregated records; 1.1 billion plan-level records
  • UnitedHealthcare -- 30 million aggregated records; 177 million plan-level records
  • Kaiser Permanente -- 13 million aggregated records; 20 million plan-level records

Together, the insurer data adds 3.76 billion plan-level rate records and 212 million payer-aggregate records, covering 22,000+ provider NPIs with published rates.
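As a quick sanity check, the per-carrier figures listed above can be summed. The sketch below uses only the rounded numbers from this page, so its totals land slightly below the stated ones (about 211.6 million aggregated and 3.73 billion plan-level); the gap presumably reflects rounding in the per-carrier figures. The names `CARRIER_RECORDS` and `totals` are illustrative only.

```python
# Rounded per-carrier counts as listed above: (aggregated, plan-level).
CARRIER_RECORDS = {
    "Aetna": (68_600_000, 2_100_000_000),
    "Blue Cross Blue Shield": (72_000_000, 337_000_000),
    "Cigna": (28_000_000, 1_100_000_000),
    "UnitedHealthcare": (30_000_000, 177_000_000),
    "Kaiser Permanente": (13_000_000, 20_000_000),
}


def totals(records):
    """Sum aggregated and plan-level counts across all carriers."""
    aggregated = sum(agg for agg, _ in records.values())
    plan_level = sum(plan for _, plan in records.values())
    return aggregated, plan_level


aggregated_total, plan_level_total = totals(CARRIER_RECORDS)
print(f"aggregated: {aggregated_total / 1e6:.1f}M")   # aggregated: 211.6M
print(f"plan-level: {plan_level_total / 1e9:.2f}B")   # plan-level: 3.73B
```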

How We Process It

Raw hospital and insurer files are not usable in their published form. They are enormous, inconsistently formatted, and filled with medical coding jargon. Our data pipeline transforms them into a queryable, comparable format.

Step 1: Discovery and Ingestion

We locate and download machine-readable files from every compliant hospital and all five carrier systems. Hospital files come in CSV, JSON, and various non-standard formats. Our system handles URL redirects, authentication barriers, broken links, and format variations automatically.
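To illustrate one small piece of this step, here is a minimal sketch of guessing a file's format from its first few kilobytes. It is not our production code: the function name `sniff_format` and the rules it applies are assumptions chosen for illustration, and real files need many more special cases.

```python
import csv


def sniff_format(sample: str) -> str:
    """Guess whether a text sample looks like JSON, CSV, or neither."""
    # Drop a byte-order mark and leading whitespace before inspecting.
    text = sample.lstrip("\ufeff \t\r\n")
    if not text:
        return "unknown"
    # JSON documents start with an object or an array.
    if text[0] in "{[":
        return "json"
    # Treat the file as delimited text if the first line splits into
    # at least two fields under a common delimiter.
    first_line = text.splitlines()[0]
    for delimiter in (",", "|", "\t"):
        row = next(csv.reader([first_line], delimiter=delimiter))
        if len(row) >= 2:
            return "csv"
    return "unknown"
```

A sample such as `code,description,price` is classified as `"csv"`, while an HTML error page returned in place of a file falls through to `"unknown"` and can be routed to review.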

Step 2: Parsing and Normalization

Custom parsers handle the wide variety of file formats. We normalize CPT and DRG codes, standardize payer names across different naming conventions, resolve hospital identifiers via NPI (National Provider Identifier) and CMS Certification Numbers, and geocode facility locations for geographic search.
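The sketch below shows what payer-name and code normalization could look like in its simplest form. The alias table `PAYER_ALIASES` is a hypothetical, heavily shortened stand-in for a real mapping, and the CPT check only tests the general shape of a code (four digits followed by a digit or letter), not whether the code actually exists.

```python
import re
from typing import Optional

# Hypothetical alias table; a real one would be far larger.
PAYER_ALIASES = {
    "uhc": "UnitedHealthcare",
    "united healthcare": "UnitedHealthcare",
    "unitedhealthcare": "UnitedHealthcare",
    "bcbs": "Blue Cross Blue Shield",
    "blue cross blue shield": "Blue Cross Blue Shield",
    "aetna": "Aetna",
}


def normalize_payer(raw: str) -> str:
    """Map a raw payer label to a canonical name when an alias is known."""
    key = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    # Unknown payers are returned trimmed but otherwise unchanged.
    return PAYER_ALIASES.get(key, raw.strip())


def normalize_cpt(raw: str) -> Optional[str]:
    """Return a cleaned five-character CPT-shaped code, or None."""
    code = raw.strip().upper()
    if code.startswith("CPT"):
        code = code[3:].lstrip(" :-")
    return code if re.fullmatch(r"\d{4}[0-9A-Z]", code) else None
```

Returning `None` for anything that is not CPT-shaped (for example a three-digit DRG code) lets the caller route such values to a separate code path instead of silently mislabeling them.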

Step 3: Validation and Quality Control

Automated quality checks flag outliers, duplicates, and data quality issues. We cross-reference against CMS fee schedules, NPPES provider data, and known pricing ranges. Sentinel values (placeholder rates like $0.01 or $9,999,999) are filtered. Records that fail validation checks are excluded from the consumer-facing database.
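As an illustration of the sentinel filter, the sketch below drops records whose rate matches one of the two placeholder values mentioned above. The record layout (a dict with a `rate` key), the tolerance, and the function names are assumptions made for this example.

```python
# Placeholder rates named above; a real list may contain more values.
SENTINEL_RATES = (0.01, 9_999_999.0)


def is_sentinel(rate: float) -> bool:
    """True if the rate is within half a cent of a known placeholder."""
    return any(abs(rate - sentinel) < 0.005 for sentinel in SENTINEL_RATES)


def drop_sentinels(records):
    """Keep only the records whose rate is not a placeholder value."""
    return [r for r in records if not is_sentinel(r["rate"])]
```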

Step 4: Aggregation and Indexing

Validated data is aggregated into our DuckDB-based analytical database, optimized for fast lookups by procedure, location, payer, and facility. Statistical summaries (medians, percentiles, distributions) are precomputed to enable real-time comparison without query delays.
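The precomputed summaries can be pictured with the following plain-Python sketch. Our production aggregation runs in DuckDB; this example uses only the standard library, and the record layout (`cpt`, `rate`) and function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize(rates):
    """Compute basic distribution statistics for a non-empty list of rates."""
    ordered = sorted(rates)
    if len(ordered) < 2:
        # A single observation is its own 25th and 75th percentile.
        p25 = p75 = ordered[0]
    else:
        p25, _, p75 = quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "min": ordered[0],
        "p25": p25,
        "median": median(ordered),
        "p75": p75,
        "max": ordered[-1],
    }


def precompute(records):
    """Group rates by procedure code and summarize each group once."""
    by_code = defaultdict(list)
    for record in records:
        by_code[record["cpt"]].append(record["rate"])
    return {code: summarize(rates) for code, rates in by_code.items()}
```

Computing these summaries once per refresh, rather than per request, is what allows a comparison page to read a small precomputed row instead of scanning raw rates.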

Step 5: Carrier MRF Integration

Insurer rates are joined to hospital data by provider NPI, CPT code, site of care, and plan type. This allows us to show carrier-specific rates alongside hospital-published prices for a complete pricing picture.
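A minimal sketch of this join, assuming both datasets have already been normalized into dicts that share the four key fields. The field names (`npi`, `cpt`, `site_of_care`, `plan_type`, `rate`, `carrier`) are illustrative, not our actual schema.

```python
from collections import defaultdict

JOIN_FIELDS = ("npi", "cpt", "site_of_care", "plan_type")


def join_rates(hospital_rows, carrier_rows):
    """Inner-join hospital and carrier rows on the four shared key fields."""
    # Index carrier rows by key so each hospital row is a single lookup.
    carrier_index = defaultdict(list)
    for row in carrier_rows:
        carrier_index[tuple(row[f] for f in JOIN_FIELDS)].append(row)

    joined = []
    for hospital in hospital_rows:
        key = tuple(hospital[f] for f in JOIN_FIELDS)
        for carrier in carrier_index.get(key, []):
            joined.append({
                **dict(zip(JOIN_FIELDS, key)),
                "hospital_rate": hospital["rate"],
                "carrier": carrier["carrier"],
                "carrier_rate": carrier["rate"],
            })
    return joined
```

Hospital rows with no matching carrier row are simply absent from the result; a production pipeline might keep them with empty carrier fields instead.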

Step 6: Component Analysis

For common procedures, we identify all billing components (facility fee, professional fee, anesthesia, supplies) and model the expected total cost by site of care. This gives users the full picture rather than a single line item that may represent only part of the total bill.
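In its simplest form, the component model is a sum over the billing components known for each site of care. The sketch below is an assumption-laden illustration: the dollar amounts are invented, the site names are examples, and the real model may weight or impute missing components rather than just reporting them.

```python
COMPONENTS = ("facility_fee", "professional_fee", "anesthesia", "supplies")


def expected_total(component_rates):
    """Sum the components that are present and list the ones that are not."""
    present = {
        name: component_rates[name]
        for name in COMPONENTS
        if component_rates.get(name) is not None
    }
    missing = [name for name in COMPONENTS if name not in present]
    return sum(present.values()), missing


def totals_by_site(rates_by_site):
    """Apply expected_total to each site of care."""
    return {site: expected_total(c) for site, c in rates_by_site.items()}


# Invented numbers, for illustration only.
example = {
    "hospital_outpatient": {
        "facility_fee": 1800.0, "professional_fee": 450.0,
        "anesthesia": 300.0, "supplies": 120.0,
    },
    "ambulatory_surgery_center": {
        "facility_fee": 950.0, "professional_fee": 450.0,
    },
}
print(totals_by_site(example))
```

Reporting the missing components alongside the total makes it clear when a lower figure reflects incomplete data rather than a cheaper site.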

How Often We Update

Our data pipeline runs on a monthly cycle:

  • Hospital files: Re-ingested monthly. Most hospitals update their files quarterly or annually. Our system detects when a hospital has published a new file and re-processes it.
  • Carrier MRFs: Re-ingested monthly. Insurers update their files on varying schedules, typically monthly or quarterly.
  • Quality checks: Run continuously as new data arrives. Automated validation catches format changes, broken files, and data quality regressions.
  • Cost guide statistics: Recalculated after each data refresh to reflect the latest pricing landscape.
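One simple way to detect that a published file has changed is to compare a content fingerprint against the one recorded on the previous run. The sketch below assumes SHA-256 hashing and an in-memory dict of previously seen fingerprints; both are illustrative choices, not a description of our actual change detection.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def needs_reprocessing(url: str, data: bytes, seen: dict) -> bool:
    """True if this URL is new or its contents changed since the last run."""
    fingerprint = content_fingerprint(data)
    if seen.get(url) == fingerprint:
        return False
    # Record the new fingerprint so the next run can skip unchanged files.
    seen[url] = fingerprint
    return True
```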

Data Quality Methodology

We apply multiple layers of quality control to ensure the data you see is accurate and meaningful:

  1. Format validation: Files must conform to expected schemas. Malformed files are flagged for manual review.
  2. Code standardization: Non-standard procedure codes are mapped to standard CPT codes using a maintained crosswalk table.
  3. Outlier detection: Prices that fall outside expected ranges (e.g., an MRI priced at $50,000 or $0.01) are flagged and excluded.
  4. Cross-referencing: Hospital-reported rates are compared against Medicare fee schedules and carrier MRF data for consistency.
  5. Duplicate removal: Files that contain duplicate entries or multiple versions of the same rate are deduplicated.
  6. Geocoding verification: Facility locations are verified against NPPES data and geocoded for accurate distance-based search.
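Two of the checks above, duplicate removal and range-based outlier detection, can be sketched as follows. The record fields and the `EXPECTED_RANGES` table are hypothetical; the bounds shown for CPT 70551 (a brain MRI without contrast) are made up to mirror the MRI example in item 3.

```python
# Hypothetical plausible-price bounds per procedure code, in dollars.
EXPECTED_RANGES = {"70551": (100.0, 10_000.0)}


def deduplicate(records):
    """Drop records that repeat the same provider, code, payer, and rate."""
    seen = set()
    unique = []
    for r in records:
        key = (r["npi"], r["cpt"], r["payer"], r["rate"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def split_outliers(records):
    """Separate records into (kept, flagged) using the expected ranges."""
    kept, flagged = [], []
    for r in records:
        bounds = EXPECTED_RANGES.get(r["cpt"])
        # Codes without a known range are kept rather than guessed at.
        if bounds is None or bounds[0] <= r["rate"] <= bounds[1]:
            kept.append(r)
        else:
            flagged.append(r)
    return kept, flagged
```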

Coverage Statistics

  • Hospitals indexed: 7,401 from chargemaster files
  • Additional providers from MRFs: 22,029 provider NPIs
  • Total data points: 15+ billion pricing records (11.4B hospital + 3.76B plan-level)
  • Insurance carriers: 5 national carriers (Aetna, BCBS, Cigna, Kaiser, UHC)
  • Geography: All 50 states plus the District of Columbia
  • Procedure codes: 10,522 distinct CPT codes represented
  • Cost guides: 47 detailed procedure guides with statistics

Limitations and Caveats

While we strive for comprehensive and accurate data, users should be aware of important limitations:

  • Hospital compliance varies. Not all hospitals publish complete or correctly formatted files. Some facilities have incomplete data or non-standard formats that resist automated parsing.
  • Published prices may not reflect your actual cost. Your out-of-pocket cost depends on your insurance plan, deductible status, copays, coinsurance, and whether the provider is in-network.
  • Facility fees vs professional fees. Hospital chargemaster prices typically cover only the facility component. Professional fees are billed separately. We estimate total procedure costs by combining components, but actual bills may differ.
  • Carrier data limitations. MRF rates include all plan types (commercial, Medicare Advantage, Medicaid managed care). We filter outliers but cannot always distinguish plan types in aggregated views.
  • Update frequency differs. Hospitals and insurers update their files on different schedules. Some prices may be up to 12 months old.
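To show why a published rate and an out-of-pocket cost can differ, here is a deliberately simplified sketch of how a deductible, coinsurance, and an out-of-pocket maximum interact for an in-network service. It ignores copays, out-of-network rules, and plan-specific exceptions, and the function and its parameters are assumptions for illustration, not a calculator offered on this site.

```python
def estimate_out_of_pocket(negotiated_rate, deductible_remaining,
                           coinsurance, oop_max_remaining):
    """Rough patient share of one in-network service, in dollars."""
    # The patient pays the full rate until the deductible is met.
    deductible_part = min(negotiated_rate, deductible_remaining)
    # After that, the patient pays a coinsurance share of the remainder.
    coinsurance_part = (negotiated_rate - deductible_part) * coinsurance
    # The out-of-pocket maximum caps the total.
    total = min(deductible_part + coinsurance_part, oop_max_remaining)
    return round(total, 2)
```

Under these assumptions, the same $2,000 negotiated rate with 20% coinsurance costs the patient $800 when $500 of the deductible remains and $400 once the deductible has been met.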

Our Commitment to Transparency

We believe that healthcare pricing data should be accessible, understandable, and trustworthy. Our business model (advertising and data licensing) does not depend on any hospital or insurer relationship. We have no financial incentive to favor any provider, carrier, or facility type. The data we present is the data that was published.

For questions about our data or methodology, or to report a data issue, contact us at [email protected].
