The integration team had done the work. Twelve weeks of development, three rounds of end-to-end testing in the sandbox, sign-off from both technical teams, and a successful go-live readiness review. Their HL7 ADT feed connecting the hospital's EHR to the downstream bed management system worked perfectly in every test they ran. Their test library had 47 distinct message scenarios. They were proud of it.

On day one of production, it broke.

The trigger was a patient who arrived in the emergency department with a psychiatric hold and was simultaneously being readmitted from a post-acute facility to a medical-surgical bed — two active admissions in the system at the same time, each generating ADT messages with the same MRN but different encounter numbers. Their integration engine had never seen this scenario. None of their 47 test messages included a patient with two simultaneous active visits. The bed management system received two ADT^A01 admit messages for the same patient identifier in the span of eleven minutes, treated the second as a correction to the first, and locked the patient to the psychiatric unit while the med-surg team waited for a bed assignment that the system had silently discarded.

The ED charge nurse called the integration team directly. "Your system is showing the same patient in two beds that are fifty feet apart, and neither bed manager knows what's real."

The fix took four hours. The root cause analysis took a week. The lesson — the one that integration engineers in healthcare keep relearning — is that "works in sandbox" and "works in production" are separated by exactly the scenarios your sandbox data never included. And the scenarios your sandbox data never included are almost always the ones that occur in the first few days of every real go-live.

Why HL7 v2 Is Still Everywhere — and Why That Matters for Testing

Healthcare integration professionals sometimes express frustration that HL7 version 2.x messaging is still the dominant format for real-time clinical data exchange in most live environments, decades after it was first introduced. FHIR is the future. It may also be the present, for certain use cases. But for ADT notifications, lab results routing, order communication, and financial transactions, HL7 v2 is the format that the engines are actually running in production today.

The reasons are structural rather than technical. HL7 v2 interfaces were implemented in waves — the early 1990s when LIS systems needed to send results to HIS systems, the early 2000s when EMR adoption began to accelerate, the 2011-2015 Meaningful Use period when certified EHR adoption became universal among eligible providers. Each wave of implementations produced HL7 v2 interfaces that are now embedded in live clinical workflows. Replacing a working HL7 v2 interface with a FHIR subscription requires justification beyond "FHIR is better" — it requires downtime coordination, regression testing, and the risk of introducing new failures into workflows that have been stable for years. So HL7 v2 stays.

A healthcare integration engineer in 2026 needs to be fluent in both. The new implementation might be FHIR R4, but the legacy interface that feeds it is HL7 v2.3.1. The FHIR endpoint that serves the patient portal gets its data from an EHR that sends ADT messages to the integration engine via HL7 v2.5.1. Testing must cover both — and the realistic failure modes of both — to have any confidence that production behavior will match sandbox behavior.

The Healthcare Integration Ecosystem: How Messages Actually Flow

Understanding what test data must represent requires understanding how messages move between systems in a real healthcare environment. In a typical medium-sized hospital, clinical data flows across dozens of interfaces simultaneously:

The EHR is the system of record for patient demographics, encounters, orders, and clinical documentation. When a patient is admitted, the EHR generates an ADT^A01 message. That single message may trigger processing in the bed management system, the nursing unit notification system, the dietary management system, the laboratory information system, the pharmacy system, the case management platform, the care management vendor's ingestion API, and the revenue cycle system — all of which have their own interface engines consuming and transforming the same HL7 message into their own native formats.

The interface engine — whether Mirth Connect, Rhapsody, Cloverleaf, InterSystems HealthShare, or a custom-built integration platform — sits in the middle of this flow. It receives messages from source systems, applies transformation logic to reformat or enrich them, applies routing logic to determine which downstream systems receive which messages, and sends transformed messages to destinations. The interface engine is the component that must be tested most exhaustively — and it is the component that most consistently receives the least complete test data.

An interface engine that processes 50,000 ADT messages per day encounters hundreds of distinct patient scenarios daily. Its test data library should represent that variety. Most test libraries represent fewer than 50 scenarios — carefully constructed happy-path cases that the development team thought of, missing the 200 edge cases that only appear when real patient populations move through real clinical workflows.

HL7 v2.x Message Types: What Integration Testing Must Cover

The ADT message type group is the most common in real-time healthcare integration, but it is far from the only one. A comprehensive integration test strategy must cover all of the message types that the integration will process in production. Here is what each major message type requires from test data:

ADT — Admit, Discharge, Transfer

ADT messages communicate patient movement events. They are the heartbeat of hospital information systems: a large academic medical center may process 100,000 or more ADT messages per day across all event types. The full ADT event type library runs to dozens of trigger events, far more than the handful that typically get tested.

Most integration test libraries include A01, A02, A03, and A08. Most production interfaces also process A06 (observation-to-inpatient conversion), A11 and A13 (cancellation events), A21 and A22 (leave of absence), and A40 (MRN merge) on a regular basis. These are not exotic events — they happen in every hospital, every day. They are missing from test libraries because nobody built them.

ORM — Order Messages

ORM^O01 order messages communicate physician orders from the EHR to ancillary systems: laboratories, radiology, pharmacy, dietary, physical therapy. An order message contains the ordered item (typically a LOINC code for lab, a CPT or custom code for radiology), the ordering provider NPI, the patient and visit identifiers, the clinical priority (STAT, routine, ASAP), and additional fields specific to the order type.

Test data for order interface testing must cover the full range of order types and statuses: new orders, order modifications and cancellations (ORM^O01 with an ORC-1 order control code of XO for a change or CA for a cancel), STAT orders that should trigger different routing or notification logic than routine orders, and orders placed for a patient who has already been discharged, which occurs regularly and requires the downstream system to handle the closed encounter gracefully. Radiology orders additionally need the ordering indication (diagnosis or clinical reason), which some PACS systems require to correctly populate the order worklist.

ORU — Observation Results

ORU^R01 observation result messages are how laboratory and diagnostic results flow back to the EHR and to ordering providers. The message structure includes the patient and visit identifiers, the order that triggered the result, and one or more OBX segments (Observation/Result segments) that carry the actual result values.

Each OBX segment contains: the observation identifier (a LOINC code for standardized results, or a local code for custom observations), the observation value type (NM for numeric, ST for string, CWE for coded element, TX for text), the actual value, units of measure (using UCUM codes for standardized units), reference ranges, and an abnormal flag. A single lab panel — a complete metabolic panel (CMP), for instance — generates 14 separate OBX segments in a single ORU message. A result that is critically abnormal carries a specific flag value (LL for critically low, HH for critically high) that should trigger notification workflows in the receiving system.

Test data for ORU interfaces must include: routine results with normal values, results with high and low flags, critically abnormal results with LL and HH flags, results with text interpretation (microbiology sensitivities, pathology reports), corrected results that amend previously delivered values (the OBX status field changes from F for final to C for correction), and results for cancelled orders that the system may still attempt to deliver after the order was cancelled downstream.
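As a rough illustration of the OBX handling described above, the following sketch pulls critically flagged results out of an ORU message using plain string splitting. The sample message, its identifiers, and the function name `critical_results` are invented for this example, and the LOINC codes are illustrative. A production interface would use a real HL7 parser and handle escape sequences and repetitions.

```python
# Hypothetical ORU^R01 fragment; every identifier and value here is made up.
SAMPLE_ORU = "\r".join([
    "MSH|^~\\&|LIS|LAB|EHR|HOSP|20260422143000||ORU^R01|MSG0001|P|2.5.1",
    "PID|1||MRN12345^^^HOSP^MR||DOE^JANE",
    "OBR|1|ORD789||24323-8^Comprehensive metabolic panel^LN",
    "OBX|1|NM|2951-2^Sodium^LN||140|mmol/L|136-145|N|||F",
    "OBX|2|NM|2823-3^Potassium^LN||6.9|mmol/L|3.5-5.1|HH|||F",
    "OBX|3|NM|2345-7^Glucose^LN||38|mg/dL|70-99|LL|||C",
])

CRITICAL_FLAGS = {"LL", "HH"}


def critical_results(message: str) -> list:
    """Return one dict per OBX segment whose abnormal flag is LL or HH."""
    results = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX":
            continue
        # For non-MSH segments, fields[n] corresponds to field n (OBX-8 is fields[8]).
        flag = fields[8] if len(fields) > 8 else ""
        if flag in CRITICAL_FLAGS:
            results.append({
                "code": fields[3].split("^")[0],   # OBX-3 observation identifier
                "value": fields[5],                 # OBX-5 observation value
                "flag": flag,                       # OBX-8 abnormal flag
                "status": fields[11] if len(fields) > 11 else "",  # OBX-11
            })
    return results
```

A test library built this way can assert both that critical values are detected and that a corrected result (status C) is distinguished from a final one (status F).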

DFT — Detailed Financial Transactions

DFT^P03 messages carry charge capture transactions from clinical systems to billing systems. When a nurse documents medication administration or a physician completes a procedure note, the EHR may generate a DFT message containing the charge code, the service date, the ordering and performing provider identifiers, and the revenue code. The DFT is the bridge between clinical documentation and professional fee billing.

DFT test data must include valid charge codes with correct revenue code pairings, HCPCS codes for procedures and supplies, and the clinical context (diagnosis pointers, provider identifiers) that the billing system requires to correctly complete the claim. A DFT message that contains a charge code but no diagnosis pointer, or that references a provider NPI that doesn't exist in the billing system's provider master, will produce a charge entry that cannot be billed — and may drop off the system without generating an error that the billing team sees.

MFN — Master File Notifications

MFN messages update the master data that downstream systems depend on: provider directories, charge description masters, location masters, insurance plan files. MFN^M02 updates the staff practitioner file — adding new physicians, updating credentialing status, or terminating providers who have left the organization. MFN^M04 updates the charge description master with new or modified charge items.

Integration testing for MFN feeds is frequently neglected because master file updates feel like administrative events rather than clinical events. But a failed MFN^M02 update that doesn't successfully add a new physician's NPI to the downstream billing system means that provider's charges will error when they reach the billing engine. A failed MFN^M04 update that doesn't communicate a charge code revision means the billing system uses an outdated CDM entry. These failures are silent — no patient alert, no clinical workflow interruption — and they persist until someone notices that a specific provider's charges are erroring or that a specific charge code is mapping incorrectly.

SIU — Scheduling Information Unsolicited

SIU messages (S12 through S26) communicate scheduling events: new appointment bookings (S12), appointment modifications (S14), appointment cancellations (S15), and no-shows (S26), alongside related slot availability queries. Scheduling interfaces are among the most underrepresented in integration test libraries, partly because scheduling feels less critical than ADT or results, and partly because scheduling test data requires a realistic appointment calendar with provider schedules, slot availability, appointment types, and insurance verification status that most sandbox environments don't maintain.

In practice, scheduling interface failures have significant downstream consequences. A scheduling message that fails to reach the registration system means the patient arrives with no pre-registration. A scheduling cancellation that doesn't propagate to the patient portal means the patient arrives for an appointment that has been cancelled. A scheduling integration that cannot handle back-to-back appointments for the same provider — a scenario that any real scheduling test library should include — may miss the edge case where the provider's calendar shows zero availability between two patients, causing a time slot to appear available to a booking system when it is not.

The Dual-Admission Problem and Other Complex ADT Scenarios

The incident described in the opening — a patient with two simultaneous admissions generating conflicting A01 messages — is one of a class of complex ADT scenarios that almost never appear in test data but occur regularly in production. Each of them requires specific test message sequences to detect the integration failure modes they expose.

Simultaneous Admissions: Psychiatric Hold + Medical Admission

In psychiatric and general hospital settings, a patient may be placed on an involuntary psychiatric hold (5150, 5250, or equivalent depending on state law) while also requiring medical admission for an acute condition — a diabetic ketoacidosis patient who is also psychotic, for instance, or a patient who overdosed and requires both medical stabilization and psychiatric evaluation. Each admission generates its own encounter number and its own stream of ADT events. Systems that model one active encounter per patient identifier break when two simultaneous encounter streams arrive for the same MRN.

The specific failure modes vary: some systems silently merge the two encounter streams and lose one set of events; some throw a constraint violation exception and drop messages to an error queue; some correctly create two encounter records but fail to correctly associate subsequent events (transfers, results, orders) to the right encounter. Testing this scenario requires a carefully sequenced set of test messages: A01 for the first admission, a second A01 for the same patient with a different encounter number, then subsequent events (transfers, results, discharge) that must be correctly routed to their respective encounter.
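One way to picture the difference between a system that survives this scenario and one that does not: the sketch below keys active encounters by MRN and visit number together, so a second A01 for the same patient adds an encounter instead of overwriting one. The class name `EncounterRegistry` and its methods are hypothetical, and the sketch assumes the visit number has already been extracted from the message.

```python
class EncounterRegistry:
    """Tracks active encounters per patient, keyed by visit number."""

    def __init__(self):
        # mrn -> {visit_number: current_location}
        self._active = {}

    def admit(self, mrn, visit_number, location):
        # A second admit for the same MRN adds a new entry rather than
        # replacing the first encounter.
        self._active.setdefault(mrn, {})[visit_number] = location

    def transfer(self, mrn, visit_number, location):
        visits = self._active.get(mrn, {})
        if visit_number not in visits:
            raise KeyError(f"no active encounter {visit_number} for {mrn}")
        visits[visit_number] = location

    def discharge(self, mrn, visit_number):
        self._active.get(mrn, {}).pop(visit_number, None)

    def active_locations(self, mrn):
        return dict(self._active.get(mrn, {}))
```

A registry keyed by MRN alone would make the second `admit` call overwrite the first, which is the same class of failure as the bed management incident in the opening.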

Transfer Chains: ICU Through Multiple Care Levels

A patient admitted to the ICU who progresses through step-down, medical-surgical, and then a rehabilitation unit generates a sequence of A02 transfer messages that each must correctly update the patient's current location in every downstream system. The failure modes in transfer chain testing include: systems that do not update the location correctly after the third transfer in a sequence, systems that cannot handle a transfer to a unit that was not configured in the location master, and systems that lose track of the patient's "home unit" when a transfer is cancelled (A12) and then re-initiated.

A transfer chain test scenario for a ten-day hospitalization might include: A01 (ICU admit), A02 (transfer to step-down), A02 (transfer to med-surg), A08 (insurance update), A02 (transfer to observation), A06 (observation-to-inpatient conversion), A02 (transfer to medical-surgical), A03 (discharge to SNF). That is eight messages for one patient encounter. Many test libraries contain one or two transfer events per patient at most.
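The eight-message chain above can be written down as test data and replayed against a simple location tracker. This is a sketch under simplifying assumptions: the unit names are placeholders, `current_location` is a hypothetical helper, and a cancelled transfer (A12) is modeled as falling back to the previous unit.

```python
LOCATION_EVENTS = {"A01", "A02"}

# The chain from the text; unit names are placeholders.
CHAIN = [
    ("A01", "ICU"),
    ("A02", "STEPDOWN"),
    ("A02", "MEDSURG"),
    ("A08", None),       # insurance update, no location change
    ("A02", "OBS"),
    ("A06", None),       # status conversion, no location change
    ("A02", "MEDSURG"),
    ("A03", None),       # discharge
]


def current_location(events):
    """Replay (event_type, location) pairs and return the current unit."""
    history = []
    for event_type, location in events:
        if event_type in LOCATION_EVENTS:
            history.append(location)
        elif event_type == "A12" and len(history) > 1:
            history.pop()        # cancel transfer: revert to the prior unit
        elif event_type == "A03":
            return None          # discharged: no current bed
    return history[-1] if history else None
```

Running the same chain through each downstream system and comparing its reported location with this expected value after every message is what exposes the "third transfer" class of failure.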

The Patient Swap: Wrong Patient Admitted Under Wrong Identity

A patient arrives at registration and is admitted under another patient's name — either because a previous registration was not cleared, because two patients with similar names were confused, or because the patient themselves provided incorrect identifying information. HIM identifies the error and initiates a correction. The sequence of events this produces — typically an A40 merge or identity correction message followed by demographic updates — must be handled correctly by every downstream system that received the original admission messages.

This scenario is not hypothetical. Patient identification errors occur at a rate that the Joint Commission considers a persistent safety concern. The data governance implications — a lab result that arrived under the wrong MRN, an order that was placed under the wrong identity, charges that accrued to the wrong account — require that the integration engine can process identity correction messages and propagate them correctly to every system that received the original messages. No standard sandbox test library includes this scenario.

Observation Status: Not Inpatient, Not Outpatient

Observation status is one of the most clinically and financially consequential distinctions in hospital billing — and one of the most complex to represent in test data. A patient placed in observation status is technically an outpatient. They may receive inpatient-level care, occupy an inpatient bed, and remain in the hospital for multiple days. But for Medicare, their status determines whether their hospital costs are covered under Part A (inpatient) or Part B (outpatient) — a distinction that can cost a patient thousands of dollars in out-of-pocket expenses under the Medicare Two-Midnight Rule.

From an integration perspective, observation status generates an A04 (outpatient registration) rather than an A01 (inpatient admission). If the patient is subsequently converted to inpatient status — because their physician determines they meet Two-Midnight criteria — the system generates an A06 (outpatient-to-inpatient change). If the patient is discharged from observation, they receive an A03 discharge. Each of these paths requires specific handling in downstream systems that cannot simply treat all admitted patients as inpatients. Test data that only presents A01 and A03 will not surface the observation status handling logic that fails when A04 and A06 appear in production.

Newborn Admission Linked to Mother's Account

When a baby is born during a mother's inpatient admission, the newborn receives their own MRN and encounter number. But in many workflows, especially for billing, the newborn's hospital charges during the birth admission are captured under the mother's account (for a normal delivery) or the newborn's own account (if the newborn requires NICU admission or has their own clinical complexity). The ADT sequence for a newborn admission includes a specific newborn event type, a link between the newborn's encounter and the mother's encounter, and in many systems a demographic relationship record that associates the two patients.

Integration systems that have never been tested with a mother-newborn encounter pair may fail to correctly establish the link, may route the newborn's results to the mother's result feed, or may fail to correctly separate the charges when billing. The test data required is a mother's admission followed by a correctly structured newborn admission message that includes the appropriate PV1 and PID segment values to establish the mother-infant relationship.

Enterprise MPI and Multiple MRN Scenarios

Patients who receive care at multiple facilities within a health system may be registered under different MRNs at each facility. The Enterprise Master Patient Index (EMPI) is the system responsible for linking these identifiers — recognizing that Patient 1847291 at Community Hospital and Patient 9043817 at Regional Medical Center are the same person. An ADT^A40 merge message communicates this linkage to downstream systems.

EMPI merge scenarios require test data that presents the pre-merge state (two separate MRNs, potentially with different demographics, different insurance records, and different clinical histories) and the post-merge state (a single surviving MRN, with all data associated with the surviving identifier). Downstream systems must correctly handle the merge and update all records, including historical records, to use the surviving identifier. Systems that process the A40 message but fail to update historical data create split-patient records that persist in the database until discovered manually, which in large health systems may take years.

EMPI merge test scenario: Patient registers at urgent care affiliate as "Robert J. Smith," DOB 1962-03-14, under MRN UC-18472. Six months later, same patient presents at the main hospital ER as "Bob Smith," DOB 1962-03-14, and is registered under MRN MH-90438. EMPI probabilistic matching identifies these as the same person with 97% confidence. HIM staff reviews and confirms the merge. An A40 message is sent to all integrated systems, designating MH-90438 as the surviving identifier.

Your downstream care management platform receives the A40. Does it correctly move the patient's care plan from UC-18472 to MH-90438? Does it correctly handle the situation where the patient has open care gaps under both identifiers? Does it generate a single unified patient record, or two records, or does it crash with a duplicate key exception? None of these outcomes are testable without a realistic EMPI merge test scenario — which is why most systems don't discover their merge handling failures until production.
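To make those questions concrete, here is a minimal sketch of merge handling in a downstream store, using the two MRNs from the scenario. The record shape (`care_gaps`, `aliases`) and the function `apply_merge` are assumptions made for illustration, not any vendor's data model.

```python
def apply_merge(records, surviving_mrn, retired_mrn):
    """Fold the retired MRN's data into the surviving MRN's record.

    `records` maps MRN -> {"care_gaps": set, "aliases": set}.
    Replaying the same merge is a no-op, so a resent A40 does no harm.
    """
    retired = records.pop(retired_mrn, None)
    if retired is None:
        return records  # already merged, or never seen under that MRN
    survivor = records.setdefault(
        surviving_mrn, {"care_gaps": set(), "aliases": set()}
    )
    # Union rather than overwrite: open gaps under either identifier survive.
    survivor["care_gaps"] |= retired["care_gaps"]
    # Keep the retired MRN as an alias so late-arriving messages can be routed.
    survivor["aliases"] |= retired["aliases"] | {retired_mrn}
    return records
```

The alias set matters in practice because results and charges keyed to the retired MRN can keep arriving after the merge.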

FHIR R4 Integration Testing: Beyond Skeletal Resources

FHIR R4 is the current standard for modern healthcare APIs, required by CMS under the Interoperability and Patient Access Rule (CMS-9115-F) for patient access APIs, provider directory APIs, and payer-to-payer data exchange. The 21st Century Cures Act further mandated FHIR-based APIs as part of ONC Health IT Certification criteria (§170.315(g)(10)), requiring certified EHRs to support FHIR R4 patient access APIs by December 31, 2022.

FHIR integration testing is harder than HL7 v2 testing in one specific way: the FHIR specification allows extensive optionality. A FHIR Patient resource is valid with nothing but an ID and a name. It is also valid with 40 populated fields including multiple identifiers, multiple names, multiple addresses, multiple contact relationships, communication preferences, extension arrays, and linked patient references. The integration that works correctly with the minimal Patient resource may fail completely when it encounters the fully-populated one — because the logic that handles a Patient with one identifier was never tested against a Patient with four identifiers using four different system URIs.

REST API Endpoint Conformance Testing

FHIR server conformance testing requires test data that exercises the full range of supported search parameters, operations, and modifier combinations defined in the server's CapabilityStatement. A Patient search by name, birthdate, and identifier is the happy path. The tests that matter are: search with a name that includes special characters (the O'Brien problem: apostrophes in names that are URL-encoded differently by different clients), search for a patient with a deceased flag, search with date range prefixes (ge and le, as in birthdate=ge1960-01-01), and search results that span multiple pages (_count and pagination token handling).
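A small sketch of how a test client might build such a search URL with the Python standard library. The base URL and the helper name `patient_search_url` are placeholders. The point is that the apostrophe is percent-encoded and that the date range is expressed by repeating the `birthdate` parameter with `ge` and `le` prefixes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def patient_search_url(base, family, born_from, born_to, count=50):
    """Build a Patient search URL with an encoded name and a birthdate range."""
    params = [
        ("family", family),
        ("birthdate", f"ge{born_from}"),   # on or after
        ("birthdate", f"le{born_to}"),     # on or before
        ("_count", str(count)),            # page size
    ]
    return f"{base}/Patient?{urlencode(params)}"


# Placeholder server address, for illustration only.
url = patient_search_url(
    "https://fhir.example.org/r4", "O'Brien", "1960-01-01", "1969-12-31"
)
decoded = parse_qs(urlsplit(url).query)
```

Round-tripping the query through `parse_qs` is a cheap way to confirm that the name a server receives is the name the client meant to send.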

_include and _revinclude parameters — which allow a client to fetch related resources in a single request — are a common source of conformance failures. A request for Patient?_include=Patient:general-practitioner should return the Patient resource and the related Practitioner resource. A request for Encounter?_revinclude=Observation:encounter should return Encounters and any Observations that reference those Encounters. Test data that populates these relationship references correctly, and in sufficient variety, is what makes conformance testing meaningful rather than performative.

Bulk FHIR Export ($export)

The Bulk FHIR export operation ($export) is the mechanism for large-scale data transfer — moving an entire patient panel's data from a payer to a provider, or extracting all records for a population cohort for quality reporting. The output format is NDJSON (newline-delimited JSON), with one resource per line, split across files by resource type.

Testing $export requires large synthetic datasets — not 50 patients, but thousands. At 50 patients, the export completes in seconds and produces files small enough that any parsing error is immediately obvious. At 50,000 patients, the export takes minutes, produces files in the hundreds of megabytes, and surfaces failures that only manifest at scale: memory exhaustion in the parsing logic, incorrect handling of multi-file responses for large resource types, and timeout failures when the client waits too long for the async export to complete.
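The memory-exhaustion failure usually comes from loading a whole export file before parsing it. A sketch of the alternative, assuming the NDJSON arrives as any iterable of lines (an open file handle works), is to process one line at a time and record bad lines instead of aborting. The function name `count_resources` is made up for this example.

```python
import json
from collections import Counter


def count_resources(lines):
    """Tally resource types from NDJSON lines without holding them in memory.

    Returns (counts, errors), where errors is a list of
    (line_number, message) tuples for lines that failed to parse.
    """
    counts = Counter()
    errors = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between resources
        try:
            resource = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((line_number, str(exc)))
            continue
        if isinstance(resource, dict):
            counts[resource.get("resourceType", "unknown")] += 1
        else:
            errors.append((line_number, "line is not a JSON object"))
    return counts, errors
```

Because the function only ever holds one line, its memory use stays flat whether the file has 50 resources or 50 million.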

Group-level filtering ($export on a specific Group resource) requires test data that includes FHIR Group resources with realistic member lists and group characteristics — not a single group containing all patients, but multiple groups representing different population cohorts, care management panels, and payer attribution lists.

SMART on FHIR: Auth Flow Testing

SMART on FHIR is the authorization framework that governs how third-party applications access FHIR resources on behalf of patients (standalone launch) or within an EHR session (EHR launch). Testing the SMART on FHIR auth flow requires test patients who have been granted specific scopes, test applications registered with specific allowed scopes, and test scenarios that exercise scope enforcement: verifying that an application granted only patient/Observation.read cannot read MedicationRequest resources, which would require patient/MedicationRequest.read.
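The scope enforcement check can be sketched as a pure function, which makes it easy to drive from a table of test cases. This assumes the older `resource.read` / `resource.write` scope style used in the paragraph above; the function name `scope_allows` is hypothetical, and a real server would also enforce the patient compartment and token validity.

```python
def scope_allows(granted_scopes: str, resource_type: str, action: str) -> bool:
    """Check a space-separated scope string against one resource and action.

    Understands scopes shaped like patient/Observation.read, including
    the * wildcard for resource type or action. Other scopes (openid,
    launch/patient, and so on) are ignored.
    """
    for scope in granted_scopes.split():
        context, _, rest = scope.partition("/")
        if context not in ("patient", "user"):
            continue
        granted_type, _, granted_action = rest.partition(".")
        type_ok = granted_type in (resource_type, "*")
        action_ok = granted_action in (action, "*")
        if type_ok and action_ok:
            return True
    return False
```

A negative test (the MedicationRequest read that must be refused) is as important here as the positive one.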

Token lifecycle testing requires test scenarios that cover token expiration and refresh, revocation of access by the patient, and the behavior of the FHIR server when a request arrives with an expired token versus a revoked token. These are not distinctions that a test framework with five test patients will surface — they require realistic patient identities with real OAuth flows, realistic application registrations, and temporal test scenarios that allow tokens to expire during the test sequence.

CDS Hooks: Testing Clinical Decision Support at the Integration Layer

CDS Hooks is the FHIR-aligned specification for embedding clinical decision support into EHR workflows. A CDS Hook fires at a specific point in the clinical workflow (a hook type, such as patient-view, order-select, or medication-prescribe), sends a context payload to a registered CDS service, and receives a response containing cards — actionable suggestions displayed to the clinician. The cards may be informational, they may contain suggestions to add or modify orders, or they may link to external resources.

Testing CDS Hooks requires test patients whose clinical data triggers the hook under the conditions the CDS service is designed to detect. A medication-prescribe hook that checks for drug-drug interactions needs test patients who are already on medications that interact with the one being prescribed. A patient-view hook that alerts for overdue preventive care screenings needs test patients with realistic preventive care histories that include specific gaps. Synthetic test patients created for general EHR testing rarely have the clinical specificity required to trigger CDS Hooks in meaningful ways.

Real-World Integration Failure Modes: The Edge Cases That Test Data Must Include

Beyond the message type coverage and FHIR resource completeness issues, there is a category of integration failure modes that only appear in real patient data — encoding quirks, null value handling, field length violations, and sequence anomalies that manual test data construction almost always misses.

Special Characters in Patient Names

The O'Brien problem is real and widespread. Patient last names containing apostrophes (O'Brien, D'Angelo), hyphens (Smith-Johnson), accented characters (Ñoño, García, Müller), and non-Latin characters (patients with Chinese, Arabic, Korean, or other non-ASCII names) expose encoding issues in integration engines that were built assuming ASCII patient names.

HL7 v2 assumes ASCII unless the sender declares another character set in MSH-18, so an engine that ignores or mishandles that field may corrupt multi-byte UTF-8 characters in patient name fields. An interface engine configured for HL7 v2 version 2.3 that receives a patient name in UTF-8 encoding (which became standard in v2.7) may silently truncate or corrupt the name. The patient record in the downstream system then has an incorrect name, which may cause insurance verification mismatches, patient identification errors, and medical record integrity issues.

Test data must include patients with names that exercise encoding edge cases: apostrophes, hyphens, non-ASCII characters, names with multiple word components in the family name field, and names that are deliberately long (some name fields are defined with a maximum length that is shorter than some real names). A patient named "María de los Ángeles García-Rodríguez" is not unusual in a population with significant Hispanic representation — but she will break every interface that assumes ASCII names and maximum field lengths of 25 characters.
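The name from the paragraph above makes a useful fixture. The sketch below shows two of the failure mechanisms in isolation: an ASCII-only assumption, and truncation by bytes rather than characters, which can cut a multi-byte character in half. The helper names are invented for illustration.

```python
import unicodedata

# Normalized to NFC so each accented letter is a single code point.
LONG_NAME = unicodedata.normalize(
    "NFC", "María de los Ángeles García-Rodríguez"
)


def is_ascii_safe(text: str) -> bool:
    """True if the text survives an ASCII-only pipeline unchanged."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def truncate_bytes_naive(text: str, limit: int) -> str:
    """Truncate the UTF-8 bytes, as a byte-oriented field limit would."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="replace")


def truncate_chars(text: str, limit: int) -> str:
    """Truncate by characters, which never splits a code point."""
    return text[:limit]
```

Cutting the UTF-8 form at a byte boundary that falls inside an accented letter leaves a replacement character at the end of the name, which is the kind of corruption that later fails an insurance name match.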

Date Format Inconsistencies

HL7 v2 date-time format is yyyyMMddHHmmss — a 14-character string with no separators. FHIR uses ISO 8601 format — 2026-04-22T14:30:00-05:00. Legacy HL7 v2 implementations sometimes send only 8-character date-only strings (yyyyMMdd) in fields that are defined to accept full date-times. Some EHR implementations send dates with precision that varies by field — birth dates as yyyyMMdd but procedure dates as yyyyMMddHHmm. An interface engine that is not explicitly designed to handle variable-precision dates will either fail on the short format or fail on the long format, depending on which one it was tested with.
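A minimal sketch of a parser that accepts the variable-precision timestamps described above and reports the precision it saw, so downstream code does not mistake a date-only value for midnight. It deliberately ignores fractional seconds and timezone suffixes, which a complete implementation would need to handle; `parse_hl7_timestamp` is a hypothetical name.

```python
from datetime import datetime

# Supported lengths mapped to strptime formats.
_FORMATS = {
    8: "%Y%m%d",
    10: "%Y%m%d%H",
    12: "%Y%m%d%H%M",
    14: "%Y%m%d%H%M%S",
}


def parse_hl7_timestamp(value: str):
    """Return (datetime, precision) where precision is the digit count."""
    value = value.strip()
    fmt = _FORMATS.get(len(value))
    if fmt is None or not value.isdigit():
        raise ValueError(f"unsupported HL7 timestamp: {value!r}")
    return datetime.strptime(value, fmt), len(value)
```

Carrying the precision alongside the value lets a FHIR mapping emit `1962-03-14` for a birth date rather than inventing a time component.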

FHIR date-time handling introduces a different problem: timezone offsets. An Observation with an effectiveDateTime of "2026-04-22T14:30:00" (no timezone) is ambiguous. The FHIR specification requires a timezone offset whenever a time is present, but non-conformant senders omit it anyway. An Observation with "2026-04-22T14:30:00Z" is UTC. An Observation with "2026-04-22T09:30:00-05:00" is the same moment in Central time. An integration engine that does not correctly normalize timezone offsets may display the same event at different times in different downstream systems, creating apparent discrepancies that clinical staff find alarming and that are extremely difficult to diagnose.
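The three example values above translate directly into a test. This sketch normalizes to UTC with the standard library and refuses to guess when the offset is missing; the function name `to_utc` is an assumption, and rejecting offset-less values is one possible policy among several.

```python
from datetime import datetime, timezone


def to_utc(fhir_datetime: str) -> datetime:
    """Parse a FHIR dateTime that includes a time and return it in UTC."""
    text = fhir_datetime
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing Z in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError(f"no timezone offset in {fhir_datetime!r}")
    return parsed.astimezone(timezone.utc)
```

Comparing normalized values, not strings, is what lets a test assert that two differently written timestamps describe the same moment.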

Empty Segments and Null Flavor Handling

HL7 v2 allows optional segments to be omitted entirely, or to be sent as empty segments (containing only the segment identifier and delimiters). Some EHR implementations send empty segments consistently — every message includes every possible segment, but optional segments appear as "NK1|||||||" with no actual content. Others omit optional segments entirely when there is no data to report. An interface engine configured against one convention will fail against the other — either by trying to parse an absent segment or by ignoring an empty one that contains a required field in a specific installation's custom profile.

Null flavor handling, meaning the representation of a value that is explicitly unknown, not applicable, or masked, is a source of persistent failures. A patient with an unknown date of birth (common in emergency presentations) must be represented somehow. HL7 v2 distinguishes a field left empty (no value sent) from an explicit null, written as two double quotes (""), which tells the receiver the value is deliberately blank; some implementations use their own null value strings instead. FHIR uses the data-absent-reason extension, attached to birthDate through the _birthDate element in JSON. An integration that assumes all patients have known birthdates will throw a null pointer exception when it encounters the first patient who doesn't. Test data must include patients with explicitly missing or unknown values in fields that implementations assume are always populated.
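The distinctions in this section (segment omitted versus sent empty, field absent versus empty versus explicitly null) can be captured in two small classifiers that a test suite can assert against. The sample message is fabricated, the function names are hypothetical, and the field indexing shown applies to non-MSH segments only.

```python
# Fabricated ADT fragment: PID-7 (date of birth) is an explicit null,
# NK1 is sent empty, and PV1 is omitted entirely.
MESSAGE = "\r".join([
    "MSH|^~\\&|EHR|HOSP|BED|HOSP|20260422143000||ADT^A04|CTRL002|P|2.5.1",
    'PID|1||MRN1^^^HOSP^MR||DOE^JOHN||""|M',
    "NK1|||||||",
])


def segment_state(message: str, name: str) -> str:
    """Return 'omitted', 'empty', or 'populated' for the first such segment."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] == name:
            return "populated" if any(fields[1:]) else "empty"
    return "omitted"


def field_state(message: str, name: str, index: int) -> str:
    """Return 'absent', 'empty', 'explicit_null', or 'valued' for a field."""
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != name:
            continue
        if index >= len(fields):
            return "absent"            # sender stopped before this field
        if fields[index] == '""':
            return "explicit_null"     # two double quotes: deliberately blank
        return "valued" if fields[index] else "empty"
    return "absent"
```

An engine tested only against one sender's convention will typically pass two of these states and mishandle the others.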

Duplicate Message Handling

Network-level message delivery in HL7 v2 uses the MLLP (Minimal Lower Layer Protocol) framing, where each message is acknowledged by an ACK response. If the ACK is lost in transit, the sending system may resend the message — resulting in duplicate messages arriving at the receiving system. A well-designed interface engine detects duplicates using the message control ID (MSH-10 field) and discards the duplicate rather than processing it twice. An interface engine that has never been tested with duplicate messages may process them twice — creating duplicate patient registrations, duplicate lab results, or duplicate charges.

Testing duplicate message handling requires test scenarios that deliberately present the same message twice — with the same MSH-10 message control ID — and verify that the downstream system creates only one record. It also requires testing what happens when a near-duplicate arrives: the same patient ADT event with a slightly different timestamp or a single field changed. Is that a correction to be applied, or a true duplicate to be discarded?
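Duplicate detection on MSH-10 can be sketched as a bounded "seen" set. The example below is an assumption-laden illustration, not how any particular engine does it: it keys on sending application, sending facility, and control ID (MSH-3, MSH-4, MSH-10), keeps state in memory only, and uses hypothetical names throughout.

```python
from collections import OrderedDict
from typing import Tuple


def msh_key(message: str) -> Tuple[str, str, str]:
    """Extract (MSH-3, MSH-4, MSH-10) from the first segment.

    MSH-1 is the field separator itself, so after splitting on "|"
    MSH-n sits at index n - 1.
    """
    msh = message.split("\r", 1)[0].split("|")
    return msh[2], msh[3], msh[9]


class DuplicateFilter:
    """Remembers recent message keys and reports repeats."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._seen: "OrderedDict[Tuple[str, str, str], None]" = OrderedDict()
        self._max_entries = max_entries

    def is_duplicate(self, key: Tuple[str, str, str]) -> bool:
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest key
        return False


ORIGINAL = "MSH|^~\\&|EHR|HOSP|BED|HOSP|20240101120000||ADT^A01|MSG0001|P|2.5.1\rPID|1||12345^^^HOSP^MR"
RESEND = ORIGINAL  # same control ID: a true duplicate
NEAR_DUPLICATE = ORIGINAL.replace("MSG0001", "MSG0002")  # new control ID
```

Note that the near-duplicate passes this filter by design. Whether it is a correction or a repeat is a business rule that control IDs alone cannot answer.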

Out-of-Sequence Messages

Real-time message delivery is not guaranteed to be in chronological order. Network delays, system queuing, and batch processing can cause messages to arrive out of sequence. A discharge message (A03) that arrives before the admission message (A01) it logically follows will cause systems that validate encounter state transitions to reject the discharge. Systems that apply messages without state validation may create inconsistent records — a discharged patient with no admission record.

Out-of-sequence testing requires deliberately constructing test sequences where messages arrive in the wrong order and verifying that the interface engine either correctly queues and resequences them, or correctly rejects the out-of-sequence message and generates an error that allows the message to be replayed after the prerequisite messages have been delivered.
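The "queue and resequence" option can be illustrated with a small state holder that parks events for encounters it has not yet seen admitted, then replays them once the A01 arrives. This is a simplified sketch under stated assumptions: every encounter starts with A01, events are identified only by trigger code and encounter number, and the class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class EncounterSequencer:
    """Holds events that arrive before the admit for their encounter."""

    def __init__(self) -> None:
        self.admitted: Set[str] = set()
        self.held: Dict[str, List[str]] = defaultdict(list)
        self.applied: List[Tuple[str, str]] = []

    def receive(self, event: str, encounter_id: str) -> str:
        if event == "A01":
            self.admitted.add(encounter_id)
            self.applied.append((event, encounter_id))
            # Replay anything that was waiting on this admission.
            for held_event in self.held.pop(encounter_id, []):
                self.receive(held_event, encounter_id)
            return "applied"
        if encounter_id not in self.admitted:
            self.held[encounter_id].append(event)
            return "held"
        self.applied.append((event, encounter_id))
        return "applied"


sequencer = EncounterSequencer()
first = sequencer.receive("A03", "V1001")   # discharge arrives first
second = sequencer.receive("A01", "V1001")  # admit arrives late
```

A test for the alternative design, rejecting and replaying, would assert on the error queue contents instead of on the held list.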

Interface upgrade test scenario: A hospital is migrating from HL7 v2.3 to HL7 v2.5.1 on their ADT feed. In the v2.3 feed, the PID-3 field always carried a single identifier. In the v2.5.1 feed, PID-3 is used as a repeating patient identifier list, where each repetition can carry its own assigning authority and identifier type code. The downstream registration system was built to parse a single PID-3 value. When the upgraded feed sends a patient with both an MRN and a social security number in PID-3 as two separate repetitions, the registration system either parses only the first repetition (losing the SSN) or fails entirely on the unexpected repetition separator (~).

This failure was not discovered in the migration testing because the test patient population used for the migration test was extracted from the production system — and all of the test patients had only a single identifier in PID-3. The first patient with two identifiers appeared on day four of production, when the HIM team entered a patient's SSN for insurance verification purposes, adding it to the identifier list. The registration system crashed. The fix required a parser rewrite that should have been tested during migration.
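A repetition-aware PID-3 parser is short enough to sketch. The example below assumes the default repetition separator (~) and component separator (^), and reads the ID, assigning authority, and identifier type code from CX components 1, 4, and 5. The function name and sample identifiers are made up for illustration.

```python
from typing import Dict, List


def parse_pid3(pid3: str, rep_sep: str = "~", comp_sep: str = "^") -> List[Dict[str, str]]:
    """Parse a PID-3 field value into one dict per identifier repetition."""
    identifiers = []
    for repetition in pid3.split(rep_sep):
        if not repetition:
            continue  # skip empty repetitions such as a trailing "~"
        components = repetition.split(comp_sep)
        components += [""] * (5 - len(components))  # pad short repetitions
        identifiers.append({
            "id": components[0],
            "assigning_authority": components[3],
            "type_code": components[4],
        })
    return identifiers


single = parse_pid3("12345^^^HOSP^MR")
double = parse_pid3("12345^^^HOSP^MR~000000000^^^SSA^SS")
```

A migration test set would include at least one patient per identifier count the source system can produce, not only the single-identifier case.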

Interface Engine Testing: Mirth Connect, Rhapsody, and Their Peers

The interface engine is where most of the actual transformation and routing logic lives. Testing the interface engine requires test data that exercises not just the happy path but the full range of conditions that the transformation and routing rules must handle correctly.

Transformation Rule Testing

Interface engine transformations — mapping fields from the source format to the destination format, applying lookup table translations, splitting or joining fields — fail in predictable ways when source data doesn't match the assumptions baked into the transformation logic. A transformation that concatenates PID-5.1 (family name) and PID-5.2 (given name) into a "lastname, firstname" format will produce "Smith, " (with a trailing space) for a patient with no given name on record. It will produce an unexpectedly long string for a patient with a compound family name that pushes the concatenated result past the destination field's maximum length.
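Both failure modes above can be guarded against in the transformation itself. The sketch below is one possible policy, not a recommendation for any specific engine: it drops the separator when a name part is missing and raises an error instead of truncating silently. The function name and the length limit are hypothetical.

```python
def format_display_name(family: str, given: str, max_length: int) -> str:
    """Build "family, given" without a dangling separator.

    Raises ValueError when the result would not fit the destination
    field, so the message can be routed to an error queue for review.
    """
    family = family.strip()
    given = given.strip()
    if family and given:
        name = f"{family}, {given}"
    else:
        name = family or given
    if len(name) > max_length:
        raise ValueError(
            f"display name is {len(name)} characters; destination allows {max_length}"
        )
    return name
```

Whether to reject, truncate, or abbreviate an over-length name is a site decision; the point of the test data is to make sure some decision was made deliberately.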

Testing transformation rules requires test data that exercises the boundary conditions: maximum field lengths, minimum field content (single-character values, empty values), values that contain the delimiter characters used in the message (pipe, caret, tilde, ampersand, and backslash with the default HL7 v2 encoding characters), and values that contain characters with special meaning in the output format (angle brackets, ampersands, quotation marks) if the transformation outputs to XML or JSON.
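Values that contain delimiter characters are normally carried using HL7 v2 escape sequences (\F\ for the field separator, \S\ for the component separator, \T\ for the subcomponent separator, \R\ for the repetition separator, and \E\ for the escape character). A small round-trip helper makes a useful test fixture. This sketch assumes the default delimiters and handles only those five sequences; the function names are hypothetical.

```python
import re

# Escape letter -> literal character, for the default delimiters |^~\&
_UNESCAPE = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}
_ESCAPE = {literal: f"\\{letter}\\" for letter, literal in _UNESCAPE.items()}


def hl7_escape(text: str) -> str:
    """Replace delimiter characters in a data value with escape sequences."""
    return "".join(_ESCAPE.get(ch, ch) for ch in text)


def hl7_unescape(text: str) -> str:
    """Reverse hl7_escape in a single left-to-right pass."""
    return re.sub(r"\\([FSTRE])\\", lambda m: _UNESCAPE[m.group(1)], text)


# Hypothetical boundary-condition values for a name or address field.
SAMPLES = ["O'Brien & Sons", "A|B^C~D", "path\\to\\file", "plain text"]
round_trips = [hl7_unescape(hl7_escape(s)) == s for s in SAMPLES]
```

Feeding the escaped values through the full transformation, and comparing the destination field with the original string, shows whether the engine unescapes at the right stage.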

Routing Logic Testing

Interface engines route messages to different destinations based on message content — sending ADT messages for inpatients to the bed management system but not to the outpatient scheduling system, or routing lab results for a specific ordering location to that location's results viewer. Routing logic that has been tested only with the expected cases will fail when it encounters unexpected content.

A routing filter that sends messages to the cardiac monitoring system when the admitting department code is "CATH" will fail silently when the admitting department sends the code as "CATH LAB" (with a space) or "CATHLAB" (as one word) — both of which are real-world variations that different EHR configurations produce. Test data must include the variations that the interface engine will encounter in production, not just the canonical form that the source system is supposed to send.
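A filter can be made tolerant of these variations by normalizing the code before comparing it. The following sketch assumes the variations differ only in case, spacing, and punctuation; the alias set and function names are hypothetical and would come from the site's own code tables.

```python
import re

# Hypothetical set of normalized codes that should route to cardiac monitoring.
CATH_CODES = {"CATH", "CATHLAB"}


def normalize_department(code: str) -> str:
    """Uppercase the code and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", code.upper())


def routes_to_cardiac_monitoring(department_code: str) -> bool:
    return normalize_department(department_code) in CATH_CODES


VARIANTS = ["CATH", "CATH LAB", "CATHLAB", "cath-lab", " Cath Lab "]
results = {variant: routes_to_cardiac_monitoring(variant) for variant in VARIANTS}
```

Codes that match no route at all are worth logging or alerting on, because a silent drop is the failure mode the test is trying to expose.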

Error Queue Testing

Every interface engine has an error queue — the destination for messages that fail processing and require manual intervention. Testing the error queue means testing what happens to malformed messages: messages that fail schema validation, messages with missing required fields, messages that the transformation rule cannot process because the source data is in an unexpected format. The error queue behavior — does the engine retry the message? Does it generate an alert? Does it allow the message to be corrected and reprocessed? — is as important as the happy-path behavior, because in production, malformed messages arrive regularly.

Throughput and Volume Testing

A large hospital may process 50,000 ADT messages per day, or about 35 messages per minute on average, with peaks at shift change, morning admission rounds, and evening discharge time that may be three to five times the average rate. Interface engines tested at low volume may perform correctly but degrade under production load due to database connection pool exhaustion, memory leak accumulation, or transformation logic whose cost grows as O(n²) in message volume rather than O(n).
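The load targets follow directly from the figures above. As a quick check of the arithmetic, using only the 50,000-per-day volume and the three-to-five-times peak multiplier from the text:

```python
MESSAGES_PER_DAY = 50_000
MINUTES_PER_DAY = 24 * 60

average_per_minute = MESSAGES_PER_DAY / MINUTES_PER_DAY
peak_per_minute = {
    multiplier: average_per_minute * multiplier for multiplier in (3, 5)
}

print(f"average: {average_per_minute:.1f} messages/minute")
for multiplier, rate in peak_per_minute.items():
    print(f"{multiplier}x peak: {rate:.0f} messages/minute")
```

That puts the burst range at roughly 104 to 174 messages per minute, which is the rate a volume test needs to sustain, not the daily average.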

Volume testing requires synthetic test data in the tens of thousands of records: enough to produce realistic throughput loads, realistic queue depths during burst periods, and realistic data distribution patterns that surface bugs that only manifest when a specific combination of data values appears with sufficient frequency. A bug triggered by a patient name containing an apostrophe will not appear in testing if none of your 50 test patients have apostrophes in their names. If 2% of real patients have apostrophes in their names (plausible in a population with significant Irish, Italian, or French heritage), a test population of 50,000 will contain about 1,000 of them, and the bug will surface.
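The effect of sample size can be put in numbers. The sketch below takes the 2% apostrophe rate used above as a hypothetical and assumes test patients are drawn independently at that rate. It computes the chance that a test set contains no such name at all, and the expected count in a larger set.

```python
def probability_of_none(sample_size: int, rate: float) -> float:
    """Chance that no record in the sample has the trait."""
    return (1.0 - rate) ** sample_size


def expected_count(sample_size: int, rate: float) -> float:
    """Average number of records in the sample with the trait."""
    return sample_size * rate


RATE = 0.02  # hypothetical share of names containing an apostrophe

miss_small = probability_of_none(50, RATE)
miss_large = probability_of_none(50_000, RATE)
hits_large = expected_count(50_000, RATE)
```

Under those assumptions a 50-patient set has roughly a 36% chance of containing no apostrophe name at all, while a 50,000-patient set is all but certain to contain many.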

The Testing Strategy That Actually Works: The Integration Testing Pyramid

Integration testing in healthcare benefits from the same layered approach that software engineering has applied to testing generally, adapted for the specific failure modes and regulatory requirements of healthcare data exchange.

Unit Testing: Message Parsers and Transformation Rules

Unit tests validate individual components in isolation: the HL7 v2 parser correctly extracts PID-3 values with multiple repetitions; the date normalization function correctly converts HL7 date-time strings to ISO 8601; the LOINC lookup table returns the correct local code for a given LOINC code. Unit tests are fast, deterministic, and can be run on every code change. They require carefully constructed test inputs that exercise boundary conditions — including the encoding edge cases, null value scenarios, and field length violations described earlier.
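As an illustration of this layer, here is what a unit test for the date normalization function might look like. The function is a hypothetical stand-in that accepts only three input shapes (date only, date-time to the second, and date-time with a UTC offset); real HL7 timestamps allow more precisions than this sketch handles.

```python
import unittest
from datetime import datetime


def hl7_to_iso8601(value: str) -> str:
    """Convert a subset of HL7 v2 date-time strings to ISO 8601."""
    if len(value) == 8:
        return datetime.strptime(value, "%Y%m%d").date().isoformat()
    if len(value) == 14:
        return datetime.strptime(value, "%Y%m%d%H%M%S").isoformat()
    if len(value) == 19 and value[14] in "+-":
        return datetime.strptime(value, "%Y%m%d%H%M%S%z").isoformat()
    raise ValueError(f"unsupported HL7 date-time: {value!r}")


class DateNormalizationTest(unittest.TestCase):
    def test_date_only(self):
        self.assertEqual(hl7_to_iso8601("20240101"), "2024-01-01")

    def test_date_time(self):
        self.assertEqual(hl7_to_iso8601("20240101123000"), "2024-01-01T12:30:00")

    def test_utc_offset(self):
        self.assertEqual(
            hl7_to_iso8601("20240101123000-0500"), "2024-01-01T12:30:00-05:00"
        )

    def test_unsupported_precision_is_rejected(self):
        with self.assertRaises(ValueError):
            hl7_to_iso8601("2024")
```

The last test case documents a known limitation explicitly, which is more useful than leaving the behavior for year-only values undefined.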

Integration Testing: End-to-End Message Flow

Integration tests verify that a complete message flows correctly from source to destination through the full transformation and routing pipeline. An integration test sends a test ADT^A01 message to the interface engine and verifies that the correct downstream system receives a correctly transformed message. Integration tests require realistic test patients — complete enough that all transformation rules receive valid inputs, but still synthetic so they can be used freely in test environments.

End-to-End Testing: Clinical Scenario Sequences

End-to-end tests verify that complete clinical scenarios — a patient who is admitted, transferred three times, has labs ordered and resulted, is scheduled for a procedure, and is discharged — produce the correct state in all downstream systems. These tests require test data that represents the full scenario sequence, with each message in the correct order and with the correct content to trigger the downstream system behaviors being tested. End-to-end tests are the most valuable and the most expensive to build — which is why realistic synthetic test data that pre-constructs these scenarios is so valuable.

Chaos Testing: Deliberate Adversarial Conditions

Chaos testing validates system resilience under adversarial conditions: network partitions that prevent message delivery, source system outages that produce message backlogs, malformed messages that should be rejected gracefully, and the failure of a downstream system that should not block the processing of messages for other downstream systems. Chaos testing requires the ability to deliberately inject failure conditions — and the test data to feed the system during the failure, so that the recovery behavior (message replay, error queue processing, alert generation) can be verified.

Regulatory and Certification Requirements for Integration Testing

EHR systems that are certified under the ONC Health IT Certification Program (§170.315) must demonstrate conformance to specific interoperability criteria, including the ability to transmit and receive standardized clinical summaries, support for HL7-specified transport protocols, and FHIR R4 API capabilities defined in §170.315(g)(10). The HL7 profiles specified in ONC certification criteria define required and optional elements for each message type — and certification testing validates that the EHR correctly handles both.

The 21st Century Cures Act's information blocking provisions (45 CFR Part 171) prohibit health care providers, developers of certified health IT, and health information networks and exchanges from practices likely to interfere with the access, exchange, or use of electronic health information. An interface engine that fails on specific message types, blocking data exchange for patients whose records include those message types, may constitute information blocking if the failure is not corrected with appropriate urgency. The compliance implication is that interface engines must be tested against the full range of message types they claim to support, not just the subset that was convenient to test.

HIPAA Security Rule requirements (45 CFR §164.312) mandate technical safeguards for the transmission of electronic PHI. In integration testing environments, these requirements apply even to test data if the test data was derived from real PHI. Synthetic test data that contains no real PHI is not subject to HIPAA transmission security requirements — making it easier to distribute across test environments, share with vendor partners for integration testing, and retain in test environments for regression testing purposes. This is not just a legal technicality; it is a practical enabler of more thorough testing, because teams are more willing to use and share test data that carries no PHI compliance burden.

EHR Integration Test Data Built for Go-Live Day Confidence

PatientDatasets.com EHR integration packages include full ADT scenario libraries with all 17 event types; HL7 v2.x message suites for ADT, ORM, ORU, DFT, MFN, and SIU; FHIR R4 patient bundles with complete resource population; and edge case coverage including dual admissions, MRN merges, observation-to-inpatient conversions, special character names, null value scenarios, and out-of-sequence message sets. Volume options from 1,000 to 500,000 patients. No PHI. Commercial license included.


What a Comprehensive EHR Test Data Package Must Include

Translating this analysis into a practical test data specification, a comprehensive EHR integration test data package needs:

The patient with two simultaneous admissions will arrive on day one of your go-live. The patient whose name is "Siobhán Ó'Muircheartaigh" will arrive in the second week. The patient with three MRNs from three different merges will arrive in the third week. The patient who is admitted, discharged in error, and then has the discharge cancelled via A13 while still in their room will arrive when you least expect it.

The question is not whether these patients exist. They do — in every real patient population that any interface will encounter in production. The question is whether your integration engine has already met them — in testing, where the cost of failure is a bug report and a message in the error queue — or whether production is where they introduce themselves, with a charge nurse on the phone, a bed management system locked up, and an incident report that will be read at the next go-live readiness review.