The Data Infrastructure Healthcare Education Is Missing
Real patient data is locked behind HIPAA, IRBs, and data use agreements. We generate realistic synthetic patient records so data scientists, ML engineers, and healthcare students can work with production-quality clinical data instantly.
What We Are
A dataset marketplace.
We sell data.
PatientDatasets.com is a storefront for downloadable synthetic patient records. The data happens to be healthcare data, but the buyer pool is anyone who needs realistic patient data -- from a data science student in India to a Kaggle competitor to a PhD candidate at MIT.
Primary Market
Data Scientists & ML Engineers
Feature engineering, model training, portfolio projects, pipeline testing, algorithm benchmarking. Healthcare AI is projected to be a $45B market by 2026 -- and all of it needs training data. We provide ML-ready files in CSV, JSON, FHIR R4, and Parquet.
Secondary Market
Healthcare Students
Medical coding (CPC/CCS/CCA), billing, RCM, HIM, nursing, pharmacy, and informatics students who need realistic patient records for practice and certification prep. Paired with discipline-specific workbooks and answer keys.
Tertiary Market
Administrators & Vendors
Healthcare administrators, EHR vendors, AI startups, clinical research teams, and public health analysts who need IRB-free, HIPAA-free synthetic patient data on demand for testing, demos, and development.
What We're Not
Clear boundaries, clear product.
We get asked what we do and don't offer often enough that it's worth spelling out.
Not a coding school
We don't teach medical coding. We sell the patient records that coding students practice on. The data is the product.
Not a billing course
We don't run a billing curriculum. We provide the synthetic claims, EOBs, and remittance data that billing students use for hands-on practice.
How the Data Is Made
Built by The Generator. 100% synthetic.
Every patient record is AI-generated from scratch -- not sampled, not anonymized, not derived from real patients. The result is clinically realistic data that is HIPAA-free by design.
Dedicated Infrastructure
The Generator runs on a Spark DSX NAS with 11TB of storage. Records are generated continuously and validated through a multi-agent QA pipeline before release.
Zero Real Patient Data
Every name, diagnosis, lab result, and billing record is fictional. HIPAA's Privacy Rule, Security Rule, and Breach Notification Rule do not apply. No IRB. No DUA. No BAA. Buy and download.
Clinically Realistic
Records pass clinical plausibility checks: diagnoses map to appropriate procedures, lab values fall within realistic ranges, medication dosages match conditions, and billing codes align with documentation.
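To give a feel for what a plausibility check of this kind can look like, here is a minimal Python sketch. It is not The Generator's actual QA pipeline: the field names (`labs`, `diagnosis_code`, `procedure_codes`), the lab ranges, and the diagnosis-to-procedure table are all assumptions made up for illustration.

```python
# Hypothetical plausibility bounds per lab field (illustrative values only,
# not clinical reference ranges).
PLAUSIBLE_LAB_RANGES = {
    "glucose_mg_dl": (40.0, 600.0),
    "hemoglobin_g_dl": (4.0, 22.0),
}

# Hypothetical lookup: diagnosis code -> procedure codes treated as appropriate.
ALLOWED_PROCEDURES = {
    "E11.9": {"83036", "82947"},
}


def check_record(record: dict) -> list[str]:
    """Return a list of plausibility problems; an empty list means the record passed."""
    problems = []

    # Lab values must fall inside the configured range for that lab.
    for lab, value in record.get("labs", {}).items():
        bounds = PLAUSIBLE_LAB_RANGES.get(lab)
        if bounds is None:
            continue  # no rule configured for this lab
        low, high = bounds
        if not (low <= value <= high):
            problems.append(f"{lab}={value} outside [{low}, {high}]")

    # Procedures must be in the allowed set for the diagnosis, when one is configured.
    diagnosis = record.get("diagnosis_code")
    allowed = ALLOWED_PROCEDURES.get(diagnosis)
    if allowed is not None:
        for procedure in record.get("procedure_codes", []):
            if procedure not in allowed:
                problems.append(f"procedure {procedure} not expected for {diagnosis}")

    return problems


good = {
    "diagnosis_code": "E11.9",
    "procedure_codes": ["83036"],
    "labs": {"glucose_mg_dl": 145.0},
}
bad = {
    "diagnosis_code": "E11.9",
    "procedure_codes": ["99999"],
    "labs": {"glucose_mg_dl": 5000.0},
}

good_problems = check_record(good)
bad_problems = check_record(bad)
print(good_problems)
print(bad_problems)
```

A rule table like this keeps each check easy to read and extend; checks for medication dosages or billing-to-documentation alignment would follow the same pattern with their own lookup tables.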
87 fields per record
Every patient record includes complete clinical and financial data across these categories:
Three Product Lines
Data. Workbooks. Instructor resources.
The datasets are the foundation. Everything else builds on top of them.
Core Product
Synthetic Patient Datasets
Downloadable bundles of complete patient records with clinical and financial data. Available in CSV, JSON, FHIR R4, and Parquet. ML-ready out of the box.
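As a rough illustration of working with a CSV download, the Python sketch below parses rows and re-serializes them as JSON using only the standard library. The column names (`patient_id`, `age`, `diagnosis_code`, `total_charge`) and the two sample rows are assumptions for the example, not the actual dataset schema; an in-memory string stands in for a downloaded file.

```python
import csv
import io
import json

# Stand-in for a downloaded CSV file; columns and values are hypothetical.
SAMPLE_CSV = """patient_id,age,diagnosis_code,total_charge
P0001,54,E11.9,182.50
P0002,37,J45.909,96.00
"""


def load_records(csv_file) -> list[dict]:
    """Parse CSV rows into dicts, converting the numeric columns from strings."""
    records = []
    for row in csv.DictReader(csv_file):
        row["age"] = int(row["age"])
        row["total_charge"] = float(row["total_charge"])
        records.append(row)
    return records


records = load_records(io.StringIO(SAMPLE_CSV))

# The same rows, serialized as JSON.
as_json = json.dumps(records, indent=2)
print(as_json)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`. Parquet files need a third-party reader such as pandas with pyarrow, which is left out here to keep the sketch dependency-free.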
Education
Student Workbooks
11 discipline-specific workbooks with homework assignments, practice exercises, case studies, and answer keys. Data science, medical coding, billing, RCM, HIM, nursing, pharmacy, and more.
For Instructors
Instructor Resources
Answer keys, auto-grading scripts, rubric templates, syllabus templates, and semester planning guides. Professor-only materials sold separately from student workbooks.
Get in Touch
Questions? We're here.
Whether you need a custom dataset, have a question about our workbooks, or want to discuss enterprise pricing -- drop us a line.
support@patientdatasets.com