Constructing data-derived family histories using electronic health records from a single healthcare delivery system

Copyright © The Author(s) 2019. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Abstract

Background

In order to examine the potential clinical value of integrating family history information directly from the electronic health records of patients’ family members, the electronic health records of individuals in Clalit Health Services, the largest payer/provider in Israel, were linked with the records of their parents.

Methods

We describe the results of a novel approach for creating data-derived family history information for 2 599 575 individuals, focusing on three chronic diseases: asthma, cardiovascular disease (CVD) and diabetes.

Results

In our cohort, there were 256 598 patients with asthma, 55 309 patients with CVD and 66 324 patients with diabetes. Of the people with asthma, CVD or diabetes, the percentage that also had a family history of the same disease was 22.0%, 70.8% and 70.5%, respectively.

Conclusions

Linking individuals’ health records with their data-derived family history has untapped potential for supporting diagnostic and clinical decision-making.

Introduction

Individuals with a family history of certain prevalent chronic diseases have an increased risk of developing those and other chronic and acute conditions, which account for a significant proportion of health care costs, morbidity and mortality in the USA. 1–6 Until now, family history documented in routine clinical practice has been patient-reported. It has therefore depended on the patient volunteering the information, with no means of validating its completeness or accuracy. Clinicians are limited in their capacity to obtain family history and to include its presence or absence in their decision-making for several reasons, including time pressures during the clinical encounter, 7–9 the possibility that patients have incomplete or unreliable knowledge of their own family history 7 , 8 , 10–14 and the limited availability of technologies for collecting family history and integrating it into the medical record. 8 , 15–22

In many instances, documentation of family history in electronic medical records is missing despite the known impact of family history on future risk. 23 , 24 Furthermore, some family histories are recorded only after the patient has already been diagnosed with a disease, diminishing their value for future disease risk assessment or primary prevention. 25–27

To overcome these inherent limitations of obtaining accurate family history data during routine clinical care, this study identifies and reports family history information obtained by an alternative ‘data-derived’ method: extracting family history directly from the electronic health records of the patients’ family members. We automatically and anonymously link the electronic health records of our cohort with those of their parents. We thereby expand the patient’s medical record to include diagnoses that a physician gave to a family member, instead of relying on the recall and knowledge of the patients themselves. Creating this link allows us to assess the validity of the method and its contribution both to our understanding of disease and to clinical practice, where clinicians weigh the risk of disease in the differential diagnosis of a patient. The linkage created here is the first step in validating the effects of integrating data-derived family history into clinical practice or population management. We believe that expanding patient records to include the documented diseases of relatives is a necessary first step towards using such information to improve diagnoses, accelerate treatment decisions and inform preventive care practices.

In this study, we leverage the database maintained by Clalit Health Services (Clalit), which holds a comprehensive clinical record, including inpatient and outpatient data, of patients and any family members who are also members of Clalit. We evaluate the potential strength of association if one were to expand the patients’ records to include the diseases documented in their parents’ records. To our knowledge, this is the first study to describe and evaluate a system-wide ‘data-derived’ family history that uses detailed clinical information extracted directly from the electronically documented diagnoses of a patient’s family members.

Methods

Setting

We conducted a large cross-sectional retrospective study with data taken from Clalit’s comprehensive database. Clalit is a health fund that acts as both payer and provider for its members, offering a full range of services (inpatient, outpatient and specialty) as mandated in the ‘basket of services’ for all four health funds under the National Healthcare Insurance Law in Israel.

Clalit is the largest of the four payer/provider health care delivery systems in Israel, currently serving over 4.3 million members, over half of the Israeli population. All residents of Israel must belong to one of the four health funds, and as Clalit is the oldest and largest, with less than 2% attrition each year, many extended families have longitudinal follow-up in the Clalit system. 28 All members of Clalit are registered using their unique national identification number, which is used by the Ministry of Health and the Ministry of Interior to track and link birth records with the national identification numbers of the parents. The Clalit database is updated monthly according to these records. As such, Clalit’s data warehouse, which stores complete electronic health records for all members, has comprehensive clinical and demographic information on its entire patient population.

Design

We first describe the entire patient population who had been members of Clalit for at least one year (or from birth if less than one year of age) as of 1 January 2017 (index date). For this study, we then create a cohort of those for whom both parents were also members of Clalit for at least one year between and including the years 2002 and 2016, thus providing an optimal opportunity to document the potential data-derived family history, when present in the medical record of the parents. By limiting the study cohort to those with parents in Clalit, we can differentiate a ‘negative’ medical history from an ‘unknown’ medical history.
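The distinction between a ‘negative’ and an ‘unknown’ history can be illustrated with a minimal sketch. It assumes a hypothetical in-memory registry keyed by an anonymized member identifier; the `Member` class, its field names and the `family_history` function are illustrative only and do not describe the actual Clalit schema. The membership-duration criteria are assumed to have been applied before the registry is built.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Member:
    """Hypothetical de-identified member record (not the actual Clalit schema)."""
    member_id: str
    mother_id: Optional[str] = None
    father_id: Optional[str] = None
    diagnoses: Set[str] = field(default_factory=set)


def family_history(child: Member, members: Dict[str, Member], disease: str) -> str:
    """Return 'positive', 'negative' or 'unknown' for a parental history of `disease`."""
    mother = members.get(child.mother_id)
    father = members.get(child.father_id)
    # Without both parents' records, the absence of a diagnosis is uninformative.
    if mother is None or father is None:
        return "unknown"
    if disease in mother.diagnoses or disease in father.diagnoses:
        return "positive"
    return "negative"


# Toy data: only members who satisfy the membership criteria are in the registry.
registry = {
    "m1": Member("m1", diagnoses={"diabetes"}),
    "f1": Member("f1"),
    "c1": Member("c1", mother_id="m1", father_id="f1"),
    "c2": Member("c2", mother_id="m1", father_id="f9"),  # father not in the system
}

# The study cohort keeps only individuals whose history is not 'unknown'.
cohort = [m for m in registry.values()
          if family_history(m, registry, "diabetes") != "unknown"]
```

In this toy example only `c1` enters the cohort, because `c2` has a parent whose record is unavailable and the two parents themselves have no linked parents.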

We describe those with three common chronic diseases: asthma, cardiovascular disease (CVD) and diabetes. We chose three of the most common diseases in order to best capture the significance of any familial association. Asthma is predominantly a childhood condition, thus most patients have both parents in Clalit. CVD and diabetes are conditions associated with high morbidity and mortality rates among adults, yet these conditions are affected by modifiable risk factors and have the potential for prevention or early detection, making them ideal for a future predictive model. We also wanted to take advantage of our existing validated algorithm for identifying patients with diabetes using electronic medical records. 29

Asthma is defined using the diagnosis of asthma, either as a free-text diagnosis or with the International Classification of Diseases, 9th Revision (ICD-9) code 493% (where ‘%’ indicates any sequence of numbers). CVD is similarly defined using at least one of the following ICD-9 codes: 00.24; 00.4[0–8]%; 36%; 41[0–4]%; 433%; 434%; 436%; 437[0–1]%; 440%; 88.5[0–7]% or their associated text. The internally validated chronic disease registry of Clalit was also utilized in defining CVD. Diabetes is defined in the Clalit database using a validated algorithm based on diagnoses, laboratory values and medication dispensing, whose methods have been published elsewhere. 29
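As an illustration of how such code patterns can be applied, the following sketch translates them into regular expressions. It is an assumption-laden example rather than the study's implementation: ‘%’ is treated like an SQL LIKE wildcard (any trailing characters, which is slightly more permissive than ‘any sequence of numbers’), the ranges are written with ASCII hyphens, and the names `compile_pattern` and `matches_any` are hypothetical. Free-text diagnoses and the chronic disease registry are not covered.

```python
import re

# Patterns transcribed from the case definitions above (ASCII hyphens in ranges).
ASTHMA_PATTERNS = ["493%"]
CVD_PATTERNS = ["00.24", "00.4[0-8]%", "36%", "41[0-4]%", "433%", "434%",
                "436%", "437[0-1]%", "440%", "88.5[0-7]%"]


def compile_pattern(pattern):
    """Turn one LIKE-style pattern into a regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")                # wildcard: any trailing characters
        elif ch == "[":
            end = pattern.index("]", i)
            parts.append(pattern[i:end + 1])  # keep a range such as [0-8] as is
            i = end
        else:
            parts.append(re.escape(ch))       # literal digit or dot
        i += 1
    return re.compile("".join(parts))


def matches_any(code, patterns):
    """True if the whole code string matches at least one pattern."""
    return any(compile_pattern(p).fullmatch(code) for p in patterns)


print(matches_any("493.90", ASTHMA_PATTERNS))  # True
print(matches_any("410.9", CVD_PATTERNS))      # True
print(matches_any("415.1", CVD_PATTERNS))      # False
```

Using `fullmatch` keeps an exact code such as 00.24 from matching longer codes that merely begin with the same characters.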

To describe the bias that age may introduce into the cohort (namely, that older individuals are less likely to have parents alive in the healthcare system), we present those with the disease condition and a family history of the disease condition by age, in single-year and 5-year intervals.
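A brief sketch of the 5-year tabulation is given below, assuming each patient with the disease is represented as a hypothetical `(age, has_family_history)` pair; the function names and the toy data are illustrative only.

```python
from collections import Counter


def five_year_bin(age):
    """Label the 5-year interval containing `age`, e.g. 33 -> '30-34'."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


def counts_by_age(patients):
    """Count patients per 5-year age bin, split by family-history status."""
    counts = Counter()
    for age, has_family_history in patients:
        counts[(five_year_bin(age), has_family_history)] += 1
    return counts


# Toy data only: (age in years, data-derived family history present?)
toy_patients = [(31, True), (33, True), (34, False), (58, True), (7, False)]
toy_counts = counts_by_age(toy_patients)
```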

For both the entire patient population and the cohort for this study, we describe the basic socio-demographic characteristics including age in years (0–24, 25–49, 50–74 and 75 or older), sex (male or female), ethnicity (based on the individual's grandparents' country of birth), the average socioeconomic status of those attending the same clinic (low, medium or high), district (one of nine commonly used districts: Eilat, Dan-Petakh Tikvah, South, Haifa, Jerusalem, Center, North, Sharon-Shomron or Tel Aviv), residence (rural or urban, as well as central or periphery, as defined by the National Census Bureau), and years from first diagnosis to the index date (<1, 1–4, 5–9 or 10 or more).

Then, for each disease condition, we present those with family history for the disease condition in the cohort, as well as those with both the disease and a family history for the disease.

Finally, to gauge the face validity of electronic health record-driven family history, we present the sensitivity, specificity, positive predictive value and negative predictive value for each of the disease conditions.
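A minimal sketch of how these four measures follow from a 2×2 table is shown below, using the standard definitions and the asthma counts that appear in table 4 (family history as the ‘test’, the index patient's disease status as the ‘truth’). The function and argument names are illustrative. Under these standard definitions the asthma counts give a specificity of about 86.7%; the 13.3% listed in table 4 equals its complement, FP/(FP + TN).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 measures; each value is a proportion between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients with a family history
        "specificity": tn / (tn + fp),  # non-diseased patients without a family history
        "ppv": tp / (tp + fp),          # patients with a family history who have the disease
        "npv": tn / (tn + fn),          # patients without a family history who are disease-free
    }


# Asthma counts as they appear in table 4.
asthma = diagnostic_metrics(tp=56_444, fp=312_610, fn=200_154, tn=2_030_367)
for name, value in asthma.items():
    print(f"{name}: {value:.1%}")
```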

The Clalit Institutional Review Board approved this study.

Results

There were 4 331 310 members in Clalit's patient population as of 1 January 2017 (tables 1–3). There were 399 758 (9.2%) patients with asthma, 411 053 (9.5%) with CVD and 409 763 (9.5%) with diabetes. Among those meeting our cohort criteria of having data-derived family history available, there were 2 599 575 (60%) patients in total, of which 256 598 (9.9%) had asthma, 55 309 (2.1%) had CVD and 66 324 (2.6%) had diabetes (Supplementary figure S1). The distribution by 5-year age groups is shown in Supplementary table S2. While there is clearly a shift towards the younger age group among the study cohort as compared with the Clalit population, the percentages with disease within each age group are quite similar.

Table 1

The demographic characteristics of Clalit population and the study cohort, including subpopulations of those with asthma and those with a family history of asthma, 1 January 2017

| Variable | Clalit population: total, N | Clalit population: asthma, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: asthma, n (%) | Two parents in Clalit: family history of asthma, n (% of total) |
|---|---|---|---|---|---|
| Total | 4 331 310 | 399 758 (9.2%) | 2 599 575 | 256 598 (9.9%) | 369 054 (14.2%) |
| Age (years) | | | | | |
| 0–24 | 1 767 859 | 207 827 (11.8%) | 1 431 770 | 172 756 (12.1%) | 171 932 (12.0%) |
| 25–49 | 1 375 403 | 94 492 (6.9%) | 999 326 | 72 665 (7.3%) | 162 615 (16.3%) |
| 50–74 | 911 315 | 69 580 (7.6%) | 168 174 | 11 147 (6.6%) | 34 448 (20.5%) |
| 75+ | 272 873 | 27 446 (10.1%) | 161 | 13 (8.1%) | 23 (14.3%) |
| Missing | 3860 | 413 (10.7%) | 144 | 17 (11.8%) | 36 (25.0%) |
| Sex | | | | | |
| Female | 2 212 886 | 192 216 (8.7%) | 1 287 390 | 110 780 (8.6%) | 182 496 (14.2%) |
| Male | 2 118 423 | 207 542 (9.8%) | 1 312 185 | 145 818 (11.1%) | 186 558 (14.2%) |
| Ethnicity (grandparent land of birth) | | | | | |
| Eastern European/Americas | 622 349 | 46 528 (7.5%) | 211 374 | 19 598 (9.3%) | 25 266 (12.0%) |
| Middle Eastern (non-Jewish) | 1 157 860 | 94 392 (8.2%) | 835 720 | 69 765 (8.3%) | 109 750 (13.1%) |
| North African/Middle Eastern (Jewish) | 634 237 | 59 480 (9.4%) | 349 636 | 33 522 (9.6%) | 57 325 (16.4%) |
| Other | 1 236 148 | 140 084 (11.3%) | 915 291 | 105 687 (11.5%) | 137 307 (15.0%) |
| Unknown | 680 716 | 59 274 (8.7%) | 287 554 | 28 026 (9.7%) | 39 406 (13.7%) |
| Ethnicity (clinic catchment area) | | | | | |
| Jewish secular | 2 951 667 | 287 514 (9.7%) | 1 619 484 | 174 621 (10.8%) | 242 714 (15.0%) |
| Jewish orthodox | 219 041 | 17 513 (8.0%) | 143 184 | 12 104 (8.5%) | 16 318 (11.4%) |
| Arab | 1 155 435 | 94 377 (8.2%) | 835 935 | 69 855 (8.4%) | 109 889 (13.1%) |
| Missing | 5167 | 354 (6.9%) | 972 | 18 (1.9%) | 133 (13.7%) |
| SES | | | | | |
| Low | 1 498 413 | 128 397 (8.6%) | 1 039 286 | 91 801 (8.8%) | 139 709 (13.4%) |
| Medium | 1 479 541 | 141 955 (9.6%) | 810 988 | 85 726 (10.6%) | 120 927 (14.9%) |
| High | 1 279 480 | 122 562 (9.6%) | 701 747 | 74 254 (10.6%) | 102 317 (14.6%) |
| Missing | 73 876 | 6844 (9.3%) | 47 554 | 4 817 (10.1%) | 6101 (12.8%) |
| District | | | | | |
| Eilat | 29 731 | 2935 (9.9%) | 16 021 | 1 744 (10.9%) | 2690 (16.8%) |
| Dan-Petakh Tikvah | 449 979 | 40 583 (9.0%) | 257 603 | 24 833 (9.6%) | 36 666 (14.2%) |
| South | 575 399 | 51 067 (8.9%) | 370 152 | 35 159 (9.5%) | 49 829 (13.5%) |
| Haifa | 726 252 | 58 450 (8.0%) | 437 300 | 37 731 (8.6%) | 56 457 (12.9%) |
| Jerusalem | 496 488 | 35 890 (7.2%) | 298 035 | 20 485 (6.9%) | 40 452 (13.6%) |
| Center | 560 761 | 58 741 (10.5%) | 323 346 | 36 716 (11.4%) | 51 022 (15.8%) |
| North | 541 999 | 52 719 (9.7%) | 363 396 | 37 449 (10.3%) | 48 891 (13.5%) |
| Sharon-Shomron | 639 416 | 69 319 (10.8%) | 386 781 | 46 459 (12.0%) | 60 868 (15.7%) |
| Tel Aviv | 305 016 | 29 698 (9.7%) | 145 535 | 16 003 (11.0%) | 22 016 (15.1%) |
| Unknown | 1102 | 2 (0.2%) | 434 | 1 (0.2%) | 30 (6.9%) |
| Missing | 5 167 | 354 (6.9%) | 972 | 18 (1.9%) | 133 (13.7%) |
| Residence (urban vs. rural) | | | | | |
| Urban | 3 846 111 | 356 718 (9.3%) | 2 283 824 | 227 050 (9.9%) | 325 431 (14.2%) |
| Rural | 467 638 | 41 808 (8.9%) | 307 981 | 28 999 (9.4%) | 42 744 (13.9%) |
| Missing | 17 561 | 1232 (7.0%) | 7770 | 549 (7.1%) | 879 (11.3%) |
| Residence (central vs. peripheral) | | | | | |
| Center | 3 073 206 | 292 347 (9.5%) | 1 760 239 | 181 394 (10.3%) | 259 055 (14.7%) |
| Periphery | 664 086 | 53 677 (8.1%) | 446 561 | 37 698 (8.4%) | 55 908 (12.5%) |
| Missing | 594 018 | 53 734 (9.0%) | 392 775 | 37 506 (9.5%) | 54 091 (13.8%) |
| Years with disease | | | | | |
| <1 | | 15 901 (4.0%) | | 10 068 (3.9%) | |
| 1–4 | | 65 710 (16.4%) | | 43 131 (16.8%) | |
| 5–9 | | 95 610 (23.9%) | | 65 978 (25.7%) | |
| 10+ | | 222 537 (55.7%) | | 137 421 (53.6%) | |

SES, socio-economic status.

Table 2

The demographic characteristics of Clalit population and the study cohort, including the sub-populations of those with CVD and those with a family history of CVD, 1 January 2017

| Variable | Clalit population: total, N | Clalit population: CVD, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: CVD, n (%) | Two parents in Clalit: family history of CVD, n (% of total) |
|---|---|---|---|---|---|
| Total | 4 331 310 | 411 053 (9.5%) | 2 599 575 | 55 309 (2.1%) | 821 187 (31.6%) |
| Age (years) | | | | | |
| 0–24 | 1 767 859 | 9610 (0.5%) | 1 431 770 | 8004 (0.6%) | 172 097 (12.0%) |
| 25–49 | 1 375 403 | 36 588 (2.7%) | 999 326 | 22 955 (2.3%) | 507 251 (50.8%) |
| 50–74 | 911 315 | 214 784 (23.6%) | 168 174 | 24 256 (14.4%) | 141 618 (84.2%) |
| 75+ | 272 873 | 147 814 (54.2%) | 161 | 57 (35.4%) | 130 (80.7%) |
| Missing | 3860 | 2257 (58.5%) | 144 | 37 (25.7%) | 91 (63.2%) |
| Sex | | | | | |
| Female | 2 212 886 | 183 633 (8.3%) | 1 287 390 | 21 379 (1.7%) | 407 927 (31.7%) |
| Male | 2 118 423 | 227 420 (10.7%) | 1 312 185 | 33 930 (2.6%) | 413 260 (31.5%) |
| Ethnicity (grandparent land of birth) | | | | | |
| Eastern European/Americas | 622 349 | 118 849 (19.1%) | 211 374 | 6957 (3.3%) | 80 695 (38.2%) |
| Middle Eastern (non-Jewish) | 1 157 860 | 67 537 (5.8%) | 835 720 | 16 674 (2.0%) | 257 053 (30.8%) |
| North African/Middle Eastern (Jewish) | 634 237 | 113 351 (17.9%) | 349 636 | 15 874 (4.5%) | 181 758 (52.0%) |
| Other | 1 236 148 | 30 335 (2.5%) | 915 291 | 10 737 (1.2%) | 216 470 (23.7%) |
| Unknown | 680 716 | 113 351 (16.7%) | 287 554 | 5 067 (1.8%) | 85 211 (29.6%) |
| Ethnicity (clinic catchment area) | | | | | |
| Jewish secular | 2 951 667 | 314 198 (10.6%) | 1 619 484 | 37 039 (1.7%) | 534 795 (33.0%) |
| Jewish orthodox | 219 041 | 10 883 (5.0%) | 143 184 | 1556 (2.6%) | 28 972 (20.2%) |
| Arab | 1 155 435 | 67 126 (5.8%) | 835 935 | 16 676 (3.5%) | 257 306 (30.8%) |
| Missing | 5167 | 1812 (35.1%) | 972 | 38 (4.5%) | 114 (11.7%) |
| SES | | | | | |
| Low | 1 498 413 | 95 452 (6.4%) | 1 039 286 | 19 615 (1.7%) | 305 245 (29.4%) |
| Medium | 1 479 541 | 166 885 (11.3%) | 810 988 | 18 501 (2.6%) | 260 422 (32.1%) |
| High | 1 279 480 | 143 066 (11.2%) | 701 747 | 16 214 (3.5%) | 242 053 (34.5%) |
| Missing | 73 876 | 5650 (7.6%) | 47 554 | 979 (4.5%) | 13 467 (28.3%) |
| District | | | | | |
| Eilat | 29 731 | 2926 (9.8%) | 16 021 | 468 (1.7%) | 5920 (37.0%) |
| Dan-Petakh Tikvah | 449 979 | 44 687 (9.9%) | 257 603 | 5297 (2.6%) | 8167 (3.2%) |
| South | 575 399 | 46 775 (8.1%) | 370 152 | 7298 (3.5%) | 102 840 (27.8%) |
| Haifa | 726 252 | 73 538 (10.1%) | 437 300 | 10 134 (4.5%) | 144 209 (33.0%) |
| Jerusalem | 496 488 | 37 396 (7.5%) | 298 035 | 5424 (5.4%) | 83 440 (28.0%) |
| Center | 560 761 | 55 015 (9.8%) | 323 346 | 7145 (6.3%) | 99 654 (30.8%) |
| North | 541 999 | 46 471 (8.6%) | 363 396 | 7525 (7.2%) | 117 665 (32.4%) |
| Sharon-Shomron | 639 416 | 61 921 (9.7%) | 386 781 | 8791 (8.1%) | 131 913 (34.1%) |
| Tel Aviv | 305 016 | 40 512 (13.3%) | 145 535 | 3189 (9.0%) | 53 656 (36.9%) |
| Unknown | 1102 | 0 (0.0%) | 434 | 14 (9.9%) | 9 (2.1%) |
| Missing | 5167 | 2434 (47.1%) | 972 | 38 (10.8%) | 114 (11.7%) |
| Residence (urban vs. rural) | | | | | |
| Urban | 3 846 111 | 372 433 (9.7%) | 2 283 824 | 49 261 (9.0%) | 727 931 (31.9%) |
| Rural | 467 638 | 36 186 (7.7%) | 307 981 | 5897 (9.9%) | 91 484 (29.7%) |
| Missing | 17 561 | 2434 (13.9%) | 7770 | 151 (10.8%) | 1772 (22.8%) |
| Residence (central vs. peripheral) | | | | | |
| Center | 3 073 206 | 314 344 (10.2%) | 1 760 239 | 38 572 (9.0%) | 567 227 (32.2%) |
| Periphery | 664 086 | 52 268 (7.9%) | 446 561 | 9338 (9.9%) | 139 118 (31.2%) |
| Missing | 594 018 | 44 441 (7.5%) | 392 775 | 7399 (10.8%) | 114 842 (29.2%) |
| Years with disease | | | | | |
| <1 | | 26 955 (6.6%) | | 5555 (10.0%) | |
| 1–4 | | 96 084 (23.4%) | | 18 305 (33.1%) | |
| 5–9 | | 94 622 (23.0%) | | 14 116 (25.5%) | |
| 10+ | | 163 500 (39.8%) | | 10 865 (19.6%) | |

CVD, cardiovascular disease; SES, socio-economic status.

Table 3

The demographic characteristics of Clalit population and the study cohort, including the sub-populations of those with diabetes and those with a family history of diabetes, 1 January 2017

| Variable | Clalit population: total, N | Clalit population: diabetes, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: diabetes, n (%) | Two parents in Clalit: family history of diabetes, n (% of total) |
|---|---|---|---|---|---|
| Total | 4 331 310 | 409 763 (9.5%) | 2 599 575 | 66 324 (2.6%) | 809 667 (31.1%) |
| Age (years) | | | | | |
| 0–24 | 1 767 859 | 12 877 (0.7%) | 1 431 770 | 10 852 (0.8%) | 200 583 (14.0%) |
| 25–49 | 1 375 403 | 45 841 (3.3%) | 999 326 | 27 090 (2.7%) | 498 363 (49.9%) |
| 50–74 | 911 315 | 239 031 (26.2%) | 168 174 | 28 282 (16.8%) | 110 572 (65.7%) |
| 75+ | 272 873 | 110 303 (40.4%) | 161 | 53 (32.9%) | 66 (41.0%) |
| Missing | 3860 | 1711 (44.3%) | 144 | 47 (32.6%) | 83 (57.6%) |
| Sex | | | | | |
| Female | 2 212 886 | 205 160 (9.3%) | 1 287 390 | 28 063 (2.2%) | 401 419 (31.2%) |
| Male | 2 118 423 | 204 603 (9.7%) | 1 312 185 | 38 261 (2.9%) | 408 248 (31.1%) |
| Ethnicity (grandparent land of birth) | | | | | |
| Eastern European/Americas | 622 349 | 100 491 (16.1%) | 211 374 | 8038 (3.8%) | 69 124 (32.7%) |
| Middle Eastern (non-Jewish) | 1 157 860 | 90 031 (7.8%) | 835 720 | 22 699 (2.7%) | 293 989 (35.2%) |
| North African/Middle Eastern (Jewish) | 634 237 | 106 112 (16.7%) | 349 636 | 16 984 (4.9%) | 159 256 (45.5%) |
| Other | 1 236 148 | 36 986 (3.0%) | 915 291 | 12 616 (1.4%) | 209 836 (22.9%) |
| Unknown | 680 716 | 76 143 (11.2%) | 287 554 | 5987 (4.4%) | 77 462 (26.9%) |
| Ethnicity (clinic catchment area) | | | | | |
| Jewish secular | 2 951 667 | 307 010 (10.4%) | 1 619 484 | 41 240 (2.5%) | 483 672 (29.9%) |
| Jewish orthodox | 219 041 | 11 678 (5.3%) | 143 184 | 2327 (1.6%) | 31 453 (22.0%) |
| Arab | 1 155 435 | 89 583 (7.8%) | 835 935 | 22 708 (2.7%) | 294 429 (35.2%) |
| Missing | 5167 | 1492 (28.9%) | 972 | 49 (5.0%) | 113 (11.6%) |
| SES | | | | | |
| Low | 1 498 413 | 119 343 (8.0%) | 1 039 286 | 27 090 (2.6%) | 343 691 (33.1%) |
| Medium | 1 479 541 | 161 813 (10.9%) | 810 988 | 21 045 (2.6%) | 248 000 (30.6%) |
| High | 1 279 480 | 123 443 (9.6%) | 701 747 | 17 195 (2.5%) | 206 145 (29.4%) |
| Missing | 73 876 | 5164 (7.0%) | 47 554 | 994 (2.1%) | 11 831 (24.9%) |
| District | | | | | |
| Eilat | 29 731 | 2661 (9.0%) | 16 021 | 458 (2.9%) | 4965 (31.0%) |
| Dan-Petakh Tikvah | 449 979 | 44 842 (10.0%) | 257 603 | 7417 (2.9%) | 77 825 (30.2%) |
| South | 575 399 | 49 449 (8.6%) | 370 152 | 9624 (2.6%) | 107 089 (28.9%) |
| Haifa | 726 252 | 72 831 (10.0%) | 437 300 | 11 330 (2.6%) | 141 514 (32.4%) |
| Jerusalem | 496 488 | 37 061 (7.5%) | 298 035 | 5528 (1.9%) | 88 234 (29.6%) |
| Center | 560 761 | 54 359 (9.7%) | 323 346 | 8105 (2.5%) | 95 510 (29.5%) |
| North | 541 999 | 47 959 (8.8%) | 363 396 | 9145 (2.5%) | 118 618 (32.6%) |
| Sharon-Shomron | 639 416 | 64 201 (10.0%) | 386 781 | 11 322 (2.9%) | 129 147 (33.4%) |
| Tel Aviv | 305 016 | 34 908 (11.4%) | 145 535 | 3346 (2.3%) | 46 634 (32.0%) |
| Unknown | 1102 | 0 (0.0%) | 434 | 0 (0.0%) | 18 (4.1%) |
| Missing | 5167 | 1492 (28.9%) | 972 | 49 (5.0%) | 113 (11.6%) |
| Residence (urban vs. rural) | | | | | |
| Urban | 3 846 111 | 373 970 (9.7%) | 2 283 824 | 59 313 (2.6%) | 728 985 (31.9%) |
| Rural | 467 638 | 33 725 (7.2%) | 307 981 | 6829 (2.2%) | 82 363 (26.7%) |
| Missing | 17 561 | 2068 (11.8%) | 7770 | 182 (2.3%) | 1566 (20.2%) |
| Residence (central vs. peripheral) | | | | | |
| Center | 3 073 206 | 311 414 (10.1%) | 1 760 239 | 46 391 (2.6%) | 559 851 (31.8%) |
| Periphery | 664 086 | 56 096 (8.4%) | 446 561 | 11 259 (2.5%) | 144 144 (32.3%) |
| Missing | 594 018 | 42 253 (7.1%) | 392 775 | 8674 (2.2%) | 105 672 (26.9%) |
| Years with disease | | | | | |
| <1 | | 16 663 (4.1%) | | 4266 (6.4%) | |
| 1–4 | | 171 067 (41.7%) | | 16 363 (24.7%) | |
| 5–9 | | 75 279 (18.4%) | | 15 713 (23.7%) | |
| 10+ | | 103 581 (25.3%) | | 15 967 (24.1%) | |

SES, socio-economic status.

In the study cohort, 12.1% of people aged 0–24 had asthma, compared with 8.1% among those aged 75 or more. For CVD, 0.6% of people aged 0–24 in the study cohort had the disease, compared with 35.4% among those aged 75 or more. For diabetes, 0.8% of people aged 0–24 in the study cohort had the disease, compared with 32.9% among those aged 75 or more.

Overall, 14.2% of the people in the study cohort had at least one parent with a recorded diagnosis of asthma. There was a higher rate of disease among males in the study cohort (11.1% in males vs. 8.6% in females) but a similar rate of having a family history of asthma among both sexes (14.2% for both males and females). Those aged 50–74 had the highest percentages of family history of asthma (20.5%) and the lowest percentages of disease (6.6%). Other socio-demographic characteristics were similar between the groups. Overall, as a predictor of asthma, a family history of asthma had a sensitivity of 22.0%, specificity of 13.3%, positive predictive value of 15.3% and negative predictive value of 91.0% (table 4). Additionally, regardless of age, there were more patients with asthma without a family history than with a family history of asthma (Supplementary figure S3).

Table 4

The sensitivity, specificity, positive predictive value, and negative predictive value of having a disease given that they have a data-derived family history of that disease, 1 January 2017

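| Disease | Family history | Index patient: yes | Index patient: no | Predictive value |
|---|---|---|---|---|
| Asthma | Yes | 56 444 | 312 610 | Positive predictive value: 15.3% |
| Asthma | No | 200 154 | 2 030 367 | Negative predictive value: 91.0% |
| Asthma | | Sensitivity: 22.0% | Specificity: 13.3% | |
| CVD | Yes | 39 163 | 782 024 | Positive predictive value: 4.8% |
| CVD | No | 16 146 | 1 762 242 | Negative predictive value: 99.1% |
| CVD | | Sensitivity: 70.8% | Specificity: 30.7% | |
| Diabetes | Yes | 46 776 | 762 891 | Positive predictive value: 5.8% |
| Diabetes | No | 19 548 | 1 770 360 | Negative predictive value: 98.9% |
| Diabetes | | Sensitivity: 70.5% | Specificity: 30.1% | |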

CVD, cardiovascular disease.

In our cohort, 31.6% had a data-derived family history of CVD. CVD was more common among males than among females (2.6 vs. 1.7%), but a family history of CVD was not (31.5 and 31.7%). Those of North African/Middle Eastern origin had the highest percentages of disease; 52.0% had a family history of disease, with 4.5% having the disease. Those of Eastern European/Americas origin had lower rates of family history of disease (38.2%) and CVD itself (3.3%), but among those with CVD, a family history of disease was more common (78.2% as compared with 67.7% for those of Middle Eastern origin). Among those with CVD, patients with a positive family history outnumbered those without one starting at age 33, peaking at age 59 (Supplementary figure S4). Overall, as a predictor of CVD, a family history of CVD had a sensitivity of 70.8%, specificity of 30.7%, positive predictive value of 4.8%, and a negative predictive value of 99.1%.

Finally, 31.1% of the cohort had a family history of diabetes. There was a male predominance among those with the disease (2.9 vs. 2.2%), but not among those with family history (31.1 vs. 31.2%). There was also a higher predominance of family history among those of North African/Middle Eastern origin (45.5%), and among those living in urban areas as opposed to rural areas (31.9 vs. 26.7%). Among patients with diabetes, having a family history of diabetes was more common than having no history of disease beginning at age 23, peaking at age 56 (Supplementary figure S5).

Similar to CVD, those of Eastern European/Americas origin had lower rates of disease and family history, but among those with diabetes, had similar rates of having a positive family history of disease. Having a family history of diabetes had a sensitivity of 70.5%, specificity of 30.1%, positive predictive value of 5.8% and negative predictive value of 98.9%.

Discussion

In this study, we integrated electronic health records (EHRs), claims and demographic-based, data-derived family trees for over half of Clalit’s population. Using this integrated approach, we found that a family history of asthma, diabetes or CVD was strongly associated with the presence of the same disease, and that the predictive value of family history varied somewhat by disease, age and ethnic subgroup.

Our EHR-based data-derived approach yielded results similar to those obtained by traditional studies that use detailed prospective family history data collection, as described below. However, unlike traditional family history studies, which require labour-intensive data collection, our data-derived approach can be easily applied on a population scale, relying only on existing clinical data. This opens the door to much greater availability and utilization of family history information in clinical practice.

As an example of these traditional studies, Meigs et al. studied the impact of family history on diabetes using the Framingham Offspring Study, 30 finding that among patients 50 years and older who had diabetes, both parents had diabetes in 28% of cases, only the father in 19% and only the mother in 24%. One important difference to note is that the aforementioned study included a total of 29 cases in which both parents had diabetes, while our cohort included a total of 30 205 such cases.

Another relevant study that also utilized the Framingham data was by Lloyd-Jones et al., who analysed the correlations between parental premature CVD and non-premature CVD and their offspring’s risk for CVD. 31 In this study, the odds ratios for CVD were relatively low. One possible explanation is our use of a wider case definition for CVD than that used by Lloyd-Jones et al. Other studies that employ parental health data for the study of CVD include the works of Vik et al. on data from the HUNT study in Norway, 32 Allport et al. on the age of onset of CVD based on the Framingham data and a study by Khaleghi et al. on peripheral arterial disease. 33

Studies that explore the correlation between parental atopic or asthmatic background and their children’s risk of developing asthma include Xu et al., who used national survey data to find an HR of 3.71 for maternal history of asthma on the offspring’s risk of developing asthma. 34 Similar results were obtained by Litonjua et al., 35 although that study was conducted among children (the median age of the children in this study was 3.5). Similar results were also obtained in a meta-analysis of studies on parental risk factors for asthma. 36 These studies have clearly established that having a family history is associated with risk of future disease among children, but not necessarily the risk in children as they reach adulthood and beyond.

Our results highlight the importance of incorporating family medical history into the clinical decision-making process, especially in relation to diabetes and CVD. As can be seen in Supplementary figures S4 and S5, most cases of CVD or diabetes among patients older than 30 years were preceded by the same diagnosis in at least one of the patient’s parents. Lack of parental history of diabetes or CVD had a negative predictive value of about 99% for the existence of the same condition in the offspring, despite the fact that these conditions are fairly common in the adult population. Hence, automatic extraction of family history data could play an important role as a screening tool for these conditions. Notably, these results were not replicated among patients with asthma. We suggest that this finding may be due to the fact that asthma is a heterogeneous disease, particularly in its combination of childhood asthma and adult-onset asthma.

This study has several limitations. First, the design outlined by this study may not be applicable to other health care systems with EMRs in which connections between family members have yet to be introduced. That being said, by highlighting the importance of such information, care providers and insurers might be encouraged to actively include such data in their systems, thus allowing automatic screening tools to include family history data.

Second, this is a retrospective database study, and thus the derivation of the case definition is based on diagnostic codes that were not always used as intended, and in some cases could be missing completely. To mitigate these limitations, we either built upon case definitions that were validated in previous studies or re-validated each case definition manually.

Lastly, our inclusion criteria do not yield a cohort that is fully representative of the general population, nor does our population necessarily represent the global population. As the goal of this study was to prove the feasibility of extracting one’s family medical history from the EMR, we focused only on the 60% of patients (n = 2.6 million) whose parents were also listed in the database. Thus, our study cohort was biased towards younger (and potentially healthier) patients, as information on the most elderly patients’ parents was more often not available (e.g. CVD in our cohort was only 2.1 vs. 9.5% in Clalit). These differences were mitigated when comparing most age groups (e.g. CVD at ages 50–54 was 10.5% in our cohort vs. 11.5% in Clalit). However, it is noteworthy that the population aged 75 and greater in our study cohort is significantly smaller and has lower rates of disease than that in Clalit (as seen in table 1). Thus, the results in this group should be interpreted with caution; the utility of predicting disease by using data-derived family history would be limited. Future research efforts are most likely to benefit from focusing on earlier age groups, e.g. between ages 30 and 60 for CVD and 25 and 60 for diabetes, when having a family history begins to predominate (as seen in the Supplementary figures). Future studies in the coming decades can include data of those aged 60 and greater, which can further elucidate the impact of family history in this age group.

Creating an EMR with family linkages is both feasible and useful, helping to guide screening efforts and enabling the practice of more personalized medical care. We have shown that a positive family history of a disease such as asthma, CVD or diabetes is highly likely to co-exist with the same condition in the offspring. Further studies are needed to evaluate the clinical impact or opportunities of integrating such a linked EMR.

Funding

This work was supported by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG009129. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Conflicts of interest: None declared.

Key points

Linking electronic medical records can create a rich database of disease family histories.

The predictive value of a data-derived family history varies by disease, age and ethnic subgroup, and is strongly associated with disease condition.

Data-derived family history can be readily incorporated into clinical decision-making, especially at younger ages.