Copyright © The Author(s) 2019. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
In order to examine the potential clinical value of integrating family history information directly from the electronic health records of patients’ family members, the electronic health records of individuals in Clalit Health Services, the largest payer/provider in Israel, were linked with the records of their parents.
We describe the results of a novel approach for creating data-derived family history information for 2 599 575 individuals, focusing on three chronic diseases: asthma, cardiovascular disease (CVD) and diabetes.
In our cohort, there were 256 598 patients with asthma, 55 309 patients with CVD and 66 324 patients with diabetes. Of the people with asthma, CVD or diabetes, the percentage that also had a family history of the same disease was 22.0%, 70.8% and 70.5%, respectively.
Linking individuals’ health records with their data-derived family history has untapped potential for supporting diagnostic and clinical decision-making.
Individuals with a family history of certain prevalent chronic diseases have an increased risk of developing those and other chronic and acute conditions, accounting for a significant proportion of health care costs, morbidity and mortality in the USA. 1–6 Until now, documentation of family history in routine clinical practice has been considered patient-reported, and therefore dependent upon a patient volunteering such information, without being able to validate the completeness and accuracy. Clinicians are limited in their capacity to obtain and include the presence or absence of family history in their decision-making process due to multiple factors, including time pressures during the clinical encounter, 7–9 the possibility that patients have incomplete or unreliable knowledge of their own family history 7 , 8 , 10–14 and the limited existence of technologies for the collection and integration of family history into medical record taking. 8 , 15–22
In many instances, documentation of family history in electronic medical records is missing despite the known impact of family history on future risk. 23 , 24 Furthermore, some family histories are recorded only after the patient has already been diagnosed with a disease, diminishing their value for future disease risk assessment or primary prevention. 25–27
In order to overcome these significant inherent limitations of obtaining accurate family history data during routine clinical care, the goal of this study is to identify and report family history information obtained by an alternative ‘data-derived’ method: extracting family history directly from the electronic health records of the patients’ family members. In this study, we automatically and anonymously link electronic health records of our cohort with those of their parents. Thus, we are effectively expanding the patient’s medical record to include diagnoses provided by a physician to a family member instead of relying on the recall and knowledge of the patient themselves. By creating and assessing this link, we can assess the validity of such a method and its contribution to both our understanding of disease and the potential impact on clinical practice when clinicians are assessing the risk of disease in their differential diagnosis of a patient. The linkage created here is the first step in validating the effects of integrating data-derived family history into clinical practice or population management. This untapped potential can be obtained by expanding patient records to include the documented diseases of relatives; we believe this is the necessary first step to allow such information to improve diagnoses, accelerate treatment decisions and inform preventive care practices for future disease outcomes.
In this study, we leverage the database maintained by Clalit Health Services (Clalit), which holds a comprehensive clinical record, including inpatient and outpatient data, for patients and any family members who are also members of Clalit. We evaluate the potential strength of association if one were to expand patients’ records to include the diseases documented in their parents’ records. To our knowledge, this is the first study to evaluate a system-wide ‘data-derived’ family history, using detailed clinical information that is extracted directly from the electronically documented diagnoses of a patient’s family members.
We conducted a large cross-sectional retrospective study with data taken from Clalit’s comprehensive database. Clalit is a health fund that acts both as a payer and a provider for its members, including a full range of comprehensive services (inpatient, outpatient and specialty) as mandated in the ‘basket of services’ for all four health funds according to the National Healthcare Insurance Law in Israel.
Clalit is the largest of four payer/provider health care delivery systems in Israel, with Clalit currently serving over 4.3 million members—over half of the Israeli population. All residents of Israel must belong to one of the four health funds, and as Clalit is the oldest and largest, with less than 2% attrition each year, many extended families have longitudinal follow-up in the Clalit system. 28 All members of Clalit are registered using their unique national identification number, which is used by the Ministry of Health and the Ministry of Interior to track and link birth records with the national identification numbers of the parents. The Clalit database is updated monthly according to these records. As such, Clalit’s data warehouse, which stores complete and comprehensive electronic health records for all members, has comprehensive clinical and demographic information on its entire patient population.
We first describe the entire patient population who were members in Clalit for at least one year (or from birth if less than one year of age) as of 1 January 2017 (index date). For this study, we then create a cohort for whom both parents were also members of Clalit for at least one year between and including the years 2002 and 2016, thus providing optimal opportunity to document the potential data-derived family history, when present in the medical record of the parents. By limiting the study cohort to those with parents in Clalit, we can differentiate a ‘negative’ medical history from an ‘unknown’ medical history.
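The distinction between a ‘negative’ and an ‘unknown’ family history can be made concrete with a small sketch. The Python below is illustrative only and is not the linkage code used in this study: the dictionaries `parent_ids`, `members` and `diagnoses` and the function `family_history` are hypothetical names, and membership is reduced to a simple set lookup rather than the one-year enrolment rule described above.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy data: child -> (mother, father); None means no linked parent.
parent_ids: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "child1": ("mother1", "father1"),
    "child2": ("mother2", None),
    "child3": ("mother3", "father3"),
}
# Hypothetical set of people enrolled in the health system.
members: Set[str] = {"mother1", "father1", "mother2", "mother3", "father3"}
# Hypothetical recorded diagnoses per person.
diagnoses: Dict[str, Set[str]] = {
    "mother1": {"diabetes"},
    "mother3": {"asthma"},
}


def family_history(child: str, disease: str) -> Optional[bool]:
    """Return True (positive), False (negative) or None (unknown)."""
    parents = parent_ids.get(child)
    # Unknown: no linkage at all, or at least one parent outside the system.
    if parents is None or any(p is None or p not in members for p in parents):
        return None
    # Both parents observable: positive if either parent has the diagnosis.
    return any(disease in diagnoses.get(p, set()) for p in parents)
```

In this toy data, `child1` has a positive history of diabetes, `child3` a negative one, and `child2` an unknown one because only one parent is linked. Returning `None` rather than `False` is the design choice that keeps the absence of data from being read as the absence of disease.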
We describe those with three common chronic diseases: asthma, cardiovascular disease (CVD) and diabetes. We chose three of the most common diseases in order to best capture the significance of any familial association. Asthma is predominantly a childhood condition, so most patients have both parents in Clalit. CVD and diabetes are conditions associated with high morbidity and mortality rates among adults, yet they are affected by modifiable risk factors and have the potential for prevention or early detection, making them ideal for a future predictive model. We also wanted to take advantage of our existing validated algorithm for identifying patients with diabetes using electronic medical records. 29
Asthma is defined using the diagnosis of asthma, either as a free-text diagnosis or with the International Classification of Diseases, 9th revision (ICD-9) code of 493% (where ‘%’ indicates any sequence of numbers). CVD is similarly defined using at least one of the following ICD-9 codes: 00.24; 00.4 [0–8]%; 36%; 41 [0–4]%; 433%; 434%; 436%; 437 [0–1]%; 440%; 88.5 [0–7]% or their associated text. The internally validated chronic disease registry of Clalit was also utilized in defining CVD. Diabetes is defined in the Clalit database using a validated algorithm based on diagnoses, laboratory values and medication dispensing, whose methods have been published elsewhere. 29
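As a rough illustration of how wildcard code patterns of this kind can be applied, the sketch below translates a pattern into a regular expression and tests a patient's recorded codes against it. It is an assumption-laden example rather than the study's implementation: `like_to_regex`, `has_matching_code` and `CVD_SUBSET` are hypothetical names, only a subset of the CVD patterns is shown, and the way codes are stored (with a decimal point, and with diagnosis and procedure codes kept apart) is not specified in the text and is assumed here.

```python
import re
from typing import Iterable, List


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern such as '41 [0–4]%' into a compiled regex."""
    # Drop layout spaces and normalise the en dash used in printed ranges.
    compact = pattern.replace(" ", "").replace("–", "-")
    parts: List[str] = []
    i = 0
    while i < len(compact):
        ch = compact[i]
        if ch == "%":
            parts.append(".*")              # '%' = any trailing characters
        elif ch == "[":
            end = compact.index("]", i)
            parts.append(compact[i:end + 1])  # keep the character class as-is
            i = end
        else:
            parts.append(re.escape(ch))     # literal digit or decimal point
        i += 1
    return re.compile("".join(parts))


def has_matching_code(codes: Iterable[str], patterns: Iterable[str]) -> bool:
    """True if any recorded code fully matches any of the patterns."""
    compiled = [like_to_regex(p) for p in patterns]
    return any(rx.fullmatch(code) for code in codes for rx in compiled)


ASTHMA = ["493%"]
# Illustrative subset only; the full CVD list is given in the text above.
CVD_SUBSET = ["41 [0–4]%", "433%", "88.5 [0–7]%"]
```

For example, `has_matching_code(["493.90"], ASTHMA)` is true, while a record containing only `"415.1"` does not match `CVD_SUBSET`. Using a full match rather than a prefix search keeps each pattern's meaning explicit, at the cost of requiring the trailing `%` wherever sub-codes should be included.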
To describe the bias that age may have on the cohort (namely, that older individuals are less likely to have parents alive in the healthcare system), we present those with the disease condition and a family history of the disease condition by age, in single-year and 5-year intervals.
For both the entire patient population and the cohort for this study, we describe the basic socio-demographic characteristics including age in years (0–24, 25–49, 50–74 and 75 or older), sex (male or female), ethnicity (based on the individual's grandparents' country of birth), the average socioeconomic status of those attending the same clinic (low, medium or high), district (there are nine commonly used districts: Eilat, Dan-Petakh Tikvah, South, Haifa, Jerusalem, Center, North, Sharon-Shomron and Tel Aviv), residence (rural or urban, as well as central or periphery, as defined by the National Census Bureau), and years from first diagnosis to the index date (<1, 1–4, 5–9 or 10+).
Then, for each disease condition, we present those with family history for the disease condition in the cohort, as well as those with both the disease and a family history for the disease.
Finally, to gauge the face validity of electronic health record-driven family history, we present the sensitivity, specificity, positive predictive value and negative predictive value for each of the disease conditions.
The Clalit Institutional Review Board approved this study.
There were 4 331 310 members in Clalit's patient population as of 1 January 2017 (tables 1–3). There were 399 758 (9.2%) patients with asthma, 411 053 (9.5%) with CVD and 409 763 (9.5%) with diabetes. Among those meeting our cohort criteria of having data-derived family history available, there were 2 599 575 (60%) patients in total, of whom 256 598 (9.9%) had asthma, 55 309 (2.1%) had CVD and 66 324 (2.6%) had diabetes (Supplementary figure S1). The distribution by 5-year age groups is shown in Supplementary table S2. While there is clearly a shift towards the younger age groups in the study cohort as compared with the Clalit population, the percentages with disease within each age group are quite similar.
Table 1 The demographic characteristics of the Clalit population and the study cohort, including subpopulations of those with asthma and those with a family history of asthma, 1 January 2017
Variable | Clalit population: total, N | Clalit population: asthma, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: asthma, n (%) | Two parents in Clalit: family history of asthma, n (% of total) |
---|---|---|---|---|---|
Total | 4 331 310 | 399 758 (9.2%) | 2 599 575 | 256 598 (9.9%) | 369 054 (14.2%) |
Age (years) | |||||
0–24 | 1 767 859 | 207 827 (11.8%) | 1 431 770 | 172 756 (12.1%) | 171 932 (12.0%) |
25–49 | 1 375 403 | 94 492 (6.9%) | 999 326 | 72 665 (7.3%) | 162 615 (16.3%) |
50–74 | 911 315 | 69 580 (7.6%) | 168 174 | 11 147 (6.6%) | 34 448 (20.5%) |
75+ | 272 873 | 27 446 (10.1%) | 161 | 13 (8.1%) | 23 (14.3%) |
Missing | 3860 | 413 (10.7%) | 144 | 17 (11.8%) | 36 (25.0%) |
Sex | |||||
Female | 2 212 886 | 192 216 (8.7%) | 1 287 390 | 110 780 (8.6%) | 182 496 (14.2%) |
Male | 2 118 423 | 207 542 (9.8%) | 1 312 185 | 145 818 (11.1%) | 186 558 (14.2%) |
Ethnicity (grandparent land of birth) | |||||
Eastern European/Americas | 622 349 | 46 528 (7.5%) | 211 374 | 19 598 (9.3%) | 25 266 (12.0%) |
Middle Eastern (non-Jewish) | 1 157 860 | 94 392 (8.2%) | 835 720 | 69 765 (8.3%) | 109 750 (13.1%) |
North African/Middle Eastern (Jewish) | 634 237 | 59 480 (9.4%) | 349 636 | 33 522 (9.6%) | 57 325 (16.4%) |
Other | 1 236 148 | 140 084 (11.3%) | 915 291 | 105 687 (11.5%) | 137 307 (15.0%) |
Unknown | 680 716 | 59 274 (8.7%) | 287 554 | 28 026 (9.7%) | 39 406 (13.7%) |
Ethnicity (clinic catchment area) | |||||
Jewish secular | 2 951 667 | 287 514 (9.7%) | 1 619 484 | 174 621 (10.8%) | 242 714 (15.0%) |
Jewish orthodox | 219 041 | 17 513 (8.0%) | 143 184 | 12 104 (8.5%) | 16 318 (11.4%) |
Arab | 1 155 435 | 94 377 (8.2%) | 835 935 | 69 855 (8.4%) | 109 889 (13.1%) |
Missing | 5167 | 354 (6.9%) | 972 | 18 (1.9%) | 133 (13.7%) |
SES | |||||
Low | 1 498 413 | 128 397 (8.6%) | 1 039 286 | 91 801 (8.8%) | 139 709 (13.4%) |
Medium | 1 479 541 | 141 955 (9.6%) | 810 988 | 85 726 (10.6%) | 120 927 (14.9%) |
High | 1 279 480 | 122 562 (9.6%) | 701 747 | 74 254 (10.6%) | 102 317 (14.6%) |
Missing | 73 876 | 6844 (9.3%) | 47 554 | 4 817 (10.1%) | 6101 (12.8%) |
District | |||||
Eilat | 29 731 | 2935 (9.9%) | 16 021 | 1 744 (10.9%) | 2690 (16.8%) |
Dan-Petakh Tikvah | 449 979 | 40 583 (9.0%) | 257 603 | 24 833 (9.6%) | 36 666 (14.2%) |
South | 575 399 | 51 067 (8.9%) | 370 152 | 35 159 (9.5%) | 49 829 (13.5%) |
Haifa | 726 252 | 58 450 (8.0%) | 437 300 | 37 731 (8.6%) | 56 457 (12.9%) |
Jerusalem | 496 488 | 35 890 (7.2%) | 298 035 | 20 485 (6.9%) | 40 452 (13.6%) |
Center | 560 761 | 58 741 (10.5%) | 323 346 | 36 716 (11.4%) | 51 022 (15.8%) |
North | 541 999 | 52 719 (9.7%) | 363 396 | 37 449 (10.3%) | 48 891 (13.5%) |
Sharon-Shomron | 639 416 | 69 319 (10.8%) | 386 781 | 46 459 (12.0%) | 60 868 (15.7%) |
Tel Aviv | 305 016 | 29 698 (9.7%) | 145 535 | 16 003 (11.0%) | 22 016 (15.1%) |
Unknown | 1102 | 2 (0.2%) | 434 | 1 (0.2%) | 30 (6.9%) |
Missing | 5 167 | 354 (6.9%) | 972 | 18 (1.9%) | 133 (13.7%) |
Residence (urban vs. rural) | |||||
Urban | 3 846 111 | 356 718 (9.3%) | 2 283 824 | 227 050 (9.9%) | 325 431 (14.2%) |
Rural | 467 638 | 41 808 (8.9%) | 307 981 | 28 999 (9.4%) | 42 744 (13.9%) |
Missing | 17 561 | 1232 (7.0%) | 7770 | 549 (7.1%) | 879 (11.3%) |
Residence (central vs. peripheral) | |||||
Center | 3 073 206 | 292 347 (9.5%) | 1 760 239 | 181 394 (10.3%) | 259 055 (14.7%) |
Periphery | 664 086 | 53 677 (8.1%) | 446 561 | 37 698 (8.4%) | 55 908 (12.5%) |
Missing | 594 018 | 53 734 (9.0%) | 392 775 | 37 506 (9.5%) | 54 091 (13.8%) |
Years with disease | |||||
<1 | | 15 901 (4.0%) | | 10 068 (3.9%) | |
1–4 | | 65 710 (16.4%) | | 43 131 (16.8%) | |
5–9 | | 95 610 (23.9%) | | 65 978 (25.7%) | |
10+ | | 222 537 (55.7%) | | 137 421 (53.6%) | |
SES, socio-economic status.
Table 2 The demographic characteristics of the Clalit population and the study cohort, including the sub-populations of those with CVD and those with a family history of CVD, 1 January 2017
Variable | Clalit population: total, N | Clalit population: CVD, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: CVD, n (%) | Two parents in Clalit: family history of CVD, n (% of total) |
---|---|---|---|---|---|
Total | 4 331 310 | 411 053 (9.5%) | 2 599 575 | 55 309 (2.1%) | 821 187 (31.6%) |
Age (years) | |||||
0–24 | 1 767 859 | 9610 (0.5%) | 1 431 770 | 8004 (0.6%) | 172 097 (12.0%) |
25–49 | 1 375 403 | 36 588 (2.7%) | 999 326 | 22 955 (2.3%) | 507 251 (50.8%) |
50–74 | 911 315 | 214 784 (23.6%) | 168 174 | 24 256 (14.4%) | 141 618 (84.2%) |
75+ | 272 873 | 147 814 (54.2%) | 161 | 57 (35.4%) | 130 (80.7%) |
Missing | 3860 | 2257 (58.5%) | 144 | 37 (25.7%) | 91 (63.2%) |
Sex | |||||
Female | 2 212 886 | 183 633 (8.3%) | 1 287 390 | 21 379 (1.7%) | 407 927 (31.7%) |
Male | 2 118 423 | 227 420 (10.7%) | 1 312 185 | 33 930 (2.6%) | 413 260 (31.5%) |
Ethnicity (grandparent land of birth) | |||||
Eastern European/Americas | 622 349 | 118 849 (19.1%) | 211 374 | 6957 (3.3%) | 80 695 (38.2%) |
Middle Eastern (non-Jewish) | 1 157 860 | 67 537 (5.8%) | 835 720 | 16 674 (2.0%) | 257 053 (30.8%) |
North African/Middle Eastern (Jewish) | 634 237 | 113 351 (17.9%) | 349 636 | 15 874 (4.5%) | 181 758 (52.0%) |
Other | 1 236 148 | 30 335 (2.5%) | 915 291 | 10 737 (1.2%) | 216 470 (23.7%) |
Unknown | 680 716 | 113 351 (16.7%) | 287 554 | 5 067 (1.8%) | 85 211 (29.6%) |
Ethnicity (clinic catchment area) | |||||
Jewish secular | 2 951 667 | 314 198 (10.6%) | 1 619 484 | 37 039 (2.3%) | 534 795 (33.0%) |
Jewish orthodox | 219 041 | 10 883 (5.0%) | 143 184 | 1556 (1.1%) | 28 972 (20.2%) |
Arab | 1 155 435 | 67 126 (5.8%) | 835 935 | 16 676 (2.0%) | 257 306 (30.8%) |
Missing | 5167 | 1812 (35.1%) | 972 | 38 (3.9%) | 114 (11.7%) |
SES | |||||
Low | 1 498 413 | 95 452 (6.4%) | 1 039 286 | 19 615 (1.9%) | 305 245 (29.4%) |
Medium | 1 479 541 | 166 885 (11.3%) | 810 988 | 18 501 (2.3%) | 260 422 (32.1%) |
High | 1 279 480 | 143 066 (11.2%) | 701 747 | 16 214 (2.3%) | 242 053 (34.5%) |
Missing | 73 876 | 5650 (7.6%) | 47 554 | 979 (2.1%) | 13 467 (28.3%) |
District | |||||
Eilat | 29 731 | 2926 (9.8%) | 16 021 | 468 (2.9%) | 5920 (37.0%) |
Dan-Petakh Tikvah | 449 979 | 44 687 (9.9%) | 257 603 | 5297 (2.1%) | 81 767 (31.7%) |
South | 575 399 | 46 775 (8.1%) | 370 152 | 7298 (2.0%) | 102 840 (27.8%) |
Haifa | 726 252 | 73 538 (10.1%) | 437 300 | 10 134 (2.3%) | 144 209 (33.0%) |
Jerusalem | 496 488 | 37 396 (7.5%) | 298 035 | 5424 (1.8%) | 83 440 (28.0%) |
Center | 560 761 | 55 015 (9.8%) | 323 346 | 7145 (2.2%) | 99 654 (30.8%) |
North | 541 999 | 46 471 (8.6%) | 363 396 | 7525 (2.1%) | 117 665 (32.4%) |
Sharon-Shomron | 639 416 | 61 921 (9.7%) | 386 781 | 8791 (2.3%) | 131 913 (34.1%) |
Tel Aviv | 305 016 | 40 512 (13.3%) | 145 535 | 3189 (2.2%) | 53 656 (36.9%) |
Unknown | 1102 | 0 (0.0%) | 434 | 14 (3.2%) | 9 (2.1%) |
Missing | 5167 | 1812 (35.1%) | 972 | 38 (3.9%) | 114 (11.7%) |
Residence (urban vs. rural) | |||||
Urban | 3 846 111 | 372 433 (9.7%) | 2 283 824 | 49 261 (2.2%) | 727 931 (31.9%) |
Rural | 467 638 | 36 186 (7.7%) | 307 981 | 5897 (1.9%) | 91 484 (29.7%) |
Missing | 17 561 | 2434 (13.9%) | 7770 | 151 (1.9%) | 1772 (22.8%) |
Residence (central vs. peripheral) | |||||
Center | 3 073 206 | 314 344 (10.2%) | 1 760 239 | 38 572 (2.2%) | 567 227 (32.2%) |
Periphery | 664 086 | 52 268 (7.9%) | 446 561 | 9338 (2.1%) | 139 118 (31.2%) |
Missing | 594 018 | 44 441 (7.5%) | 392 775 | 7399 (1.9%) | 114 842 (29.2%) |
Years with disease | |||||
<1 | | 26 955 (6.6%) | | 5555 (10.0%) | |
1–4 | | 96 084 (23.4%) | | 18 305 (33.1%) | |
5–9 | | 94 622 (23.0%) | | 14 116 (25.5%) | |
10+ | | 163 500 (39.8%) | | 10 865 (19.6%) | |
CVD, cardiovascular disease; SES, socio-economic status.
Table 3 The demographic characteristics of the Clalit population and the study cohort, including the sub-populations of those with diabetes and those with a family history of diabetes, 1 January 2017
Variable | Clalit population: total, N | Clalit population: diabetes, n (%) | Two parents in Clalit: total, N | Two parents in Clalit: diabetes, n (%) | Two parents in Clalit: family history of diabetes, n (% of total) |
---|---|---|---|---|---|
Total | 4 331 310 | 409 763 (9.5%) | 2 599 575 | 66 324 (2.6%) | 809 667 (31.1%) |
Age (years) | |||||
0–24 | 1 767 859 | 12 877 (0.7%) | 1 431 770 | 10 852 (0.8%) | 200 583 (14.0%) |
25–49 | 1 375 403 | 45 841 (3.3%) | 999 326 | 27 090 (2.7%) | 498 363 (49.9%) |
50–74 | 911 315 | 239 031 (26.2%) | 168 174 | 28 282 (16.8%) | 110 572 (65.7%) |
75+ | 272 873 | 110 303 (40.4%) | 161 | 53 (32.9%) | 66 (41.0%) |
Missing | 3860 | 1711 (44.3%) | 144 | 47 (32.6%) | 83 (57.6%) |
Sex | |||||
Female | 2 212 886 | 205 160 (9.3%) | 1 287 390 | 28 063 (2.2%) | 401 419 (31.2%) |
Male | 2 118 423 | 204 603 (9.7%) | 1 312 185 | 38 261 (2.9%) | 408 248 (31.1%) |
Ethnicity (grandparent land of birth) | |||||
Eastern European/Americas | 622 349 | 100 491 (16.1%) | 211 374 | 8038 (3.8%) | 69 124 (32.7%) |
Middle Eastern (non-Jewish) | 1 157 860 | 90 031 (7.8%) | 835 720 | 22 699 (2.7%) | 293 989 (35.2%) |
North African/Middle Eastern (Jewish) | 634 237 | 106 112 (16.7%) | 349 636 | 16 984 (4.9%) | 159 256 (45.5%) |
Other | 1 236 148 | 36 986 (3.0%) | 915 291 | 12 616 (1.4%) | 209 836 (22.9%) |
Unknown | 680 716 | 76 143 (11.2%) | 287 554 | 5987 (2.1%) | 77 462 (26.9%) |
Ethnicity (clinic catchment area) | |||||
Jewish secular | 2 951 667 | 307 010 (10.4%) | 1 619 484 | 41 240 (2.5%) | 483 672 (29.9%) |
Jewish orthodox | 219 041 | 11 678 (5.3%) | 143 184 | 2327 (1.6%) | 31 453 (22.0%) |
Arab | 1 155 435 | 89 583 (7.8%) | 835 935 | 22 708 (2.7%) | 294 429 (35.2%) |
Missing | 5167 | 1492 (28.9%) | 972 | 49 (5.0%) | 113 (11.6%) |
SES | |||||
Low | 1 498 413 | 119 343 (8.0%) | 1 039 286 | 27 090 (2.6%) | 343 691 (33.1%) |
Medium | 1 479 541 | 161 813 (10.9%) | 810 988 | 21 045 (2.6%) | 248 000 (30.6%) |
High | 1 279 480 | 123 443 (9.6%) | 701 747 | 17 195 (2.5%) | 206 145 (29.4%) |
Missing | 73 876 | 5164 (7.0%) | 47 554 | 994 (2.1%) | 11 831 (24.9%) |
District | |||||
Eilat | 29 731 | 2661 (9.0%) | 16 021 | 458 (2.9%) | 4965 (31.0%) |
Dan-Petakh Tikvah | 449 979 | 44 842 (10.0%) | 257 603 | 7417 (2.9%) | 77 825 (30.2%) |
South | 575 399 | 49 449 (8.6%) | 370 152 | 9624 (2.6%) | 107 089 (28.9%) |
Haifa | 726 252 | 72 831 (10.0%) | 437 300 | 11 330 (2.6%) | 141 514 (32.4%) |
Jerusalem | 496 488 | 37 061 (7.5%) | 298 035 | 5528 (1.9%) | 88 234 (29.6%) |
Center | 560 761 | 54 359 (9.7%) | 323 346 | 8105 (2.5%) | 95 510 (29.5%) |
North | 541 999 | 47 959 (8.8%) | 363 396 | 9145 (2.5%) | 118 618 (32.6%) |
Sharon-Shomron | 639 416 | 64 201 (10.0%) | 386 781 | 11 322 (2.9%) | 129 147 (33.4%) |
Tel Aviv | 305 016 | 34 908 (11.4%) | 145 535 | 3346 (2.3%) | 46 634 (32.0%) |
Unknown | 1102 | 0 (0.0%) | 434 | 0 (0.0%) | 18 (4.1%) |
Missing | 5167 | 1492 (28.9%) | 972 | 49 (5.0%) | 113 (11.6%) |
Residence (urban vs. rural) | |||||
Urban | 3 846 111 | 373 970 (9.7%) | 2 283 824 | 59 313 (2.6%) | 728 985 (31.9%) |
Rural | 467 638 | 33 725 (7.2%) | 307 981 | 6829 (2.2%) | 82 363 (26.7%) |
Missing | 17 561 | 2068 (11.8%) | 7770 | 182 (2.3%) | 1566 (20.2%) |
Residence (central vs. peripheral) | |||||
Center | 3 073 206 | 311 414 (10.1%) | 1 760 239 | 46 391 (2.6%) | 559 851 (31.8%) |
Periphery | 664 086 | 56 096 (8.4%) | 446 561 | 11 259 (2.5%) | 144 144 (32.3%) |
Missing | 594 018 | 42 253 (7.1%) | 392 775 | 8674 (2.2%) | 105 672 (26.9%) |
Years with disease | |||||
<1 | | 16 663 (4.1%) | | 4266 (6.4%) | |
1–4 | | 171 067 (41.7%) | | 16 363 (24.7%) | |
5–9 | | 75 279 (18.4%) | | 15 713 (23.7%) | |
10+ | | 103 581 (25.3%) | | 15 967 (24.1%) | |
SES, socio-economic status.
In the study cohort, 12.1% of people aged 0–24 had asthma, compared with 8.1% among those aged 75 or more. For CVD, 0.6% of people aged 0–24 in the study cohort had the disease, compared with 35.4% among those aged 75 or more. For diabetes, 0.8% of people aged 0–24 in the study cohort had the disease, compared with 32.9% among those aged 75 or more.
Overall, 14.2% of the people in the study cohort had at least one parent with a recorded diagnosis of asthma. There was a higher rate of disease among males in the study cohort (11.1% in males vs. 8.6% in females) but a similar rate of having a family history of asthma among both sexes (14.2% for both males and females). Those aged 50–74 had the highest percentage of family history of asthma (20.5%) and the lowest percentage of disease (6.6%). Other socio-demographic characteristics were similar between the groups. Overall, as a predictor of asthma, a family history of asthma had a sensitivity of 22.0%, specificity of 86.7% (i.e. 13.3% of those without asthma had a positive family history), positive predictive value of 15.3% and negative predictive value of 91.0% (table 4). Additionally, regardless of age, there were more patients with asthma without a family history than with a family history of asthma (Supplementary figure S3).
Table 4 Sensitivity, specificity, positive predictive value and negative predictive value of a data-derived family history as a predictor of the same disease in the index patient, 1 January 2017
Disease | Family history | Index patient with disease | Index patient without disease | Predictive value |
---|---|---|---|---|
Asthma | Yes | 56 444 | 312 610 | Positive predictive value: 15.3% |
Asthma | No | 200 154 | 2 030 367 | Negative predictive value: 91.0% |
Asthma | | Sensitivity: 22.0% | Specificity: 86.7% | |
CVD | Yes | 39 163 | 782 024 | Positive predictive value: 4.8% |
CVD | No | 16 146 | 1 762 242 | Negative predictive value: 99.1% |
CVD | | Sensitivity: 70.8% | Specificity: 69.3% | |
Diabetes | Yes | 46 776 | 762 891 | Positive predictive value: 5.8% |
Diabetes | No | 19 548 | 1 770 360 | Negative predictive value: 98.9% |
Diabetes | | Sensitivity: 70.5% | Specificity: 69.9% | |
CVD, cardiovascular disease.
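The four measures in table 4 follow directly from the cell counts of each 2 × 2 table. The short Python sketch below recomputes them under the standard definitions; the function name `screening_metrics` and the dictionary `TABLE_4` are illustrative names introduced here, and the counts are copied from table 4.

```python
from typing import Dict, Tuple


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2 x 2 measures, with family history as the 'test'."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased with a positive history
        "specificity": tn / (tn + fp),  # non-diseased with a negative history
        "ppv": tp / (tp + fp),          # positive history who have the disease
        "npv": tn / (tn + fn),          # negative history who are disease-free
    }


# (tp, fp, fn, tn): history+/disease+, history+/disease-,
#                   history-/disease+, history-/disease-
TABLE_4: Dict[str, Tuple[int, int, int, int]] = {
    "asthma": (56_444, 312_610, 200_154, 2_030_367),
    "cvd": (39_163, 782_024, 16_146, 1_762_242),
    "diabetes": (46_776, 762_891, 19_548, 1_770_360),
}

for disease, cells in TABLE_4.items():
    metrics = screening_metrics(*cells)
    summary = ", ".join(f"{name} {value:.1%}" for name, value in metrics.items())
    print(f"{disease}: {summary}")
```

Note that specificity is TN/(TN + FP); its complement, FP/(TN + FP), is the share of people without the disease who nonetheless have a positive family history. For asthma, for example, that complement is about 13.3%, which corresponds to a specificity of about 86.7%.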
In our cohort, 31.6% had a data-derived family history of CVD. CVD was more common among males than among females (2.6 vs. 1.7%), but a family history of CVD was not (31.5 and 31.7%). Those of North African/Middle Eastern origin had the highest percentages of disease; 52.0% had a family history of disease, with 4.5% having the disease. Those of Eastern European/Americas origin had lower rates of family history of disease (38.2%) and CVD itself (3.3%), but among those with CVD, a family history of disease was more common (78.2% as compared with 67.7% for those of Middle Eastern origin). Among those with CVD, patients with a positive family history outnumbered those without one starting at age 33, with the difference peaking at age 59 (Supplementary figure S4). Overall, as a predictor of CVD, a family history of CVD had a sensitivity of 70.8%, specificity of 69.3% (i.e. 30.7% of those without CVD had a positive family history), positive predictive value of 4.8% and negative predictive value of 99.1%.
Finally, 31.1% of the cohort had a family history of diabetes. There was a male predominance among those with the disease (2.9 vs. 2.2%), but not among those with family history (31.1 vs. 31.2%). There was also a higher predominance of family history among those of North African/Middle Eastern origin (45.5%), and among those living in urban areas as opposed to rural areas (31.9 vs. 26.7%). Among patients with diabetes, having a family history of diabetes was more common than having no history of disease beginning at age 23, peaking at age 56 (Supplementary figure S5).
Similar to CVD, those of Eastern European/Americas origin had lower rates of disease and family history, but among those with diabetes, had similar rates of having a positive family history of disease. Having a family history of diabetes had a sensitivity of 70.5%, specificity of 69.9% (i.e. 30.1% of those without diabetes had a positive family history), positive predictive value of 5.8% and negative predictive value of 98.9%.
In this study, we integrated EHRs, claims and demographic-based data-derived family trees for over half of Clalit’s population. Through this integrated approach, we demonstrated that a family history of asthma, diabetes or CVD was strongly associated with the presence of the same disease, and that the predictive value of family history varied somewhat by disease, age and ethnic subgroup.
Our EHR-based data-derived approach yielded similar results as those attained by traditional studies that use detailed prospective family history data collection, as described below. However, unlike traditional family history studies, which require labour-intensive data collection, our data-derived approach can be easily applied on a population-scale, relying only on existing clinical data. This opens up the door for much greater availability and utilization of family history information in clinical practice.
As an example of these traditional studies, Meigs et al. studied the impact of family history on diabetes using The Framingham Offspring Study, 30 finding that among patients aged 50 years and older who had diabetes, both parents had diabetes in 28% of cases, only the father in 19% and only the mother in 24%. One important difference to note is that the aforementioned study included a total of 29 cases in which both parents had diabetes, while our cohort included a total of 30 205 such cases.
Another relevant study that also utilized the Framingham data was by Lloyd-Jones et al., who analysed the correlations between parental premature CVD and non-premature CVD and their offspring's risk of CVD. 31 In this study, the odds-ratios for CVD were relatively low. One possible explanation is our use of a wider case definition for CVD than that used by Lloyd-Jones et al. Other studies that employ parental health data for the study of CVD include the works of Vik et al. on data from the HUNT study in Norway, 32 Allport et al. on the age of onset of CVD based on the Framingham data and a study by Khaleghi et al. on peripheral arterial disease. 33
Studies that explore the correlation between parental atopic or asthmatic background and their children’s risk of developing asthma include Xu et al., who used national survey data to find an HR of 3.71 for maternal history of asthma on the offspring’s risk of developing asthma. 34 Similar results were also obtained by Litonjua et al., 35 although that study was conducted among children (the median age of the children was 3.5 years). Similar results were also obtained in a meta-analysis of studies on parental risk factors for asthma. 36 These studies have clearly established that having a family history is associated with risk of future disease among children, but not necessarily with the risk as those children reach adulthood and beyond.
Our results highlight the importance of incorporating family medical history into the clinical decision-making process, especially in relation to diabetes and CVD. As can be seen in Supplementary figures S4 and S5, most cases of CVD or diabetes among patients older than 30 years were preceded by the same diagnosis for at least one of the patient’s parents. Lack of parental history of diabetes or CVD had a negative predictive value of about 99% for the existence of the same condition in the offspring, despite the fact that these conditions are fairly common in the adult population. Hence, automatic extraction of family history data could play an important role as a screening tool for these conditions. Of note, these results were not replicated among patients with asthma. We suggest that this finding may be due to the fact that asthma is a heterogeneous disease, particularly in its combination of childhood asthma and adult-onset asthma.
This study has several limitations. First, the design outlined in this study may not be applicable to other health care systems whose EMRs do not yet record connections between family members. That being said, by highlighting the importance of such information, care providers and insurers might be encouraged to actively include such data in their systems, thus allowing automatic screening tools to include family history data.
Second, this is a retrospective database study, and thus the derivation of the case definition is based on diagnostic codes that were not always used as intended, and in some cases could be missing completely. To mitigate these limitations, we either built upon case definitions that were validated in previous studies or re-validated each case definition manually.
Lastly, the cohort defined by our inclusion criteria is not fully representative of the general population, nor does our population necessarily represent the global population. As the goal of this study was to prove the feasibility of extracting one’s family medical history from the EMR, we focused only on the 60% of patients (n = 2.6 million) whose parents were also listed in the database. Thus, our study cohort was biased towards younger (and potentially healthier) patients, as information on the most elderly patients’ parents was more often not available (e.g. CVD in our cohort was only 2.1 vs. 9.5% in Clalit). These differences were mitigated when comparing most age groups (e.g. CVD at ages 50–54 was 10.5% in our cohort vs. 11.5% in Clalit). However, it is noteworthy that the population aged 75 and greater in our study cohort is significantly smaller and has lower rates of disease than that in Clalit (as seen in table 1). Thus, the results in this group should be interpreted with caution; the utility of predicting disease by using data-derived family history would be limited. Future research efforts are most likely to benefit from focusing on earlier age groups, e.g. between ages 30 and 60 for CVD and 25 and 60 for diabetes, when having a family history begins to predominate (as seen in the Supplementary figures). Future studies in the coming decades can include data on those aged 60 and greater, which can further elucidate the impact of family history in this age group.
Creating an EMR with family linkages is both feasible and useful, helping to guide screening efforts and enabling the practice of more personalized medical care. We have shown that having a positive family history of disease such as asthma, CVD or diabetes is highly likely to co-exist with the same condition for the offspring. Further studies are needed to evaluate the clinical impact or opportunities of integrating such a linked EMR.
This work was supported by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG009129. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflicts of interest: None declared.
The predictive value of a data-derived family history varies by disease, age and ethnic subgroup, and is strongly associated with disease condition.
Data-derived family history can be readily incorporated into clinical decision-making, especially at younger ages.