Publication

The inter-rater agreement of retrospective assessment of adverse events does not improve with two reviewers.

Zegers, M., Bruijne, M.C. de, Wagner, C., Groenewegen, P.P., Wal, G. van der, Vet, H.C.W. de. The inter-rater agreement of retrospective assessment of adverse events does not improve with two reviewers. 2008.
Methods: In the Dutch Adverse Event Study, 4272 records of patients discharged or deceased in 2004 from 21 Dutch hospitals were independently reviewed by two physicians (physician A+B) to determine adverse events (AEs) and their degree of preventability. When the two physicians disagreed about the presence of an AE and/or its preventability, they discussed both reviews in a consensus procedure. When they reached no agreement, a third trained reviewer gave a final judgement based on the information from the first two reviews. A reliability study was conducted to evaluate the inter-rater agreement of this patient record review process with two physicians per record and a consensus procedure. The objective was to examine the inter-rater agreement within pairs of physicians (physician A versus B, before the consensus procedure) and the inter-rater agreement of the complete record review process, including the consensus procedure and a third review if applicable. The latter was determined by the inter-rater agreement between pairs of physicians (physician A+B versus C+D) for a sample of 119 patient records.

Results: The inter-rater agreement within pairs of physicians was substantial for the determination of AEs and their preventability. The inter-rater agreement between pairs of physicians was fair for the determination of AEs and their preventability. Physician A and physician B separately found 592 and 621 AEs before the consensus procedure. After discussion and reconsideration of their reviews in a consensus procedure, more AEs were found (n=663). Of all detected AEs, 46% were found after the consensus procedure. The inter-rater agreement within pairs of physicians was higher for records of discharged patients (κ = 0.68, 95% CI 0.62-0.73) than for records of deceased patients (κ = 0.62, 95% CI 0.58-0.67), and was higher for records reviewed by two physicians who reviewed many records (κ = 0.68, 95% CI 0.64-0.73) than for records reviewed by physicians who reviewed fewer records (κ = 0.63, 95% CI 0.51-0.75).

Conclusions: A record review process on the occurrence of AEs with two physicians per record and a consensus procedure is not more reliable than a record review process with one physician. We hypothesized that the involvement of two physicians per record and a consensus procedure in case of disagreement between their reviews would improve the reliability of the review process to assess AEs. However, the inter-rater agreement of the complete medical review process (inter-rater agreement between pairs of physicians), including the consensus procedure, was only fair, although the inter-rater agreement within pairs of physicians was substantial. A consensus procedure between physicians did improve reliability within pairs of physicians, but not between pairs of physicians. However, with an independent review of records by two physicians and a consensus procedure, more AEs were detected than by one physician. Further improvement of the reliability of patient record review to identify AEs is necessary for monitoring the incidence of AEs in hospitals and hospital departments over time at a national level. A more explicit method based on specified and detailed checklists (using standards) for specific departments or patient groups may offer a solution. The team of physicians could be extended with more kinds of specialties, e.g. cardiologists and neurosurgeons, and more intensive training and better standardisation of the process may help. The number of reviewers should be reduced in order to increase the experience (records) per reviewer. (aut. ref.)
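
The agreement coefficients with 95% CIs reported above read as Cohen's kappa values, and the labels "fair" and "substantial" match the Landis and Koch scale. The abstract names neither explicitly, so both are assumptions here. The following minimal Python sketch shows how such a coefficient could be computed for a binary judgement (AE present or absent) by two reviewers. The counts are hypothetical and are not data from the study, the function names are illustrative, and the confidence interval uses a simple large-sample approximation of the standard error, which may differ from the method the authors used.

```python
import math


def cohen_kappa(both_yes, only_first_yes, only_second_yes, both_no):
    """Cohen's kappa with an approximate 95% CI for a 2x2 agreement table.

    The four arguments are record counts for two reviewers judging
    'AE present' (yes) or 'AE absent' (no) on the same records.
    """
    n = both_yes + only_first_yes + only_second_yes + both_no

    # Observed agreement: share of records where both reviewers agree.
    p_observed = (both_yes + both_no) / n

    # Agreement expected by chance, from each reviewer's marginal rates.
    first_yes = (both_yes + only_first_yes) / n
    second_yes = (both_yes + only_second_yes) / n
    p_expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)

    kappa = (p_observed - p_expected) / (1 - p_expected)

    # Simple large-sample standard error (an approximation, assumed here).
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


def landis_koch_label(kappa):
    """Verbal label for kappa on the Landis and Koch scale (assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts for 100 records, for illustration only.
kappa, (ci_low, ci_high) = cohen_kappa(
    both_yes=20, only_first_yes=5, only_second_yes=10, both_no=65
)
label = landis_koch_label(kappa)
print(f"kappa = {kappa:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f} ({label})")
```

Because kappa corrects the observed agreement for agreement expected by chance, two reviewers can agree on most records and still obtain only a fair coefficient when AEs are uncommon in the sample.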