Statistical methods for evaluating diagnostic test agreement and reproducibility

Authors

  • Édgar Cortés-Reyes
  • Jorge Andrés Rubio-Romero
  • Hernando Gaitán-Duarte

DOI:

https://doi.org/10.18597/rcog.271

Keywords:

reproducibility of results, correlation, agreement, concordance

Abstract

Introduction: when evaluating the usefulness of a diagnostic test, one often has to assess the repeatability of its results or their degree of agreement with those of another test that is not used as the gold standard for the condition in question. This paper presents the statistical methods used for evaluating the repeatability or reproducibility and the agreement of clinical and laboratory observations, explains their theoretical basis, and shows some examples of how they have been applied.

Methodology: the theoretical bases for evaluating agreement and the repeatability of results were reviewed, and examples of their use were taken from the pertinent obstetrics and gynecology literature.

Results: the kappa coefficient is usually used for evaluating the degree of agreement or concordance for dichotomous or categorical variables. The intraclass correlation coefficient (ICC) or Lin’s concordance correlation coefficient should be preferred over Pearson’s correlation coefficient or the paired Student’s t-test for assessing the concordance of continuous variables. These methods must be interpreted according to the clinical context in which they are used.

Conclusions: the selection of statistical methods for evaluating agreement and reproducibility depends on the type of variable being measured and on the parameters being evaluated for assessing either reproducibility or validity.
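
As a hedged illustration of the methods named in the Results, the following Python sketch computes unweighted Cohen's kappa for two raters and contrasts Pearson's r with Lin's concordance correlation coefficient (CCC) for two measurement methods that differ by a constant offset. The function names and all sample values are hypothetical and are not taken from the article. The CCC is assumed to take the usual form 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with variances and covariance divided by n.

```python
from collections import Counter
from math import sqrt
from statistics import fmean


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def _moments(x, y):
    """Means plus variances and covariance, each divided by n."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    syy = sum((yi - my) ** 2 for yi in y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson's correlation: measures linear association, not agreement."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / sqrt(sxx * syy)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient: penalizes shifts in location and scale."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


if __name__ == "__main__":
    # Hypothetical dichotomous readings (1 = positive, 0 = negative) from two observers.
    observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    observer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print("kappa:", round(cohen_kappa(observer_1, observer_2), 3))

    # Hypothetical continuous measurements: method B reads 2 units higher than method A.
    method_a = [1, 2, 3, 4, 5]
    method_b = [3, 4, 5, 6, 7]
    print("Pearson r:", pearson_r(method_a, method_b))
    print("Lin CCC:", lin_ccc(method_a, method_b))
```

In this made-up example the two methods are perfectly correlated (Pearson's r equals 1) yet never give the same value, and the CCC drops to 0.5 because of the systematic offset. That is the kind of discrepancy that motivates preferring a concordance measure over a correlation coefficient when the question is agreement.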

Author Biographies

Édgar Cortés-Reyes

Associate Professor, Department of Human Body Movement, Clinical Research Institute, Faculty of Medicine, Universidad Nacional de Colombia. Bogotá (Colombia).

Jorge Andrés Rubio-Romero

Associate Professor, Department of Obstetrics and Gynecology, Clinical Research Institute, Faculty of Medicine, Universidad Nacional de Colombia. Bogotá (Colombia).

Hernando Gaitán-Duarte

Full Professor, Department of Obstetrics and Gynecology, Clinical Research Institute, Faculty of Medicine, Universidad Nacional de Colombia. Bogotá (Colombia).

How to Cite

1.
Cortés-Reyes Édgar, Rubio-Romero JA, Gaitán-Duarte H. Statistical methods for evaluating diagnostic test agreement and reproducibility. Rev. colomb. obstet. ginecol. [Internet]. 2010 Sep. 30 [cited 2024 May 17];61(3):247-55. Available from: https://revista.fecolsog.org/index.php/rcog/article/view/271

Published

2010-09-30

Issue

Section

Medical Education