What is the best statistical test to calculate reproducibility in VFA reading in population-based cohort? A comparison between kappa of cohen and uniform kappa

Category Primary study
Journal Swiss Medical Weekly
Year 2013
Background: The gold standard for diagnosing a vertebral fracture (VF) is X-ray. A newer approach, Vertebral Fracture Assessment (VFA), has been tested in clinical conditions. VFA appears adequate in terms of reproducibility when compared with conventional X-rays in clinical settings. This method has not been evaluated in a screening population-based cohort. In all publications on VFA reproducibility, Cohen's kappa is the most commonly used statistical test. Interpretation of kappa becomes precarious when class prevalence is highly non-uniform. This is the case in a population-based cohort, where the prevalence of the event is very low. To control for this, a new test of agreement has recently been proposed: the uniform kappa.
Objective: We aimed to calculate the reproducibility of VFA reading in a screening population-based cohort using 2 different statistical tests: Cohen's kappa and uniform kappa.
Method: We performed the reproducibility analysis on 360 randomly chosen OsteoLaus study patients. The OsteoLaus cohort is a subpopulation of women (50 to 80 years old) from the Lausanne CoLaus cohort. VFA were analyzed between T4 and L4. Two independent readers read the 360 VFA to test inter-reader reproducibility. We calculated kappas for the following criteria: readable vertebrae yes/no, vertebral fracture yes/no, and the three-way ranking not readable/VF yes/VF no, for the total VFA, the dorsal spine and the lumbar spine. We also calculated kappas for grades 0, 1, 2, 3 and for grouped grades (0 + 1, 2 + 3). We used Landis and Koch values to interpret Cohen's kappa results (>0.81: excellent, 0.61-0.8: good, 0.21-0.6: moderate, 0-0.2: bad, <0: very bad). We considered a uniform kappa >0.75 a good result.
Results: 12% of vertebrae were not readable. The prevalence of VF ranged from 3% to 4% (fracture/no fracture) for all vertebrae, with 3% to 4% grade 1 VF, 0.6% to 1.3% grade 2 VF and 0.03% to 0.2% grade 3 VF. Inter-reader reproducibility was moderate to good (0.35 to 0.72) by Cohen's kappa and good (0.74 to 0.98) by uniform kappa for all criteria.
Conclusions: VFA is well reproducible in clinical practice. In a screening study, events are rare, which in our opinion makes Cohen's kappa inappropriate. We found Cohen's kappa values to be moderate. Uniform kappa is not influenced by the rate of events, and we found its values to be high. For research or evaluation in the general population, uniform kappa seems more accurate than Cohen's kappa for assessing reproducibility.
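The contrast between the two statistics can be illustrated with a minimal Python sketch. It rests on an assumption not spelled out in the abstract: that "uniform kappa" replaces the marginal-based chance agreement of Cohen's kappa with a uniform one, 1/k for k categories. The reader ratings below are invented for illustration and are not OsteoLaus data; the function names are hypothetical.

```python
from collections import Counter


def observed_agreement(r1, r2):
    """Proportion of items on which the two readers give the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa: chance agreement is derived from each reader's marginals."""
    n = len(r1)
    p_o = observed_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in set(c1) | set(c2)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def uniform_kappa(r1, r2, k):
    """ASSUMED definition: chance agreement fixed at 1/k for k categories."""
    p_o = observed_agreement(r1, r2)
    p_e = 1 / k
    return (p_o - p_e) / (1 - p_e)


# Invented example: 1000 vertebrae, fracture prevalence around 3.5% per reader.
# 20 rated VF by both, 15 by reader 1 only, 15 by reader 2 only, 950 by neither.
reader_1 = ["VF"] * 20 + ["VF"] * 15 + ["no"] * 15 + ["no"] * 950
reader_2 = ["VF"] * 20 + ["no"] * 15 + ["VF"] * 15 + ["no"] * 950

print(f"observed agreement: {observed_agreement(reader_1, reader_2):.2f}")
print(f"Cohen's kappa:      {cohen_kappa(reader_1, reader_2):.2f}")
print(f"uniform kappa:      {uniform_kappa(reader_1, reader_2, k=2):.2f}")
```

With these made-up ratings the readers agree on 97% of vertebrae, yet Cohen's kappa comes out near 0.56 because the skewed marginals push its chance-agreement term above 0.93, while the uniform version is 0.94. This is the rare-event effect the abstract describes, shown under the stated assumption about how uniform kappa is defined.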
Epistemonikos ID: d2e73c93cec5fdb00cb428bddc6f4f8343c4f15b
First added on: Feb 06, 2025