Evidence Criteria Used to Evaluate Resources on the Better Care Playbook

Why develop evidence criteria for the Playbook?

Many users of the Better Care Playbook will want to know how strong the evidence is for the complex care models and tools featured on the Playbook before deciding to test or adopt them. To address this need, all resources proposed for inclusion on the Playbook have been reviewed against the evidence criteria below, and only credible or promising resources are included. Each resource (including formal trials and research studies, case studies, perspectives, and reviews) is assigned an evidence level.

What are the levels of evidence?

STRONG EVIDENCE

Randomized Controlled Trial (RCT) - Concept tested in rigorous, randomized effectiveness studies, including cluster randomized controlled trials and randomized stepped-wedge, factorial, or quasi-experimental designs. Studies with these designs offer very strong evidence and should be ready to implement with local adaptation.
Non-Randomized Trial with Comparison Group - Concept tested in effectiveness studies with comparison groups — including non-randomized trials, before-after studies, interrupted time series studies, and repeated measures studies. May be ready to implement with local adaptation.
Meta-Analysis of Multiple Studies - One or more meta-analyses of studies of an intervention showing effectiveness. These studies offer very strong evidence and should be ready to implement with local adaptation.
Systematic Review of Multiple Studies (with evidence grading) - A systematic review of studies, giving a level of evidence for each study included. Likely ready to implement with local adaptation.

MODERATE EVIDENCE

Rigorous Observational Study - Concept demonstrated to be effective in one or more rigorous observational studies with comparison groups and appropriate adjustment for bias and confounding. May be ready to implement with local adaptation.
Systematic Review of Multiple Studies (without evidence grading) - A systematic review of studies, not giving a level of evidence for each study included. May be ready to implement with local adaptation.

PROMISING EVIDENCE

Case Study - One or more rigorous real-world case studies. Promising idea ready for further testing and small-scale implementation with local adaptation.
Non-Systematic Review of Multiple Studies (with evidence grading) - A non-systematic review of studies, with evidence grading. Promising ideas ready for further testing and small-scale implementation with local adaptation.

EXPERT OPINION

Individual Report, Commentary, or Perspective - Expert or group of experts who, based on their experience, present a new idea, concept or model, which has not been formally evaluated. Promising idea needing further testing or research.
Non-Systematic Review of Multiple Studies (without evidence grading) - A non-systematic review of studies, without evidence grading. Promising ideas needing further testing or research.

VARYING

Toolkits and Guides - These formats often contain multiple resources that have differing levels of evidence.

Note that, for all levels of evidence, adaptation for local context will be important. Additional background regarding approaches to testing, adapting, and spreading interventions can be consulted.

How were these criteria developed?

A range of systems for evaluating the strength of evidence was reviewed, including those of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group and the Cochrane Effective Practice and Organisation of Care (EPOC) review group. To the extent possible, resources are also evaluated according to sound epidemiological principles, including contemporary adaptations of Bradford Hill’s criteria for assessing causation.

Some systems would not meet the needs of Playbook users, so an adaptation of the logic of EPOC was used to ensure rigor while still embracing case studies, reviews, and other resources that predominate in the field of improving care for patients with complex needs. For example, credible comparison groups (whether randomized or not) were sought to assess: (1) secular trend; (2) time-ordered data display and analysis with clear timing of interventions; (3) plausibility and consistency with theory; (4) magnitude of the claimed effect; (5) dose-response; and (6) generalizability across diverse contexts. Particular attention was paid to potential sources of bias, as recommended by EPOC, and, where appropriate, to adjustment for confounding. For all resources, adaptation for local contexts and conditions will be necessary.