Exploring Structural Uncertainty and Impact of Health State Utility Values on Lifetime Outcomes in Diabetes Economic Simulation Models: Findings from the Ninth Mount Hood Diabetes Quality-of-Life Challenge


PMCID: 9329757

Year: 2022

Reviewer Paper ID: 6

Project Paper ID: 23

Q1 - Title

Question description: Does the title clearly identify the study as an economic evaluation and specify the interventions being compared?

Explanation: The title of the study, 'Exploring Structural Uncertainty and Impact of Health State Utility Values on Lifetime Outcomes in Diabetes Economic Simulation Models: Findings from the Ninth Mount Hood Diabetes Quality-of-Life Challenge,' does not clearly identify the study as an economic evaluation or specify the interventions being compared. Instead, it focuses on structural uncertainty and utility values in diabetes simulation models without mentioning comparative economic evaluations or specific interventions.

Quotes:

  • Title: 'Exploring Structural Uncertainty and Impact of Health State Utility Values on Lifetime Outcomes in Diabetes Economic Simulation Models: Findings from the Ninth Mount Hood Diabetes Quality-of-Life Challenge'

Q2 - Abstract

Question description: Does the abstract provide a structured summary that includes the context, key methods, results, and alternative analyses?

Explanation: The abstract provides a structured summary that includes context, methods, results, and alternative analyses. It mentions the context by discussing structural uncertainty in diabetes models, briefly describes the methods, reports the results of variability in incremental LYs and QALYs, and explores alternative analyses regarding health state utility values.

Quotes:

  • Background: Structural uncertainty can affect model-based economic simulation estimates and study conclusions.
  • Methods: Eleven type 2 diabetes simulation modeling groups participated in the 9th Mount Hood Diabetes Challenge. Modeling groups simulated 5 diabetes-related intervention profiles.
  • Results: Substantial cross-model variability in incremental LYs and QALYs was observed.
  • Conclusions: Variations in utility values contribute to a lesser extent than uncertainty captured as structural uncertainty. These findings reinforce the importance of assessing structural uncertainty thoroughly.

Q3 - Background and objectives

Question description: Does the introduction provide the context for the study, the study question, and its practical relevance for decision-making in policy or practice?

Explanation: The introduction of the manuscript indeed provides the necessary context for the study, outlines the study question, and discusses its relevance for decision-making in policy or practice. It explains the issue of structural uncertainty in economic models and the specific challenges with modeling type 2 diabetes, while highlighting the importance of using accurate models for policy decisions regarding alternative health interventions.

Quotes:

  • "Simulation models that are built to analyze the same problem often produce different results... These different choices lead to uncertainty in model results, known in the modeling community as structural or model uncertainty."
  • "Economic evaluations of antidiabetes treatments often show relatively small incremental differences in QALYs between treatment arms... Structural uncertainty is likely to be important, as small absolute variations in QALYs can have outsized impacts on incremental cost-effectiveness ratios."
  • "These models play an important role in allowing health care policy makers to make informed decisions when selecting between alternative health interventions."

Q4 - Health economic analysis plan

Question description: Was a health economic analysis plan developed, and if so, where is it available?

Explanation: The manuscript does not explicitly mention the development or availability of a health economic analysis plan. While the article discusses methods, standardized instructions, and challenges in modeling, there is no mention of a specific health economic analysis plan being developed or its availability.

Quotes:

  • The instructions for the Quality-of-Life Challenge can be found in Supplementary Material 1.
  • Participation in the challenge required submission in advance of results and model and methods documentation.
  • Supplemental Material: Supplementary material for this article is available on the Medical Decision Making website.

Q5 - Study population

Question description: Are the characteristics of the study population (e.g., age range, demographics, socioeconomic, or clinical characteristics) described?

Explanation: The manuscript describes the baseline characteristics of the study population, including age, duration of diabetes, smoking status, and various clinical measures like HbA1c, blood pressure, cholesterol levels, and presence of specific diseases or conditions. It also provides a table listing these characteristics, indicating they are well-documented.

Quotes:

  • 'The baseline characteristics (Table 1) were chosen to reflect a typical patient with diabetes enrolled in a randomized controlled trial for diabetes and sourced from the Action in Diabetes and Vascular Disease-PreterAx and DiamicroN Controlled Evaluation (ADVANCE) trial.'
  • 'Patient Characteristics at Baseline Current age: 66 y, Duration of diabetes: 8 y, Current/former smoker: No, HbA1c: 7.5%...'

Q6 - Setting and location

Question description: Is relevant contextual information (such as setting and location) provided that may influence the findings of the study?

Explanation: The manuscript provides detailed contextual information regarding both the setting and location of the study. The research is tied to the Mount Hood Diabetes Challenge, a conference where international diabetes modeling groups gather to assess structural uncertainty in type 2 diabetes models. Key details about the standardized scenarios used in the simulations and the baseline characteristics of the hypothetical patient population are given, which are relevant contextual factors that could influence the study outcomes.

Quotes:

  • 'We leveraged the Mount Hood Diabetes Challenge Network, a biennial conference attended by international diabetes modeling groups, to assess structural uncertainty in simulating QALYs in type 2 diabetes simulation models.'
  • 'The 9th Mount Hood Diabetes Challenge ran over 2 days in October 2018, during which modeling groups gathered to compare and discuss methodologies, data, and developments in diabetes simulation modeling.'
  • 'The baseline characteristics (Table 1) were chosen to reflect a typical patient with diabetes enrolled in a randomized controlled trial for diabetes and sourced from the Action in Diabetes and Vascular Disease-PreterAx and DiamicroN Controlled Evaluation (ADVANCE) trial.'

Q7 - Comparators

Question description: Are the interventions or strategies being compared described, along with the rationale for their selection?

Explanation: The manuscript provides detailed descriptions of the interventions or strategies being compared within the diabetes simulation models, including specific profiles such as reductions in HbA1c, BMI, systolic blood pressure, and LDL cholesterol, among others. The rationale for selecting these interventions is based on their common usage in managing type 2 diabetes and their ability to provide standardized scenarios for assessing variability across different models.

Quotes:

  • "Modeling groups simulated 5 diabetes-related intervention profiles using predefined baseline characteristics and a standard utility value set for diabetes-related complications."
  • "The challenge consisted of multiple simulations to examine the impact of utility values on QALYs...for 5 different intervention profiles common in the management of patients with T2DM: 0.5%-point permanent reduction in HbA1c, 10 mm Hg permanent reduction in systolic blood pressure, 0.5 mmol/L (19.33 mg/dL) permanent reduction in low-density lipoprotein cholesterol, 1-unit permanent reduction in body mass index (BMI; kg/m2), All above interventions combined."

Q8 - Perspective

Question description: What perspective(s) were adopted by the study, and why were they chosen?

Explanation: The manuscript does not specify the particular perspective(s) adopted by the study. While it discusses the methodology and focus on cross-model comparison to assess structural and parameter uncertainty, it does not explicitly mention a perspective such as a societal, healthcare system, or payer perspective, which are common in economic evaluations.

Quotes:

  • Simulation models that are built to analyze the same problem often produce different results, primarily because the models use different data or different model designs even when using the same data.
  • The manuscript discusses the methodology focused on addressing structural uncertainty using diabetes models but does not specify the particular perspective adopted by the study.

Q9 - Time horizon

Question description: What is the time horizon for the study, and why is it appropriate?

Explanation: The manuscript specifies a 40-year time horizon for the study. This is appropriate because economic evaluations of chronic conditions like type 2 diabetes require long time frames to capture the full impact of intervention on patients' quality-adjusted life-years (QALYs) and life-years (LYs), considering the chronic and progressive nature of diabetes and its complications.

Quotes:

  • The challenge consisted of multiple simulations to examine the impact of utility values on QALYs. First, modeling groups were requested to simulate the reference case as specified over a 40-year time horizon, separately for males and females, without an initial impact on biomarkers (the control group) and for 5 different intervention profiles common in the management of patients with T2DM.

Q10 - Discount rate

Question description: What discount rate(s) were used, and what was the rationale for choosing them?

Explanation: The manuscript specifies a 0% discount rate for both life years (LYs) and quality-adjusted life-years (QALYs) in the diabetes simulation models, but it does not discuss the rationale for selecting this rate.

Quotes:

  • A 0% discount rate for both life years (LYs) and QALYs was stipulated.

Q11 - Selection of outcomes

Question description: What outcomes were used as measures of benefit and harm?

Explanation: The manuscript describes that the outcomes used as measures of benefit and harm in the diabetes modeling were life-years (LYs) and quality-adjusted life-years (QALYs).

Quotes:

  • Background: Structural uncertainty can affect model-based economic simulation estimates and study conclusions. Unfortunately, unlike parameter uncertainty, relatively little is known about its magnitude of impact on life-years (LYs) and quality-adjusted life-years (QALYs) in modeling of diabetes.
  • The primary endpoints of the model were life-years gained, quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratios.
  • The results reported at the congress are presented in this article. Results for TTM reported in this article were reported in error because of incorrect input values.

Q12 - Measurement of outcomes

Question description: How were the outcomes used to capture benefits and harms measured?

Explanation: The manuscript does not provide detailed information about specific metrics or tools used to measure the outcomes like LYs or QALYs in terms of capturing benefits and harms. Instead, it focuses on variations across models and utility values but lacks explicit descriptions of how the actual outcomes were measured.

Quotes:

  • The article mentions that 'The results provided by the 11 diabetes modeling groups that participated in the 9th Mount Hood Diabetes Challenge (see below) were pooled and analyzed to address the 3 objectives.' which is more about data compilation rather than measurement.
  • The methods section of the manuscript states, 'First, modeling groups were requested to simulate the reference case as specified over a 40-year time horizon,' which does not detail the measurement of outcomes but rather the methodological setup.

Q13 - Valuation of outcomes

Question description: What population and methods were used to measure and value the outcomes?

Explanation: The population used for measuring and valuing outcomes in the study consisted of a typical patient with type 2 diabetes, using baseline characteristics from the ADVANCE trial, as described in the methods section. The methods for measuring outcomes involved 11 diabetes simulation modeling groups who simulated interventions using these predefined baseline characteristics and a standard utility value set to report life-years (LYs) and quality-adjusted life-years (QALYs).

Quotes:

  • The baseline characteristics (Table 1) were chosen to reflect a typical patient with diabetes enrolled in a randomized controlled trial for diabetes and sourced from the Action in Diabetes and Vascular Disease-PreterAx and DiamicroN Controlled Evaluation (ADVANCE) trial.
  • Eleven type 2 diabetes simulation modeling groups participated in the 9th Mount Hood Diabetes Challenge. Modeling groups simulated 5 diabetes-related intervention profiles using predefined baseline characteristics and a standard utility value set for diabetes-related complications. LYs and QALYs were reported.
  • The modeling groups were asked to populate their models using a standard (and widely used) set of utility values (Table 2) for diabetes-related complications from a published systematic review.

Q14 - Measurement and valuation of resources and costs

Question description: How were the costs valued in the study?

Explanation: The manuscript describes that costs were valued using a standard set of utility and disutility values for diabetes-related complications, sourced from a systematic review. These values were used across the different models in the Mount Hood Diabetes Challenge.

Quotes:

  • The modeling groups were asked to populate their models using a standard (and widely used) set of utility values (Table 2) for diabetes-related complications from a published systematic review and to document health states within their models that have a utility value attached to them.
  • The instructions for the Quality-of-Life Challenge...included a set of standard patient baseline characteristics and a set of utility values for a wide range of likely health states and model features, which all modeling groups were asked to use (reference case).
  • The standard set of utility and disutility values used to populate health states was sourced from Reference 29.

Q15 - Currency, price, date, and conversion

Question description: What are the dates of the estimated resource quantities and unit costs, and what currency and year were used for conversion?

Explanation: The manuscript does not specify the dates of the estimated resource quantities and unit costs, nor does it mention the currency and year used for conversion. It primarily discusses the simulation models and utility values for diabetes-related complications without addressing economic data specifics.

Quotes:

  • The manuscript primarily discusses structural uncertainty and variability in QALY predictions across diabetes simulation models, without mentioning resource cost dates or currency conversion details.
  • '...challenge consisted of multiple simulations to examine the impact of utility values on QALYs...'
  • 'Eleven type 2 diabetes simulation modeling groups participated in the 9th Mount Hood Diabetes Challenge...' focuses on structural uncertainty rather than economic data specifics.

Q16 - Rationale and description of model

Question description: If a model was used, was it described in detail, including the rationale for its use? Is the model publicly available, and where can it be accessed?

Explanation: The manuscript discusses various models used in the Mount Hood Diabetes Challenge but does not provide detailed descriptions or rationales for the specific models used, nor does it mention their public availability or access points.

Quotes:

  • The results of the 9th Mount Hood Diabetes Quality-of-Life Challenge provide a unique opportunity to examine the importance of structural uncertainty using the reported outcomes of 11 different diabetes simulation models (reporting 12 sets of model results).
  • Although the conference featured 3 challenges, this article focuses on the Quality-of-Life Challenge only. Participation in the challenge required submission in advance of results and model and methods documentation.
  • Instruction on the modeling challenges were posted in advance on the Mount Hood Diabetes Challenge website (https://www.mthooddiabeteschallenge.com/), and all registered modeling groups were invited to participate.

Q17 - Analytics and assumptions

Question description: What methods were used for analyzing or statistically transforming data, extrapolation, and validating any models used?

Explanation: The manuscript does not provide specific details on statistical methods or extrapolation techniques used in the data analysis and validation of the models. It primarily discusses the structural uncertainty and variability across different simulation models.

Quotes:

  • The results provided by the 11 diabetes modeling groups that participated in the 9th Mount Hood Diabetes Challenge were pooled and analyzed to address the 3 objectives. All modeling groups approved the use of their results and contributed to this article.
  • Estimating structural uncertainty: Submitted results were collated, and the variability across different models was assessed by calculating the mean and standard deviations of reported outputs (LYs, QALYs, incremental LYs, and incremental QALYs).

Q18 - Characterizing heterogeneity

Question description: What methods were used to estimate how the results vary for different sub-groups?

Explanation: The manuscript does not specify methods for estimating how the results vary for different sub-groups. It is focused on comparing variability across different models and examining consistency through a standardized diabetes challenge rather than addressing subgroup analysis directly.

Quotes:

  • The aim of this article is to leverage these cross-model estimates for a standardized set of simulation scenarios to 1) assess the magnitude of structural uncertainty by comparing outputs of a large number of diabetes models.
  • This challenge provided valuable insights into variation in outcomes produced by different diabetes models and for different intervention profiles, despite controlling for baseline patient characteristics and, to a certain extent, simulation assumptions.

Q19 - Characterizing distributional effects

Question description: How were the impacts distributed across different individuals, and were adjustments made to reflect priority populations?

Explanation: The manuscript does not specifically address how impacts are distributed across different individuals or mention any adjustments made to reflect priority populations. Instead, the focus is on comparing modeling methods and assessing structural uncertainty in diabetes models.

Quotes:

  • 'The findings indicate substantial cross-model variability in QALY predictions for a standardized set of simulation scenarios and is considerably larger than within model variability to alternative health state utility values.'
  • 'Simulation models that are built to analyze the same problem often produce different results, primarily because the models use different data or different model designs even when using the same data.'

Q20 - Characterizing uncertainty

Question description: What methods were used to characterize sources of uncertainty in the analysis?

Explanation: The manuscript describes multiple methods for characterizing structural uncertainty. These include scenario analysis and model averaging, comparing outputs from different models run on the same problem, and altering structural assumptions within models. In practice, the 9th Mount Hood Diabetes Challenge applied these methods by using a standardized set of scenarios run by multiple modeling groups to compare outcomes and investigate drivers of cross-model differences.

Quotes:

  • One way to pragmatically perform this multiway evaluation of structural uncertainty (while simultaneously minimizing risks that the individual results will still be correlated) is to bring many independent simulation models to bear on the same decision problem (i.e., with simulation of the same standardized scenario).
  • There are various ways to evaluate structural uncertainty, including examining the response of model results to changes in a structural assumption (e.g., altering the parametric form of an important risk equation, use of static or dynamic transition rates, disease states to include), presenting alternative results from scenario analyses or through model averaging where multiple structural changes are considered simultaneously.
  • As part of the 2018 Mount Hood Diabetes Challenge, 11 diabetes models simulated a set of standardized scenarios designed to inform our knowledge of how model estimates respond to different health state utility value assumptions and how model estimates vary across models with different structures.

Q21 - Approach to engagement with patients and others affected by the study

Question description: Were patients, service recipients, the general public, communities, or stakeholders engaged in the design of the study? If so, how?

Explanation: The manuscript does not indicate that patients, service recipients, or other public stakeholders were involved in the design of the study. The focus was on modeling groups and their participation in a challenge to assess structural uncertainty using pre-defined scenarios and characteristics.

Quotes:

  • Eleven type 2 diabetes simulation modeling groups participated in the 9th Mount Hood Diabetes Challenge.
  • Participation in the challenge required submission in advance of results and model and methods documentation.
  • The challenge consisted of multiple simulations to examine the impact of utility values on QALYs.

Q22 - Study parameters

Question description: Were all analytic inputs or study parameters (e.g., values, ranges, references) reported, including uncertainty or distributional assumptions?

Explanation: The manuscript provides a clear account of the analytic inputs and study parameters, including specific values and ranges for utility and disutility values and detailed descriptions of methods used to assess uncertainty, such as standard utility value sets and multiple simulations using different confidence interval limits.

Quotes:

  • The modeling groups were asked to populate their models using a standard (and widely used) set of utility values (Table 2) for diabetes-related complications from a published systematic review and to document health states within their models that have a utility value attached to them.
  • The simulation was repeated using all the lower limit of the 95% confidence interval of the standardized set of utility values (Table 2) and then with the upper limit of the 95% confidence interval.
  • Changing utility values to the lower or upper limits of the 95% confidence intervals resulted in a decrease and increase in QALYs, respectively.

Q23 - Summary of main results

Question description: Were the mean values for the main categories of costs and outcomes reported, and were they summarized in the most appropriate overall measure?

Explanation: The manuscript does not report mean values for the main categories of costs and outcomes in a summarized or overall measure format. Although it details outcomes like LYs and QALYs, it primarily focuses on structural uncertainty and variability across different diabetes models without providing mean values for costs or a summarized overall measure of outcomes.

Quotes:

  • Mean estimated LYs and QALYs were 17.69 years (SD, 2.82) and 12.26 (SD, 1.51), respectively.
  • LYs ranged from 11.7 to 19.6 years for males and 14.1 to 23.8 years for females, with a difference of 7.9 and 9.8 years between the lowest and highest reported values, respectively.
  • QALYs ranged from 8.7 to 12.6 for males and 10.4 to 15.0 for females, with a difference of 4.0 and 4.6 QALYs, respectively.

Q24 - Effect of uncertainty

Question description: How did uncertainty about analytic judgments, inputs, or projections affect the findings? Was the effect of the choice of discount rate and time horizon reported, if applicable?

Explanation: The article does not report on the effect of specific choices such as the discount rate or time horizon on the findings. While it mentions structural uncertainties affecting model-based outcomes, it focuses primarily on cross-model variations and utility value impacts rather than on specific analytic judgments like discount rates or time horizon.

Quotes:

  • "A 0% discount rate for both life years (LYs) and QALYs was stipulated."
  • "The findings indicate substantial cross-model variability in QALY predictions for a standardized set of simulation scenarios and is considerably larger than within model variability to alternative health state utility values."

Q25 - Effect of engagement with patients and others affected by the study

Question description: Did patient, service recipient, general public, community, or stakeholder involvement make a difference to the approach or findings of the study?

Explanation: The manuscript does not mention any involvement from patients, service recipients, the general public, the community, or stakeholders in the study approach or findings. It focuses primarily on methodological aspects regarding diabetes simulation models and does not discuss external involvement.

Quotes:

  • 'The modeling groups were asked to populate their models using a standard (and widely used) set of utility values... All modeling groups were asked to apply utility decrement values additively.'
  • 'The results provided by the 11 diabetes modeling groups that participated in the 9th Mount Hood Diabetes Challenge (see below) were pooled and analyzed to address the 3 objectives. All modeling groups approved the use of their results and contributed to this article.'

Q26 - Study findings, limitations, generalizability, and current knowledge

Question description: Were the key findings, limitations, ethical or equity considerations, and their potential impact on patients, policy, or practice reported?

Explanation: The manuscript focuses predominantly on the examination of structural uncertainty and variability across simulation models in estimating quality-adjusted life-years (QALYs) and does not explicitly discuss ethical or equity considerations, or their potential impact on patients, policy, or practice.

Quotes:

  • These findings reinforce the importance of assessing structural uncertainty thoroughly because the choice of model (or models) can influence study results, which can serve as evidence for resource allocation decisions.
  • The findings indicate substantial cross-model variability in QALY predictions for a standardized set of simulation scenarios, despite the long familiarity between modeling groups.
  • We acknowledge the clear limitation of the current analysis, in particular, that it provides only an initial exploration as to why results vary across models.

SECTION: TITLE
Exploring Structural Uncertainty and Impact of Health State Utility Values on Lifetime Outcomes in Diabetes Economic Simulation Models: Findings from the Ninth Mount Hood Diabetes Quality-of-Life Challenge


SECTION: ABSTRACT
Background

Structural uncertainty can affect model-based economic simulation estimates and study conclusions. Unfortunately, unlike parameter uncertainty, relatively little is known about its magnitude of impact on life-years (LYs) and quality-adjusted life-years (QALYs) in modeling of diabetes. We leveraged the Mount Hood Diabetes Challenge Network, a biennial conference attended by international diabetes modeling groups, to assess structural uncertainty in simulating QALYs in type 2 diabetes simulation models.

Methods

Eleven type 2 diabetes simulation modeling groups participated in the 9th Mount Hood Diabetes Challenge. Modeling groups simulated 5 diabetes-related intervention profiles using predefined baseline characteristics and a standard utility value set for diabetes-related complications. LYs and QALYs were reported. Simulations were repeated using lower and upper limits of the 95% confidence intervals of utility inputs. Changes in LYs and QALYs from tested interventions were compared across models. Additional analyses were conducted postchallenge to investigate drivers of cross-model differences.

Results

Substantial cross-model variability in incremental LYs and QALYs was observed, particularly for HbA1c and body mass index (BMI) intervention profiles. For a 0.5%-point permanent HbA1c reduction, LY gains ranged from 0.050 to 0.750. For a 1-unit permanent BMI reduction, incremental QALYs varied from a small decrease in QALYs (-0.024) to an increase of 0.203. Changes in utility values of health states had a much smaller impact (to the hundredth of a decimal place) on incremental QALYs. Microsimulation models were found to generate a mean of 3.41 more LYs than cohort simulation models (P = 0.049).

Conclusions

Variations in utility values contribute to a lesser extent than uncertainty captured as structural uncertainty. These findings reinforce the importance of assessing structural uncertainty thoroughly because the choice of model (or models) can influence study results, which can serve as evidence for resource allocation decisions.


Highlights

The findings indicate substantial cross-model variability in QALY predictions for a standardized set of simulation scenarios, which is considerably larger than the within-model variability arising from alternative health state utility values (e.g., lower and upper limits of the 95% confidence intervals of utility inputs).

There is a need to understand and assess structural uncertainty, as the choice of model to inform resource allocation decisions can matter more than the choice of health state utility values.

SECTION: INTRO
Introduction

Simulation models that are built to analyze the same problem often produce different results, primarily because the models use different data or different model designs even when using the same data.
Economic modelers make different choices when designing their model structures and selecting risk equations and other parameter values. These different choices lead to uncertainty in model results, known in the modeling community as structural or model uncertainty. There are many potential sources of differences between models, including 1) the type of model (e.g., Markov, statistical, discrete event simulation, decision tree), 2) choices for implicit and explicit data assumptions within a specific model, and 3) technical/methodological differences in implementing the given model (e.g., inclusion or exclusion of potentially relevant events, statistical models used to estimate specific parameters, in which different shape properties can affect extrapolation into the future).

Substantial effort has been put into understanding and capturing 3 of the 4 leading forms of uncertainty in health economic modeling (i.e., parameter, heterogeneity, and methodological uncertainty), which are commonly addressed using probabilistic sensitivity analysis, reference cases, and prescribed guidelines. Addressing structural uncertainty is relatively uncommon, despite numerous recommendations and recognition that its potential impact on results may be greater than other types of uncertainty. This may be because, in part, it can be more difficult to assess than other forms of uncertainty, and there is relatively little guidance for addressing structural uncertainty formally.

Type 2 diabetes mellitus (T2DM) is a chronic and progressive disease characterized by hyperglycemia. Chronic hyperglycemia is associated with a number of debilitating and life-threatening long-term macro- and microvascular complications. Many of these complications share common risk factors, and the presence of one can also increase risks for developing the others. Given its complex and interdependent pathophysiology, modeling T2DM is particularly challenging. For this reason, diabetes simulation models tend to be complex and sometimes opaque. These models play an important role in allowing health care policy makers to make informed decisions when selecting between alternative health interventions. Given the important role of these models in resource allocation considerations, it is important that those responsible for model development understand how structural uncertainty affects the results they produce.

There are various ways to evaluate structural uncertainty, including examining the response of model results to changes in a structural assumption (e.g., altering the parametric form of an important risk equation, use of static or dynamic transition rates, disease states to include), presenting alternative results from scenario analyses or through model averaging where multiple structural changes are considered simultaneously. Although uncommon in the literature, these approaches provide an indication of the impact of alternative choices made during the model development process and structural uncertainties arising from the model(s) considered by the same analyst(s). An alternative approach to capturing structural uncertainty would be to compare the different ways groups of analysts may differ in their approach to the same problem. Such a comparison would have a natural advantage in assessing the robustness of results of an individual study problem (e.g., confidence should be high when a treatment is cost-effective under all reasonable combinations of structural assumptions).

One way to pragmatically perform this multiway evaluation of structural uncertainty (while simultaneously minimizing risks that the individual results will still be correlated) is to bring many independent simulation models to bear on the same decision problem (i.e., with simulation of the same standardized scenario). For modeling diabetes treatments, use of a dedicated network such as the Mount Hood Diabetes Challenge (www.mthooddiabeteschallenge.com) is both an effective and efficient option. The Mount Hood Diabetes Challenge has regularly held conferences at which 10 or more diabetes modeling groups have met biennially since 2000 to cross-validate the models by running standardized simulation scenarios. A key aspect of diabetes simulation models is to capture the impact of the progression of diabetes and its complications on quality-adjusted life-years (QALYs). Economic evaluations of antidiabetes treatments often show relatively small incremental differences in QALYs between treatment arms. For example, a recent systematic review of 124 model evaluations of blood glucose-lowering interventions reported an average incremental difference of 0.409 QALYs. Structural uncertainty is likely to be important, as small absolute variations in QALYs can have outsized impacts on incremental cost-effectiveness ratios.
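To make this sensitivity concrete, the short sketch below uses hypothetical numbers (not taken from the paper) to show how a fixed incremental cost divided by a shrinking incremental QALY gain produces rapidly growing incremental cost-effectiveness ratios.

```python
# Hypothetical illustration (values assumed, not from the paper): with a fixed
# incremental cost, small absolute shifts in incremental QALYs move the ICER a lot.
incremental_cost = 2000.0  # assumed incremental cost, arbitrary currency units

for delta_qalys in (0.40, 0.20, 0.10):  # plausible cross-model spread in incremental QALYs
    icer = incremental_cost / delta_qalys  # ICER = incremental cost / incremental QALYs
    print(f"dQALYs = {delta_qalys:.2f} -> ICER = {icer:,.0f} per QALY")
# 0.40 -> 5,000 per QALY; 0.20 -> 10,000 per QALY; 0.10 -> 20,000 per QALY
```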

As part of the 2018 Mount Hood Diabetes Challenge, 11 diabetes models simulated a set of standardized scenarios designed to inform our knowledge of how model estimates respond to different health state utility value assumptions and how model estimates vary across models with different structures, something that cannot generally be examined without such a large and diverse group. The aim of this article is to leverage these cross-model estimates for a standardized set of simulation scenarios to 1) assess the magnitude of structural uncertainty by comparing outputs of a large number of diabetes models, 2) compare outputs related to parameter uncertainty by varying health state utility values to quantify the degree of uncertainty generated, and 3) investigate the drivers of cross-model differences.

SECTION: METHODS
Methods

The results provided by the 11 diabetes modeling groups that participated in the 9th Mount Hood Diabetes Challenge (see below) were pooled and analyzed to address the 3 objectives. All modeling groups approved the use of their results and contributed to this article.

9th Mount Hood Diabetes Challenge

The 9th Mount Hood Diabetes Challenge ran over 2 days in October 2018, during which modeling groups gathered to compare and discuss methodologies, data, and developments in diabetes simulation modeling.
Instructions on the modeling challenges were posted in advance on the Mount Hood Diabetes Challenge website (https://www.mthooddiabeteschallenge.com/), and all registered modeling groups were invited to participate. Although the conference featured 3 challenges, this article focuses on the Quality-of-Life Challenge only. Participation in the challenge required submission in advance of results and model and methods documentation. Results were discussed among participating modeling groups at an allocated congress session.

Quality-of-Life Challenge

The instructions for the Quality-of-Life Challenge can be found in Supplementary Material 1. Briefly, the challenge instructions included a set of standard patient baseline characteristics and a set of utility values for a wide range of likely health states and model features, which all modeling groups were asked to use (reference case). The baseline characteristics (Table 1) were chosen to reflect a typical patient with diabetes enrolled in a randomized controlled trial for diabetes and sourced from the Action in Diabetes and Vascular Disease-PreterAx and DiamicroN Controlled Evaluation (ADVANCE) trial. In the event that a model required input values not included in the instructions, the groups were asked to source their assumptions from published literature and to submit documentation with the results.

SECTION: TABLE
Characteristics of a Representative Patient (Applied to Both Males and
Females) Used in Simulations Sourced From Ref. 28

Patient Characteristics at Baseline
Current age | 66 y
Duration of diabetes | 8 y
Current/former smoker | No
HbA1c | 7.5%
Systolic blood pressure | 145 mm Hg
Diastolic blood pressure | 80 mm Hg
Total cholesterol | 5.2 mmol/L
High-density lipoprotein cholesterol | 1.3 mmol/L
Low-density lipoprotein cholesterol | 3.0 mmol/L
Body mass index | 28 kg/m2
Albumin:creatinine ratio | 14.2
Peripheral vascular disease | No
Micro or macro albuminuria (albuminuria =50) | No
Atrial fibrillation | No
Estimated glomerular filtration rate | 70 mL/min/1.73 m2
White blood cell count | 7 x 10^9/L
Heart rate | 79 bpm
Hemoglobin | 14 g/dL
History of macrovascular disease | No
History of microvascular disease | No

SECTION: METHODS
The modeling groups were asked to populate their models using a standard (and widely used) set of utility values (Table 2) for diabetes-related complications from a published systematic review and to document health states within their models that have a utility value attached to them. All modeling groups were asked to apply utility decrement values additively (where feasible). Modeling groups were asked to source utility values for health states not included in the challenge instructions from published literature and to add to the documentation.
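As an illustration of the additive decrement approach described above, the sketch below combines the Table 2 baseline utility with a few of the control disutility values; the complication set, BMI handling, and function name are hypothetical and not taken from any participating model.

```python
# Minimal sketch of the additive approach requested in the challenge: start from the
# baseline utility for T2DM without complications and subtract the Table 2 disutility
# (control estimate) for each complication present.
BASELINE_UTILITY = 0.785  # type 2 diabetes without complications (Table 2)

DISUTILITY = {
    "myocardial_infarction": -0.055,
    "stroke": -0.164,
    "neuropathy": -0.084,
}
BMI_DISUTILITY_PER_UNIT = -0.006  # per BMI unit in excess of 25 kg/m2 (assumed reading of Table 2)

def additive_utility(complications, bmi=28.0):
    """Annual utility under the additive decrement approach (illustrative only)."""
    utility = BASELINE_UTILITY
    for state in complications:
        utility += DISUTILITY[state]  # decrements are negative
    utility += BMI_DISUTILITY_PER_UNIT * max(bmi - 25.0, 0.0)
    return utility

# 0.785 - 0.055 - 0.084 - 3 * 0.006 = 0.628
print(additive_utility(["myocardial_infarction", "neuropathy"], bmi=28.0))
```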

SECTION: TABLE
Standard Set of Utility and Disutility Values Used to Populate
Health-States Sourced from Ref. 29


Disease Category | Complication (Level Provided in Mt. Hood QoL Challenge) | Utility/Disutility Value (Control) | Lower 95% CI | Upper 95% CI
Baseline utility value | Type 2 diabetes mellitus without complications | 0.785 | 0.681 | 0.889
Acute metabolic disorder | Minor hypoglycemia event | -0.014 | -0.004* | -0.004*
Acute metabolic disorder | Major hypoglycemia event | -0.047 | -0.012* | -0.012*
Comorbidity | Excess body mass index (each unit >25 kg/m2) | -0.006 | -0.008 | -0.004
Retinopathy | Cataract | -0.016 | -0.031 | -0.001
Retinopathy | Moderate nonproliferative background diabetic retinopathy | -0.040 | -0.066 | -0.014
Retinopathy | Moderate macular edema | -0.040 | -0.066 | -0.014
Retinopathy | Vision-threatening diabetic retinopathy | -0.070 | -0.099 | -0.041
Retinopathy | Severe vision loss | -0.074 | -0.124 | -0.025
Nephropathy | Proteinuria | -0.048 | -0.091 | -0.005
Nephropathy | Renal transplant | -0.082 | -0.137 | -0.027
Nephropathy | Hemodialysis | -0.164 | -0.274 | -0.054
Nephropathy | Peritoneal dialysis | -0.204 | -0.342 | -0.066
Neuropathy | Peripheral vascular disease | -0.061 | -0.090 | -0.032
Neuropathy | Neuropathy | -0.084 | -0.111 | -0.057
Neuropathy | Active ulcer | -0.170 | -0.207 | -0.133
Neuropathy | Amputation event | -0.280 | -0.389 | -0.170
Cerebrovascular disease | Stroke | -0.164 | -0.222 | -0.105
Coronary heart disease | Myocardial infarction | -0.055 | -0.067 | -0.042
Coronary heart disease | Ischemic heart disease | -0.090 | -0.126 | -0.054
Coronary heart disease | Heart failure | -0.108 | -0.169 | -0.048

* Disutilities converted to annual values.

SECTION: METHODS
The challenge consisted of multiple simulations to examine the impact of utility values on QALYs. First, modeling groups were requested to simulate the reference case as specified over a 40-year time horizon, separately for males and females, without an initial impact on biomarkers (the control group) and for 5 different intervention profiles common in the management of patients with T2DM:

0.5%-point permanent reduction in HbA1c

10 mm Hg permanent reduction in systolic blood pressure

0.5 mmol/L (19.33 mg/dL) permanent reduction in low-density lipoprotein cholesterol

1-unit permanent reduction in body mass index (BMI; kg/m2)

All above interventions combined


Modeling groups were requested to standardize model assumptions around biomarker evolution; for instance, HbA1c and systolic blood pressure were to be kept constant over time rather than allowed to evolve (increase or decrease) over time. A 0% discount rate for both life years (LYs) and QALYs was stipulated.
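For readers less familiar with the mechanics, the sketch below shows how QALYs accumulate over a simulated horizon and what the stipulated 0% discount rate implies relative to a positive rate; the utility path is invented purely for illustration.

```python
# QALYs are the sum of annual utilities, each divided by (1 + r)^t. With the stipulated
# 0% rate, every year counts fully; a positive rate (shown only for contrast) down-weights
# later years. The utility path below is illustrative, not a model output.
def qalys(annual_utilities, discount_rate=0.0):
    return sum(u / (1.0 + discount_rate) ** t for t, u in enumerate(annual_utilities))

utility_path = [0.785] * 10 + [0.70] * 10  # assumed: 20 years survived, lower utility after year 10
print(round(qalys(utility_path, 0.000), 2))  # 14.85 undiscounted QALYs
print(round(qalys(utility_path, 0.035), 2))  # about 11.03 with a 3.5% rate, for contrast
```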

The simulation was repeated, first using the lower limits of the 95% confidence intervals of the standardized set of utility values (Table 2) and then the upper limits. To further examine the impact of varying individual health state utility values on incremental QALYs, modeling groups were asked to vary the utility value for each health state one at a time to the lower and upper 95% confidence interval limits and report incremental QALYs (all other health states assuming the mean value) for the control group and for the 0.5%-point reduction in HbA1c profile.
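A minimal sketch of this one-at-a-time variation is given below; `run_model` stands in for whichever simulation model is being exercised and is an assumption, not part of the challenge materials.

```python
# One-at-a-time sensitivity on health state utilities: for each state, swap in the lower
# (or upper) 95% CI value while all other states keep their control values, then record
# the resulting outcome (e.g., incremental QALYs) from the model run.
def one_at_a_time(control_utilities, bound_utilities, run_model):
    results = {}
    for state, bound_value in bound_utilities.items():
        varied = dict(control_utilities)  # copy; every other state stays at its control value
        varied[state] = bound_value
        results[state] = run_model(varied)  # e.g., incremental QALYs for the HbA1c profile
    return results
```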

The modeling groups were requested to submit detailed results for each treatment profile for each simulation, including estimated LYs and QALYs, and cumulative event rates for each health state, in advance of the congress. Modeling groups that submitted their challenge results prior to the congress and participated in the event were included in this article. All modeling groups agreed to include their simulation results in a peer-reviewed publication prior to the meeting. Resimulation was not allowed; however, modeling groups were given the opportunity to check their submitted results postchallenge, and, where applicable, updated results could be added to the appendix.

Post-challenge Statistical Analysis

Estimating structural uncertainty

Submitted results were collated, and the variability across different models was assessed by calculating the mean and standard deviations of reported outputs (LYs, QALYs, incremental LYs, and incremental QALYs).
Incremental outcomes in LYs and QALYs across different models were ordered from lowest to highest, ranked, and plotted to facilitate comparisons between models and by intervention profiles. The Spearman's rank-order correlation test was used to assess the strength and direction of association between the rankings of incremental LYs and QALYs across all models.
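A compact sketch of these summary calculations is shown below with invented per-model numbers; the actual submitted values are in the supplementary materials.

```python
# Cross-model variability summarized as mean and standard deviation, plus Spearman's
# rank-order correlation between model rankings of incremental LYs and incremental QALYs.
# The five values per list are placeholders, one per hypothetical model.
import statistics
from scipy.stats import spearmanr

inc_lys = [0.05, 0.12, 0.21, 0.35, 0.75]    # hypothetical incremental LYs by model
inc_qalys = [0.07, 0.10, 0.25, 0.20, 0.33]  # hypothetical incremental QALYs by model

print(statistics.mean(inc_lys), statistics.stdev(inc_lys))
rho, p_value = spearmanr(inc_lys, inc_qalys)  # ranks the raw values internally
print(rho, p_value)
```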

Impact and contribution of health state utility values

Results (incremental LYs and QALYs) from repeated simulations using the lower and upper limit of the 95% confidence intervals of utility values were also collated. These were compared with the reference case simulation results to provide an illustration of the relative magnitude of structural uncertainty in comparison with parameter uncertainty. Results were presented in figures to facilitate visualizing the impact of utility values within and across different models.

Investigate potential drivers for variations in reported outcomes

Each model application was characterized according to a set of key model characteristics and modeling approaches used for the reference simulations, which included microsimulation methodology, number of health states with utility implications, the use of the UK Prospective Diabetes Study (UKPDS) cardiovascular and mortality risk equations, the use of additive utilities, and the inclusion of BMI disutility. The individual groups were consulted to ensure the models were correctly classified. LY and incremental QALY results were then plotted to facilitate comparison, and differences in mean life expectancies and incremental QALYs were compared across each of the subgroups. Regression analyses were conducted to test for associations between model characteristics and modeling approaches and model outcomes, using a 2-step approach. First, a 2-way fixed effect regression analysis was conducted to identify which models consistently produce higher or lower estimates across the intervention profiles simulated. Predicted average model effects across all 5 intervention profiles were then regressed against characteristics and modeling approaches to identify possible associations with outcomes.
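The 2-step regression could be organized roughly as in the sketch below using statsmodels formulas; the data frame layout, column names, and the single characteristic shown are assumptions rather than the authors' actual code.

```python
# Rough sketch of the 2-step approach: (1) a two-way fixed-effects OLS with model and
# intervention-profile dummies, then (2) a regression of the predicted average model
# effects on a model characteristic (here, microsimulation vs. cohort).
import statsmodels.formula.api as smf

def two_step_regression(df):
    """df: pandas DataFrame with one row per model x profile and columns
    inc_qaly, model, profile, microsimulation."""
    # Step 1: two-way fixed effects on the reported incremental QALYs.
    fixed_effects = smf.ols("inc_qaly ~ C(model) + C(profile)", data=df).fit()
    model_effects = (
        df.assign(predicted=fixed_effects.fittedvalues)
          .groupby(["model", "microsimulation"], as_index=False)["predicted"]
          .mean()
    )
    # Step 2: regress average predicted model effects on the characteristic.
    step2 = smf.ols("predicted ~ C(microsimulation)", data=model_effects).fit()
    return fixed_effects, step2
```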

SECTION: RESULTS
Results

Eleven modeling groups participated in the Quality-of-Life Challenge (Table 3). The Cardiff Model submitted 2 sets of results, one using UKPDS 68 risk equations and the other using UKPDS 82 risk equations, yielding 12 sets of model results. Brief descriptions of participating groups can be found in Supplementary Material 2. Model-specific documentation of health states with utility values and a description of the utility approach used for handling multiple complications can be found in Supplementary Material 3.

SECTION: TABLE
Participating Modeling Groups

BRAVO Diabetes model
Cardiff model (UKPDS 82 and UKPDS 68)^a
Centers for Disease Control and Prevention and Research Triangle Institute (CDC/RTI) type 2 diabetes cost-effectiveness model
Economics and Health Outcomes Model of T2DM (ECHO-T2DM)
IQVIA Core Diabetes Model (IQVIA CDM)
Modeling Integrated Care for Diabetes based on Observational data (MICADO) model
Michigan Model for Diabetes (MMD)
PROSIT Disease Modelling Community
SPHR Type 2 Diabetes Treatment model (SPHR Type 2)
Treatment Transition Model (TTM)
UKPDS Outcomes model version 2 (UKPDS-OM)

^a The Cardiff modeling group used 2 different sets of risk equations, and results from both were submitted.

SECTION: RESULTS
The number of health states with assigned utilities in the different models ranged from 10 to 38. Most models employed the additive approach to incorporate (dis-)utility values for comorbidities, but this was not possible for all models. IQVIA-CDM used the minimum approach per health state but added disutility for BMI, hypoglycemia events, and new events such as myocardial infarction and stroke; the Treatment Transition Model (TTM) used the minimum approach; and SPHR applied a multiplicative effect. Model characteristics and modeling approaches applied during the challenge are presented in Table 4.
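To show why the combination rule itself can shift results, the sketch below contrasts the additive, minimum, and a generic multiplicative rule for one hypothetical patient with two complications; the multiplicative formula is a common formulation and not necessarily the exact SPHR implementation.

```python
# Additive vs. minimum vs. multiplicative handling of two comorbid complications,
# using Table 2 control values. The three rules give different utilities for the
# same patient, which is one structural source of cross-model divergence.
baseline = 0.785
decrements = {"stroke": -0.164, "neuropathy": -0.084}

additive = baseline + sum(decrements.values())  # 0.785 - 0.164 - 0.084 = 0.537
minimum = baseline + min(decrements.values())   # only the worst single state counts: 0.621

multiplicative = baseline                       # generic multiplicative formulation
for d in decrements.values():
    multiplicative *= (baseline + d) / baseline  # 0.785 * (0.621/0.785) * (0.701/0.785) ~= 0.555

print(additive, minimum, multiplicative)
```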

SECTION: TABLE
Model Characteristics and Modeling Approaches Applied during the
Challenge

Model | Microsimulation Model | Number of Health States with Utilities | Uses UKPDS Mortality Risk Equation | Uses UKPDS Cardiovascular Risk Equation | Includes Health State Related to BMI | Inclusion of BMI Disutility Weight | Applied Additive Utilities | Changed Baseline Utility in Parallel with Complication Utilities
BRAVO | Yes | 29 | No | No | Yes | Yes | Yes | Yes
Cardiff UKPDS68 | Yes | 12 | Yes | Yes | Yes | Yes | Yes | No
Cardiff UKPDS82 | Yes | 12 | Yes | Yes | Yes | Yes | Yes | No
CDC/RTI | No | 10 | No | Yes | Yes | Yes | Yes | Yes
ECHO-T2DM | Yes | 38 | Yes | Yes | Yes | Yes | Yes | No
IQVIA CDM | Yes | 32 | Yes | Yes | No | Yes | No | No
MICADO | No | 17 | No | No | Yes | No | Yes | Yes
MMD | Yes | 19 | Yes | Yes | No | Yes | Yes | Yes
Prosit | No | 29 | No | Yes | No | No | Yes | No
SPHR Type 2 | Yes | 13 | Yes | Yes | Yes | Yes | No | Yes
TTM | Yes | 13 | Yes | Yes | Yes | Yes | No | Yes
UKPDS-OM | Yes | 12 | Yes | Yes | No | No | Yes | Yes

SECTION: RESULTS
The results reported at the congress are presented in this article. The results for TTM presented in this article were generated with incorrect input values. The spirit of the Mount Hood Challenges is to explore all modeling groups' results as they were originally presented to maintain the fidelity of discussions and conclusions that occurred at the conference. Corrected TTM results are therefore presented in the supplementary materials.

Cross-Model Variations in Reported Outcomes

Reported outcomes (LYs and QALYs) for the reference case simulation (control group) were compared across models (Figure 1). Mean estimated LYs and QALYs were 17.69 years (SD, 2.82) and 12.26 (SD, 1.51), respectively. LYs ranged from 11.7 to 19.6 years for males and 14.1 to 23.8 years for females, with a difference of 7.9 and 9.8 years between the lowest and highest reported values, respectively. QALYs ranged from 8.7 to 12.6 for males and 10.4 to 15.0 for females, with a difference of 4.0 and 4.6 QALYs, respectively.

SECTION: FIG
Comparison of life-years (LYs) and quality-adjusted life-years (QALYs)
across all modeling groups (control). *The results for the Treatment
Transition Model include simulations with incorrect input values,
resulting in volatile interactions between interventions and changes in
utilities. Corrected values (postchallenge) are reported in the
supplementary materials.

SECTION: RESULTS
Incremental LYs and QALYs for each model and intervention (males and females combined) are presented in Figure 2 (full results can be found in Supplementary Materials 4 and 5), showing substantial variability in outcomes. This was particularly apparent for the HbA1c and BMI intervention profiles, where there was a 15-fold difference between the lowest and highest reported incremental LYs for the HbA1c intervention profile and at least a 10-fold difference for incremental QALYs for the BMI intervention. The Spearman's rank-order correlation test indicated a non-statistically significant association between the rankings of reported LYs and QALYs for both of these intervention profiles. When the Prosit, MMD, TTM, BRAVO, and MICADO models were excluded, less variation in incremental outcomes was observed.

SECTION: FIG
Comparisons of incremental life-years (DeltaLYs) and incremental QALYs
(DeltaQALYs) across different models by intervention profile. *The results
for Treatment Transition Model (TTM) include simulations with incorrect
input values, resulting in volatile interactions between interventions
and changes in utilities. Corrected values (postchallenge) are reported
in the following Supplementary Materials.

SECTION: RESULTS
Impact of Health State Utility Values on Lifetime Outcomes

Changing utility values to the lower or upper limits of the 95% confidence intervals resulted in a decrease and increase in QALYs, respectively. Within each model, reported QALYs were similar across interventions and by sex. However, comparisons across models indicate considerable cross-model variability. The Cardiff models (both UKPDS 68 and 82) reported the smallest change (±0.16 QALYs, 1.5% change), and a change of up to ±3.52 QALYs (31% change) was reported by the BRAVO modeling group. Eight of the 12 models showed a greater than 15% change in reported QALYs when changing utility values to the lower and upper limits (results presented in Supplementary Material 6).

Figure 3 shows the effect of utility changes (error bars representing the lower and upper limits of the 95% confidence interval) on incremental QALYs for the "All interventions combined" profile. Although varying utility values had an impact on incremental QALYs within each of the models, the observed variation across models was much more prominent. This was similarly observed across the other intervention profiles (full results and figures presented in Supplementary Materials 7 and 8).

SECTION: FIG
Impact of utility values on incremental quality-adjusted life-years
(QALYs) within and across the different models for the "All
interventions" combined profile. The error bars indicate the impact of
change in all utility values (to the lower and upper limits of the 95%
confidence interval). *The Treatment Transition Model (TTM) reported a
large change in incremental QALYs for the upper limit due to input
error; therefore the upper limit error bars were omitted for TTM. The
results for TTM include simulations with incorrect input values
resulting in volatile interactions between interventions and changes in
utilities. Corrected values (postchallenge) are reported in the
following Supplementary Materials. ^No error bars were shown for Prosit, as these results were unavailable.

SECTION: RESULTS
In comparison with the observed cross-model variability, the effect of changing the utility value associated with each health state was of a much smaller magnitude. These changes resulted in very small changes to the incremental QALYs (to the hundredth of a decimal place) and are presented in Supplementary Material 9. However, this effect was highly variable across models. For example, changing the utilities for stroke to the lower 95% CI limit resulted in a 10.5% change in incremental QALYs reported by BRAVO, while CDC/RTI and IQVIA reported a negligible change. It was also observed that the relative change in incremental QALYs due to the utility change of certain health states, such as ischemic heart disease and myocardial infarction, is generally consistent across models, but for rarer outcomes such as blindness and amputation, greater variation was observed.

It was found postchallenge that some modeling groups (e.g., BRAVO, SPHR, CDC/RTI, MICADO, MMD) varied their baseline utility value (without complications) in parallel with varying utility values associated with complications, whereas others kept this constant using the base value. The potential for systematic differences in reported outcomes by modeling groups' approach was tested, and we found no difference (Supplementary Materials 10).

Impact of Model Characteristics and Modeling Approaches on Reported Outcomes

Models were subgrouped based on model characteristics and modeling approaches applied during the challenge (Table 4). Of the 12 participating models (including the 2 versions of the Cardiff Model), 9 were microsimulation models, 4 had more than 20 health states with utility implications, 8 models used the UKPDS mortality risk equation, 10 incorporated UKPDS cardiovascular risk equations, and 9 were able to apply additive disutility weights as per the challenge instructions. For the LYs outcome, microsimulation models appear to report more LYs than nonmicrosimulation (cohort) models (by at least 3.30 years) across all 5 intervention profiles. Greater LYs were also reported in models with more than 20 health states with utilities and among models that incorporated the UKPDS mortality risk equation. However, these differences were small (ranging from 0.04 to 1.92), and none were statistically significant. For the incremental QALYs outcome, there were no obvious patterns as to how outcomes differed by model characteristics and modeling approaches across the intervention profiles. Full results and figures are presented in Supplementary Materials 11.

The regression analyses identified the BRAVO and MMD models as consistently producing larger estimates across the intervention profiles, whereas CDC/RTI, MICADO, IQVIA CDM, and TTM produced smaller estimates (Supplementary Material 12). Consistent with the subgroup analysis, microsimulation models appeared to report more LYs than cohort simulation models (by 3.41 years, P = 0.049). No significant associations were observed with any other characteristics or modeling approaches for either outcome.
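
As a hedged sketch of this type of analysis (the actual specification and data are in Supplementary Material 12 and are not reproduced here), reported outcomes can be regressed on model-characteristic indicators with ordinary least squares. The values below are placeholders, not challenge results.

```python
# Minimal sketch of regressing reported life-years on model characteristics.
# The data frame below uses placeholder values, not the actual challenge results.
import pandas as pd
import statsmodels.api as sm

results = pd.DataFrame({
    "life_years":      [22.1, 21.8, 18.5, 18.9, 22.4, 19.2, 21.5, 18.7],
    "microsimulation": [1,    1,    0,    0,    1,    0,    1,    0],
    "ukpds_mortality": [1,    0,    1,    1,    1,    0,    1,    1],
})

X = sm.add_constant(results[["microsimulation", "ukpds_mortality"]])
model = sm.OLS(results["life_years"], X).fit()

# The coefficient on 'microsimulation' estimates the mean difference in reported
# LYs between microsimulation and cohort models, adjusted for the other flag.
print(model.summary())
```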

SECTION: DISCUSS
Discussion

The results of the 9th Mount Hood Diabetes Quality-of-Life Challenge provide a unique opportunity to examine the importance of structural uncertainty using the reported outcomes of 11 different diabetes simulation models (reporting 12 sets of model results). This challenge provided valuable insights into the variation in outcomes produced by different diabetes models for different intervention profiles, despite controlling for baseline patient characteristics and, to a certain extent, simulation assumptions. The findings indicate substantial cross-model variability in QALY predictions for a standardized set of simulation scenarios, despite the long familiarity between modeling groups (some relationships going back 20 years) and the development of guidelines to enhance model comparability. Interestingly, the observed cross-model variation was considerably larger than the within-model variability due to alternative health state utility values (e.g., the lower and upper limits of the 95% confidence intervals of the utility inputs). Cross-model differences may conceivably be even larger in other disease areas that have not developed this type of shared modeling community. Differences in underlying model assumptions, structure, and data sources may consequently affect important decisions regarding funding/reimbursement and research priorities. This reinforces the need to look critically beyond parameter uncertainty and to integrate tests of structural uncertainty into model-based analyses.

Although uncertainties due to utility values are routinely assessed through sensitivity analyses, it is much more difficult to ascertain the impact of using different models to inform such decisions. The findings from this challenge indicate that variations in the utility values of diabetes-related complications had a smaller impact on incremental outcomes than cross-model variability. For example, the incremental QALYs associated with a 0.5% reduction in HbA1c ranged from 0.066 for the TTM model to 0.331 for the Prosit model, a 5-fold difference. To put this variation into context, it is larger in magnitude than the probabilistic uncertainty reported in the evaluation of the blood glucose-lowering intervention in the UKPDS study. Ideally, all sources of uncertainty (not just parameter uncertainty) should be considered.

Despite attempts to identify specific factors that drive the differences observed across models, it was difficult to isolate a particular contributing factor (a downside to our pragmatic use of multiway structural uncertainty analysis). Our results indicated that differences across models overshadowed differences between subgroups of models organized by key structural assumptions (Supplementary Materials 11 and 12), although there were some regularities. For example, we found that microsimulation models generated more mean LYs than cohort simulation models, and this difference was statistically significant (P = 0.049) despite the small sample size. This is consistent with the convexity of most mortality risk equations (i.e., risks that increase at an increasing rate). Much of the cross-model difference, however, is likely attributable to combinations of differences in the many structural assumptions across the 11 unique diabetes models (12 sets of results). It may also be that correlation between model characteristics and modeling approaches drives the observed differences.
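
The convexity point can be illustrated with a toy calculation (ours, not a challenge model): when a risk equation is convex in a risk factor, evaluating it at the cohort mean (as a cohort model effectively does) generally differs from averaging the individual-level risks that a microsimulation samples. The risk equation and population below are hypothetical.

```python
# Toy illustration (not a challenge model): Jensen's inequality for a convex
# risk equation, where risk is assumed to rise at an increasing rate with HbA1c.
import numpy as np

rng = np.random.default_rng(0)
hba1c = rng.normal(loc=7.5, scale=1.0, size=100_000)  # hypothetical population

def annual_mortality_risk(x):
    """Hypothetical convex risk equation: risk grows exponentially with HbA1c."""
    return 0.01 * np.exp(0.3 * (x - 7.0))

risk_at_mean = annual_mortality_risk(hba1c.mean())   # cohort-style evaluation
mean_of_risks = annual_mortality_risk(hba1c).mean()  # microsimulation-style average

print(f"Risk at the mean HbA1c:   {risk_at_mean:.4f}")
print(f"Mean of individual risks: {mean_of_risks:.4f}")
# With a convex equation the second quantity is systematically larger, so the
# two modeling approaches propagate the same inputs differently.
```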

We acknowledge a clear limitation of the current analysis, in particular that it provides only an initial exploration of why results vary across models. However, it does illustrate the difficulty of teasing out specific factors as key drivers. An alternative approach to testing structural uncertainty (as mentioned in the introduction) is to assess the impact of changing specific aspects of a model's design and to document this, as is current practice with 1-way sensitivity analyses on key parameters. However, such practices are uncommon, as the results (e.g., from the omission of a particular health state) may not be meaningful for decision making, and there is currently a lack of guidance for addressing structural uncertainty formally. This lack of clarity further highlights the need for greater model transparency and a better understanding of the structural elements of a model. These are important considerations and should be a focus of future research. They can also inform the design of future Mount Hood Challenges, for example, by specifying more detailed model reporting and outcome collection, and perhaps even greater model transparency, to support deeper analyses of the observed cross-model variation, such as the extent and number of diabetes-related complications evaluated, how these complications are integrated, and how cumulative complication events are handled across models. There were also differences in how models incorporated the impact of possible interventions; for instance, not all models use BMI as an independent determinant of disease progression, which may explain the large variations in outcomes observed for the BMI intervention profile. A model registry is a way of routinely capturing additional information that would enable future investigation of the underlying factors that produce differences in outcomes across models.

A potentially concerning aspect of structural uncertainty is that models used in health technology assessments are often judged against an incremental cost per QALY gained threshold. Given the wide variation observed, there is scope to achieve a desired outcome by choosing a particular model structure. One way to ensure greater model consistency is to institute model registries, which require a model to run a standard set of reference simulations. Leveraging the cooperative effort and participation of the Mount Hood Diabetes Challenge Network, the group has already taken a step down this road by initiating a diabetes model registry and running simulation challenges to promote transparency in diabetes simulation modeling. Challenge results from registered models, such as those presented in this article, are made available in an effort to improve consistency in simulation modeling. In a similar fashion to randomized controlled trials, requiring all models to register and report results for simulated reference case outcomes would be one way to increase model transparency and would also provide an opportunity to quantify the level of structural uncertainty (as presented in this article). It may also be possible to capture this uncertainty by parameterizing the variation observed within the registry for interventions that affect particular risk factors (e.g., interventions that affect body weight could draw on the variation observed across the registry simulations for the change in BMI; see Figure 2).
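
As a rough sketch of what such parameterization might look like (our illustration, not an existing registry feature), the spread of registry results for a standard reference simulation could be summarized as a distribution and then sampled alongside ordinary parameter uncertainty. The incremental-QALY values below are placeholders.

```python
# Illustrative sketch: summarizing cross-model spread from a registry's
# reference simulations as an extra "structural uncertainty" distribution.
# The incremental-QALY values below are placeholders, not registry data.
import numpy as np

registry_inc_qalys_bmi = np.array([0.12, 0.18, 0.07, 0.25, 0.15, 0.10, 0.21])

mu = registry_inc_qalys_bmi.mean()
sigma = registry_inc_qalys_bmi.std(ddof=1)

rng = np.random.default_rng(42)
# Draws that could be combined with parameter-uncertainty draws in a
# probabilistic analysis of a new BMI-lowering intervention.
structural_draws = rng.normal(mu, sigma, size=10_000)

print(f"Cross-model mean: {mu:.3f}, SD: {sigma:.3f}")
print(f"95% interval from draws: {np.percentile(structural_draws, [2.5, 97.5])}")
```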

Although there have been suggestions for addressing structural uncertainty (for example, through model averaging, parameterization, model discrepancy, or scenario analyses), these approaches are not commonly applied in health economic decision modeling, and there is little guidance on how structural uncertainty can be reduced. One potential approach would be to place more weight on the results of models that have been shown, through external validation, to reliably reproduce observed outcomes. While the Mount Hood Diabetes Challenge Network has promoted such external validation through its challenges, external validation remains the exception rather than the norm for health economic models. Addressing structural uncertainty is increasingly pertinent as the number of diabetes simulation models has grown substantially since the publication of the first model by Eastman et al. more than 2 decades ago. At least 33 diabetes models have been identified since 2000, and simulation models have evolved in complexity and vary in important ways. Validations should therefore be redone each time a model's structure is modified. Again, there may be a role for registries such as the Mount Hood Diabetes Model Registry to report the results of models undertaking specified external validations and to produce metrics that could give greater weight to models that are better able to replicate relevant real-world results.
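
To make the model-averaging idea concrete, the following sketch (our illustration under stated assumptions, not a method used in the challenge) weights each model's prediction by its external-validation fit, here using inverse mean squared error. The two extreme incremental-QALY values echo the range reported above; the validation errors are hypothetical.

```python
# Illustrative model averaging: weight each model's incremental-QALY estimate
# by how well it reproduced observed outcomes in an external validation
# (here, inverse mean squared error). The validation errors are hypothetical.
import numpy as np

inc_qalys = np.array([0.066, 0.150, 0.210, 0.331])       # model predictions
validation_mse = np.array([0.030, 0.012, 0.018, 0.045])  # external-validation error

weights = 1.0 / validation_mse
weights /= weights.sum()

averaged = float(np.dot(weights, inc_qalys))
print(f"Weights: {np.round(weights, 3)}")
print(f"Validation-weighted incremental QALYs: {averaged:.3f}")
```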

Our results also provide some indication of the relative magnitude of structural uncertainty compared with parameter uncertainty. We examined the impact of varying the utility levels for key complications; the measurement of quality of life in health economics, and its application to diabetes, has been a key focus of research. Although varying utility values affected incremental QALYs within each of the models, the observed variation across models was much more substantial (Figure 3). This indicates that variation in utility values (often tested in sensitivity analyses) contributes to a lesser degree than other aspects of model uncertainty captured as structural uncertainty. Importantly, there is limited investment in the development of transparent, publicly available disease-specific models. In diabetes, for example, the overwhelming majority of models use risk equations from the UKPDS Outcomes Model. While health technology assessment processes require evidence from large clinical trials, there has not been the same focus on investing in the simulation models that translate the results of randomized controlled trials into QALYs to facilitate evaluation and to generate evidence for reimbursement and/or pricing decisions and research priorities. A value-of-information analysis may be a useful way to guide the prioritization of research on and development of future diabetes simulation models.
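
Where a value-of-information analysis is contemplated, the basic expected value of perfect information (EVPI) calculation can be sketched as below: a generic Monte Carlo illustration with hypothetical net-benefit distributions, not an analysis of the challenge data.

```python
# Generic EVPI sketch: EVPI = E[max_d NB(d, theta)] - max_d E[NB(d, theta)],
# estimated from Monte Carlo draws of net benefit for each decision option.
# The distributions below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000

# Net monetary benefit draws for two options (e.g., comparator vs intervention)
nb_comparator = rng.normal(loc=20_000, scale=3_000, size=n_draws)
nb_intervention = rng.normal(loc=21_000, scale=5_000, size=n_draws)
nb = np.column_stack([nb_comparator, nb_intervention])

value_with_current_info = nb.mean(axis=0).max()   # pick the best option on average
value_with_perfect_info = nb.max(axis=1).mean()   # pick the best option per draw

evpi = value_with_perfect_info - value_with_current_info
print(f"EVPI per patient: {evpi:,.0f}")
```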

This study is subject to a number of limitations. First, as the challenge involved the participation of many modeling groups, simplification of the challenge instructions was needed to ensure all groups ran their simulations under the same challenge conditions. This included simplifications such as not allowing for biomarker evolution. This may have affected some models more than others, particularly those that link biomarker changes and health state transitions. In such models (ECHO-T2DM, MMD, and UKPDS-OM), if biomarker evolution was left active, it could result in greater changes to the incremental QALYs. In addition, the rates of hypoglycemia were not explicitly defined in the challenge instructions, and in some models (e.g., ECHO-T2DM and the Cardiff Model), this was an important driver. Second, not all modeling groups ran their simulations identically because of different interpretations of the challenge instructions. For instance, some groups (BRAVO, SPHR, CDC-RTI, MICADO, and MMD) varied their baseline utility value (without complications) in parallel with varying utility values associated with complications, while others kept this constant using the base value. These discrepancies did not appear to affect results systematically (Supplementary Material 10). Third, modeling groups were instructed to report only mean outcomes, and standard errors were not captured. Results across models may have substantial overlap, and this can be further investigated with future challenges. Fourth, the results presented here for the TTM modeling group are those presented at the challenge, which were based on incorrect input values. This preserves the spirit of the Mount Hood Challenges in exploring model results as they were originally presented and maintains the fidelity of discussions and conclusions as they occurred. In the interest of fairness, TTM was provided an opportunity to correct the simulations, and the results and corrected analysis are presented in the appendix (Supplementary Material 4). Although rankings for some models were affected, this difference was small, and it did not alter the conclusion that there is large variation across models.

SECTION: CONCL
Conclusion

This Mount Hood Diabetes Quality-of-Life Challenge highlights the substantial variability in reported outcomes across 11 different diabetes simulation models. While much research has focused on obtaining appropriate sets of utility values to adequately describe health states, the results from this challenge demonstrate a greater need to understand and assess structural uncertainty, as the choice of model used to inform resource allocation decisions can matter. These are important considerations and should be a focus of future research. Finally, the choice of a specific model or model type alone does not reduce structural uncertainty or guarantee the most accurate result for a specific analysis. Similar models (e.g., Markov models) using the same data may produce vastly different results. The technical implementation of how the model is executed within a specific analysis will always be critical; the devil is in the details.

SECTION: SUPPL
Supplemental Material

Authors' Note: This work has been presented at the 2018 Mount Hood Diabetes Challenge Congress held in Dusseldorf, Germany.

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MW is employed by the Swedish Institute for Health Economics, which created and owns the ECHO-T2DM model and provides consulting services for its use. CA was previously employed by the Swedish Institute for Health Economics, which created and owns the ECHO-T2DM model and provides consulting services for its use.

AG is partly funded by the NIHR Biomedical Research Centre, Oxford, UK. ML and MR are employed by IQVIA, which created and owns the IQVIA Core Diabetes Model and provides consulting services for its use. HS and L Shi have ownership of the BRAVO diabetes model. L Si received grants from the National Health and Medical Research Council outside the submitted work. PC is partly funded by the NIHR Biomedical Research Centre, Oxford, UK. The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs: Michelle Tew https://orcid.org/0000-0003-3009-8056

Christian Asseburg https://orcid.org/0000-0001-7196-3363

An Tran-Duy https://orcid.org/0000-0003-0224-2858

Supplemental Material: Supplementary material for this article is available on the Medical Decision Making website
at http://journals.sagepub.com/home/mdm.