Optimisation of AND-OR Decision Trees

The Need for Optimisation

Decisions based on complex criteria, like a medical diagnosis or an ethical investment, often require a lot of effort. Answering each question involves gathering evidence, which can be expensive, invasive, or time-consuming. The goal is to reach a correct and confident conclusion by asking the fewest possible questions.

The andorR package provides a strategy for this by calculating an influence index for each unanswered question. This index quantifies how much a question, if answered, is likely to help resolve the entire tree. It provides a logical path to the conclusion, ensuring you’re always working on the most impactful piece of the puzzle. This process also embraces uncertainty, allowing you to provide answers with varying levels of confidence and to revisit uncertain answers later to strengthen the final result.

The Influence Index Algorithm

The optimisation process works in two stages. First, it assigns indices to the parent (non-leaf) nodes. Second, it uses those parent indices to calculate the final influence_index for each unanswered leaf (question).

1. Parent Node Indices: `true_index` and `false_index`

Every AND or OR node is assigned two values that represent the probability of an answer propagating up through that node. The calculation depends on the node’s logical rule and the number of its unanswered children ( $n$ ).

AND Nodes: For an AND node to become TRUE, all its children must be TRUE. Answering one child TRUE makes little progress. However, answering just one child FALSE immediately makes the AND node FALSE (this is called a short-circuit). Therefore:
- $false\_index = 1.0$ (high impact)
- $true\_index = 1 / n$ (impact is diluted by the other children)
OR Nodes: The logic is reversed. For an OR node to become TRUE, only one child needs to be TRUE. To make it FALSE, all children must be FALSE. Therefore:
- $true\_index = 1.0$ (high impact)
- $false\_index = 1 / n$ (impact is diluted)

2. Leaf Node `influence_index`

A leaf’s total influence is its potential to affect the final outcome. This is calculated by multiplying the indices of all its ancestors up to the root for both the TRUE and FALSE pathways, and then summing them.

The formula is: $Influence = (\prod \text{ancestor true_indices}) + (\prod \text{ancestor false_indices})$

This score represents the combined probability of the leaf’s answer propagating up to the root and helping to resolve the final conclusion. The leaf with the highest score is the most strategic question to investigate next, assuming one is neutral about the final outcome of the tree.

3. Prioritisation strategies in practice

The list of prioritised questions is calculated using the get_highest_influence() function. This takes an optional parameter sort_by which can have the following values:

“TRUE” :
- Sort by the influence the question has if a TRUE response is provided. This can be thought of as a rule-in strategy, minimising the number of questions required if the objective is to acheive a TRUE answer for the entire tree.
- If the influence_if_true of a question is equal to 1, then answering that single question with TRUE will resolve the entire tree with TRUE
“FALSE” :
- Sort by the influence the question has if a FALSE response is provided. This can be thought of as a rule-out strategy, minimising the number of questions required if the objective is to acheive a FALSE answer for the entire tree.
- If the influence_if_false of a question is equal to 1, then answering that single question with FALSE will resolve the entire tree with FALSE.
“BOTH” :
- The default. Sort by the sum of the two previous values, giving a summary measure of influence if we are neutral about the tree outcome.

The strategy used in the interactive analysis can also be set using the sort_by parameter.

A Worked Example

Let’s walk through the process using the ethical investment tree included with this package.

Step 1: Load and Prepare the Tree

We start by loading the package and the ethical dataset, then use load_tree_df() to build the tree object.

library(andorR)

# Load the built-in ethical dataset
data(ethical)
dtree <- load_tree_df(ethical)

Step 2: Calculate the Indices

The update_tree() function is a convenient wrapper that runs all the necessary calculations in the correct order: it calculates the logical state of the tree, assigns the true_index and false_index to all parent nodes, and then calculates the influence_index for all unanswered leaves.

# Calculate all indices for the initial state
dtree <- update_tree(dtree)

Step 3: Display and Interpret the Results

We can now print the tree to see the calculated indices. Notice that parent nodes have a true_index and false_index, while only the unanswered leaves have an influence_index.

print(dtree, "rule", true = "true_index", false = "false_index", influence = "influence_index")
#>                                    levelName rule      true     false influence
#> 1  Invest in Company X                        AND 0.2500000 1.0000000        NA
#> 2   ¦--Financial Viability                    AND 0.5000000 1.0000000        NA
#> 3   ¦   ¦--Profitability and Growth Signals    OR 1.0000000 0.3333333        NA
#> 4   ¦   ¦   ¦--FIN1                                      NA        NA 0.4583333
#> 5   ¦   ¦   ¦--FIN2                                      NA        NA 0.4583333
#> 6   ¦   ¦   °--FIN3                                      NA        NA 0.4583333
#> 7   ¦   °--Solvency and Stability             AND 0.5000000 1.0000000        NA
#> 8   ¦       ¦--FIN4                                      NA        NA 1.0625000
#> 9   ¦       °--FIN5                                      NA        NA 1.0625000
#> 10  ¦--Acceptable Environmental Stewardship    OR 1.0000000 0.5000000        NA
#> 11  ¦   ¦--Has a Clean Current Record         AND 0.3333333 1.0000000        NA
#> 12  ¦   ¦   ¦--ENV1                                      NA        NA 0.5833333
#> 13  ¦   ¦   ¦--ENV2                                      NA        NA 0.5833333
#> 14  ¦   ¦   °--ENV3                                      NA        NA 0.5833333
#> 15  ¦   °--Has a Credible Transition Pathway   OR 1.0000000 0.3333333        NA
#> 16  ¦       ¦--ENV4                                      NA        NA 0.4166667
#> 17  ¦       ¦--ENV5                                      NA        NA 0.4166667
#> 18  ¦       °--ENV6                                      NA        NA 0.4166667
#> 19  ¦--Demonstrable Social Responsibility      OR 1.0000000 0.5000000        NA
#> 20  ¦   ¦--Shows Excellent Internal Culture    OR 1.0000000 0.2500000        NA
#> 21  ¦   ¦   ¦--SOC1                                      NA        NA 0.3750000
#> 22  ¦   ¦   ¦--SOC2                                      NA        NA 0.3750000
#> 23  ¦   ¦   ¦--SOC3                                      NA        NA 0.3750000
#> 24  ¦   ¦   °--SOC4                                      NA        NA 0.3750000
#> 25  ¦   °--Has a Positive External Impact     AND 0.3333333 1.0000000        NA
#> 26  ¦       ¦--SOC5                                      NA        NA 0.5833333
#> 27  ¦       ¦--SOC6                                      NA        NA 0.5833333
#> 28  ¦       °--SOC7                                      NA        NA 0.5833333
#> 29  °--Strong Corporate Governance            AND 0.2000000 1.0000000        NA
#> 30      ¦--GOV1                                          NA        NA 1.0500000
#> 31      ¦--GOV2                                          NA        NA 1.0500000
#> 32      ¦--GOV3                                          NA        NA 1.0500000
#> 33      ¦--GOV4                                          NA        NA 1.0500000
#> 34      °--GOV5                                          NA        NA 1.0500000

To find the best question to ask first, we use get_highest_influence().

get_highest_influence(dtree, top_n = 3)
#>   name                                                    question
#> 1 FIN4         Debt-to-Equity ratio is below the industry average.
#> 2 FIN5       Company generates strong and positive free cash flow.
#> 3 GOV1 The board of directors is majority independent and diverse.
#>   influence_if_true influence_if_false influence_index
#> 1            0.0625                  1          1.0625
#> 2            0.0625                  1          1.0625
#> 3            0.0500                  1          1.0500

In this initial state, FIN4, FIN5, and GOV1 (and other leaves under AND nodes) have the highest influence. This is because they belong to AND groups directly under the root. A single FALSE answer to any of them has a high probability of making a major branch of the tree FALSE, thus contributing significantly to the final answer. The tool correctly identifies these as the most efficient questions to investigate first.

Step 4: Observing Dynamic Re-Prioritization

Let’s focus on some of the Financial questions:

# Get a full list of all question statuses
all_questions <- get_questions(dtree)

# Let's look at the initial influence of FIN1, FIN2, and FIN4
all_questions %>% 
  filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
  knitr::kable()

name	question	answer	confidence	influence_if_true	influence_if_false	influence_index
FIN1	Company demonstrates consistent, high revenue growth.	NA	NA	0.1250	0.3333333	0.4583333
FIN2	Company maintains a high net profit margin for its industry.	NA	NA	0.1250	0.3333333	0.4583333
FIN4	Debt-to-Equity ratio is below the industry average.	NA	NA	0.0625	1.0000000	1.0625000

Now, let’s provide an answer and see how the priorities change. We will answer FIN1 (which is under an OR node) as TRUE. This should resolve its parent node, “Profitability and Growth Signals”.

# Answer a question and then run update_tree() again
set_answer(dtree, "FIN1", TRUE, 5)
#> Answer for leaf 'FIN1' set to: TRUE with confidence 5/5
dtree <- update_tree(dtree)

Now that the tree has been recalculated, let’s check the influence scores for the same leaves again.

# Get the new list of all question statuses
all_questions_new <- get_questions(dtree)

# Look at the new influence of FIN2 and FIN4
all_questions_new %>% 
  filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
  knitr::kable()

name	question	answer	confidence	influence_if_true	influence_if_false	influence_index
FIN1	Company demonstrates consistent, high revenue growth.	TRUE	5	NA	NA	NA
FIN2	Company maintains a high net profit margin for its industry.	NA	NA	NA	NA	NA
FIN4	Debt-to-Equity ratio is below the industry average.	NA	NA	0.125	1	1.125

Analysis of the Change

FIN2: The influence index for FIN2 is now NA. This is because its parent node (“Profitability and Growth Signals”) was an OR node. Since we answered its sibling FIN1 as TRUE, the parent node is now resolved as TRUE. Answering FIN2 can no longer change the outcome of that branch, so its influence has correctly become irrelevant.
FIN4: Conversely, the influence index for FIN4 has increased. Because the “Profitability” branch is now resolved, the final outcome of the higher-level “Financial Viability” AND node now depends entirely on resolving the “Solvency and Stability” branch. This makes the questions under it, like FIN4, more critical than they were before.

This dynamic re-prioritization is the core feature of the andorR optimisation algorithm.

Next Steps: Boosting Confidence

Once enough questions have been answered to reach a TRUE or FALSE conclusion at the root, the focus of the analysis shifts. The goal is no longer to find the answer, but to increase the confidence in the answer you have. andorR provides a separate function for this phase of the analysis.

For a detailed guide, see the “Confidence Boosting and Sensitivity Analysis” vignette: vignette("confidence-boosting", package = "andorR") ```