Optimisation of AND-OR Decision Trees
tree-optimisation.Rmd
The Need for Optimisation
Decisions based on complex criteria, like a medical diagnosis or an ethical investment, often require a lot of effort. Answering each question involves gathering evidence, which can be expensive, invasive, or time-consuming. The goal is to reach a correct and confident conclusion by asking the fewest possible questions.
The andorR
package provides a strategy for this by
calculating an influence index for each unanswered
question. This index quantifies how much a question, if answered, is
likely to help resolve the entire tree. It provides a logical path to
the conclusion, ensuring you’re always working on the most impactful
piece of the puzzle. This process also embraces
uncertainty, allowing you to provide answers with
varying levels of confidence and to revisit uncertain answers later to
strengthen the final result.
The Influence Index Algorithm
The optimisation process works in two stages. First, it assigns
indices to the parent (non-leaf) nodes. Second, it uses those parent
indices to calculate the final influence_index
for each
unanswered leaf (question).
1. Parent Node Indices: true_index
and
false_index
Every AND
or OR
node is assigned two values
that represent the probability of an answer propagating up through that
node. The calculation depends on the node’s logical rule and the number
of its unanswered children
().
-
AND Nodes: For an
AND
node to becomeTRUE
, all its children must beTRUE
. Answering one childTRUE
makes little progress. However, answering just one childFALSE
immediately makes theAND
nodeFALSE
(this is called a short-circuit). Therefore:- (high impact)
- (impact is diluted by the other children)
-
OR Nodes: The logic is reversed. For an
OR
node to becomeTRUE
, only one child needs to beTRUE
. To make itFALSE
, all children must beFALSE
. Therefore:- (high impact)
- (impact is diluted)
2. Leaf Node influence_index
A leaf’s total influence is its potential to affect the final
outcome. This is calculated by multiplying the indices of all its
ancestors up to the root for both the TRUE
and
FALSE
pathways, and then summing them.
The formula is:
This score represents the combined probability of the leaf’s answer propagating up to the root and helping to resolve the final conclusion. The leaf with the highest score is the most strategic question to investigate next, assuming one is neutral about the final outcome of the tree.
3. Prioritisation strategies in practice
The list of prioritised questions is calculated using the
get_highest_influence()
function. This takes an optional
parameter sort_by
which can have the following values:
- “TRUE” :
- Sort by the influence the question has if a TRUE response is provided. This can be thought of as a rule-in strategy, minimising the number of questions required if the objective is to acheive a TRUE answer for the entire tree.
- If the
influence_if_true
of a question is equal to 1, then answering that single question with TRUE will resolve the entire tree with TRUE
- “FALSE” :
- Sort by the influence the question has if a FALSE response is provided. This can be thought of as a rule-out strategy, minimising the number of questions required if the objective is to acheive a FALSE answer for the entire tree.
- If the
influence_if_false
of a question is equal to 1, then answering that single question with FALSE will resolve the entire tree with FALSE.
- “BOTH” :
- The default. Sort by the sum of the two previous values, giving a summary measure of influence if we are neutral about the tree outcome.
The strategy used in the interactive analysis can also be set using
the sort_by
parameter.
A Worked Example
Let’s walk through the process using the ethical
investment tree included with this package.
Step 1: Load and Prepare the Tree
We start by loading the package and the ethical
dataset,
then use load_tree_df()
to build the tree object.
library(andorR)
# Load the built-in ethical dataset
data(ethical)
dtree <- load_tree_df(ethical)
Step 2: Calculate the Indices
The update_tree()
function is a convenient wrapper that
runs all the necessary calculations in the correct order: it calculates
the logical state of the tree, assigns the true_index
and
false_index
to all parent nodes, and then calculates the
influence_index
for all unanswered leaves.
# Calculate all indices for the initial state
dtree <- update_tree(dtree)
Step 3: Display and Interpret the Results
We can now print the tree to see the calculated indices. Notice that
parent nodes have a true_index
and
false_index
, while only the unanswered leaves have an
influence_index
.
print(dtree, "rule", true = "true_index", false = "false_index", influence = "influence_index")
#> levelName rule true false influence
#> 1 Invest in Company X AND 0.2500000 1.0000000 NA
#> 2 ¦--Financial Viability AND 0.5000000 1.0000000 NA
#> 3 ¦ ¦--Profitability and Growth Signals OR 1.0000000 0.3333333 NA
#> 4 ¦ ¦ ¦--FIN1 NA NA 0.4583333
#> 5 ¦ ¦ ¦--FIN2 NA NA 0.4583333
#> 6 ¦ ¦ °--FIN3 NA NA 0.4583333
#> 7 ¦ °--Solvency and Stability AND 0.5000000 1.0000000 NA
#> 8 ¦ ¦--FIN4 NA NA 1.0625000
#> 9 ¦ °--FIN5 NA NA 1.0625000
#> 10 ¦--Acceptable Environmental Stewardship OR 1.0000000 0.5000000 NA
#> 11 ¦ ¦--Has a Clean Current Record AND 0.3333333 1.0000000 NA
#> 12 ¦ ¦ ¦--ENV1 NA NA 0.5833333
#> 13 ¦ ¦ ¦--ENV2 NA NA 0.5833333
#> 14 ¦ ¦ °--ENV3 NA NA 0.5833333
#> 15 ¦ °--Has a Credible Transition Pathway OR 1.0000000 0.3333333 NA
#> 16 ¦ ¦--ENV4 NA NA 0.4166667
#> 17 ¦ ¦--ENV5 NA NA 0.4166667
#> 18 ¦ °--ENV6 NA NA 0.4166667
#> 19 ¦--Demonstrable Social Responsibility OR 1.0000000 0.5000000 NA
#> 20 ¦ ¦--Shows Excellent Internal Culture OR 1.0000000 0.2500000 NA
#> 21 ¦ ¦ ¦--SOC1 NA NA 0.3750000
#> 22 ¦ ¦ ¦--SOC2 NA NA 0.3750000
#> 23 ¦ ¦ ¦--SOC3 NA NA 0.3750000
#> 24 ¦ ¦ °--SOC4 NA NA 0.3750000
#> 25 ¦ °--Has a Positive External Impact AND 0.3333333 1.0000000 NA
#> 26 ¦ ¦--SOC5 NA NA 0.5833333
#> 27 ¦ ¦--SOC6 NA NA 0.5833333
#> 28 ¦ °--SOC7 NA NA 0.5833333
#> 29 °--Strong Corporate Governance AND 0.2000000 1.0000000 NA
#> 30 ¦--GOV1 NA NA 1.0500000
#> 31 ¦--GOV2 NA NA 1.0500000
#> 32 ¦--GOV3 NA NA 1.0500000
#> 33 ¦--GOV4 NA NA 1.0500000
#> 34 °--GOV5 NA NA 1.0500000
To find the best question to ask first, we use
get_highest_influence()
.
get_highest_influence(dtree, top_n = 3)
#> name question
#> 1 FIN4 Debt-to-Equity ratio is below the industry average.
#> 2 FIN5 Company generates strong and positive free cash flow.
#> 3 GOV1 The board of directors is majority independent and diverse.
#> influence_if_true influence_if_false influence_index
#> 1 0.0625 1 1.0625
#> 2 0.0625 1 1.0625
#> 3 0.0500 1 1.0500
In this initial state, FIN4, FIN5,
and GOV1 (and other leaves under AND
nodes) have the highest influence. This is because they belong to
AND
groups directly under the root. A single
FALSE
answer to any of them has a high probability of
making a major branch of the tree FALSE
, thus contributing
significantly to the final answer. The tool correctly identifies these
as the most efficient questions to investigate first.
Step 4: Observing Dynamic Re-Prioritization
Let’s focus on some of the Financial questions:
# Get a full list of all question statuses
all_questions <- get_questions(dtree)
# Let's look at the initial influence of FIN1, FIN2, and FIN4
all_questions %>%
filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
knitr::kable()
name | question | answer | confidence | influence_if_true | influence_if_false | influence_index |
---|---|---|---|---|---|---|
FIN1 | Company demonstrates consistent, high revenue growth. | NA | NA | 0.1250 | 0.3333333 | 0.4583333 |
FIN2 | Company maintains a high net profit margin for its industry. | NA | NA | 0.1250 | 0.3333333 | 0.4583333 |
FIN4 | Debt-to-Equity ratio is below the industry average. | NA | NA | 0.0625 | 1.0000000 | 1.0625000 |
Now, let’s provide an answer and see how the priorities change. We
will answer FIN1
(which is under an OR
node)
as TRUE
. This should resolve its parent node,
“Profitability and Growth Signals”.
# Answer a question and then run update_tree() again
set_answer(dtree, "FIN1", TRUE, 5)
#> Answer for leaf 'FIN1' set to: TRUE with confidence 5/5
dtree <- update_tree(dtree)
Now that the tree has been recalculated, let’s check the influence scores for the same leaves again.
# Get the new list of all question statuses
all_questions_new <- get_questions(dtree)
# Look at the new influence of FIN2 and FIN4
all_questions_new %>%
filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
knitr::kable()
name | question | answer | confidence | influence_if_true | influence_if_false | influence_index |
---|---|---|---|---|---|---|
FIN1 | Company demonstrates consistent, high revenue growth. | TRUE | 5 | NA | NA | NA |
FIN2 | Company maintains a high net profit margin for its industry. | NA | NA | NA | NA | NA |
FIN4 | Debt-to-Equity ratio is below the industry average. | NA | NA | 0.125 | 1 | 1.125 |
Analysis of the Change
FIN2: The influence index for
FIN2
is nowNA
. This is because its parent node (“Profitability and Growth Signals”) was anOR
node. Since we answered its siblingFIN1
asTRUE
, the parent node is now resolved asTRUE
. AnsweringFIN2
can no longer change the outcome of that branch, so its influence has correctly become irrelevant.FIN4: Conversely, the influence index for
FIN4
has increased. Because the “Profitability” branch is now resolved, the final outcome of the higher-level “Financial Viability”AND
node now depends entirely on resolving the “Solvency and Stability” branch. This makes the questions under it, likeFIN4
, more critical than they were before.
This dynamic re-prioritization is the core feature of the
andorR
optimisation algorithm.
Next Steps: Boosting Confidence
Once enough questions have been answered to reach a TRUE
or FALSE
conclusion at the root, the focus of the analysis
shifts. The goal is no longer to find the answer, but to increase the
confidence in the answer you have. andorR
provides a separate function for this phase of the analysis.
For a detailed guide, see the “Confidence Boosting and Sensitivity
Analysis” vignette:
vignette("confidence-boosting", package = "andorR")
```