interpretable random forests
Random forests are powerful but often opaque. The standard “variable importance” plot — showing mean decrease in accuracy or node impurity per predictor — summarizes the entire forest at once. But what happens as the forest grows?
I wanted to see variable importance evolve tree by tree:
This kind of animation reveals something important: how quickly variable importance rankings stabilize. If the rankings settle early, the importance estimates are stable enough to interpret with confidence. If they keep shifting as trees are added, the model may be unreliable for drawing scientific conclusions about which predictors matter.
In the example above, one predictor clearly dominates, so the ranking stabilizes fast. But with more evenly matched predictors, the randomness of bagging and feature selection at each split could cause the top-ranked variable to fluctuate — both as a single forest grows and across independently trained forests.
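That fluctuation is easy to demonstrate on simulated data. The sketch below is a hypothetical setup (not from the original post): two predictors carry equal signal, and tallying which one ranks first across independently trained forests shows the winner is not fixed.

```r
library(randomForest)

# Hypothetical simulation: x1 and x2 are equally informative, so neither
# has a "true" claim to the top importance rank.
set.seed(1)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
noise <- rnorm(n)
y <- factor(x1 + x2 + rnorm(n, sd = 2) > 0)
d <- data.frame(y, x1, x2, noise)

# Train 20 independent forests and record which predictor ranks first
top_var <- replicate(20, {
  rf <- randomForest(y ~ ., data = d, ntree = 50)
  names(which.max(importance(rf)[, "MeanDecreaseGini"]))
})
table(top_var)  # tally of the top-ranked predictor per forest
```

With evenly matched predictors, the tally typically splits between `x1` and `x2`, which is exactly the instability that makes single-forest rankings risky for inference.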
Ranking stability has been studied in bioinformatics [1] and remote sensing [2], but it deserves wider attention anywhere random forests are used for inference rather than pure prediction.
Below is a minimal R example to reproduce an animation like this:
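Here is one way to sketch it with the `randomForest` package on the built-in `iris` data (both illustrative choices). Since the package does not expose per-tree importance, this version retrains the forest at a grid of tree counts, which is wasteful but keeps the importance measures exact; the resulting trajectory plot is the static equivalent of the animation frames.

```r
library(randomForest)

set.seed(42)
data(iris)

# Record Gini importance at increasing forest sizes by retraining.
# (Combining forests with grow() does not refresh the importance slot,
# so retraining is the simplest correct approach.)
tree_counts <- seq(5, 200, by = 5)
imp <- sapply(tree_counts, function(k) {
  rf <- randomForest(Species ~ ., data = iris, ntree = k)
  importance(rf)[, "MeanDecreaseGini"]
})

# One line per predictor: watch the ranking settle as trees accumulate
matplot(tree_counts, t(imp), type = "l", lty = 1, lwd = 2,
        xlab = "Number of trees", ylab = "Mean decrease in Gini")
legend("topleft", legend = rownames(imp), col = 1:nrow(imp), lty = 1)
```

To get an actual animation rather than a static plot, the loop over `tree_counts` can be wrapped in a frame-capturing tool such as `animation::saveGIF()`, rendering one plot per forest size.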
A natural extension would be to plot out-of-bag error alongside importance as trees accumulate. I’d expect both to stabilize together.
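Conveniently, the OOB half of that extension needs no retraining at all: a fitted `randomForest` classifier already stores the cumulative out-of-bag error after each tree in its `err.rate` matrix. A minimal sketch, again on `iris`:

```r
library(randomForest)

set.seed(42)
data(iris)

# err.rate has one row per tree: the OOB error of the forest so far
rf <- randomForest(Species ~ ., data = iris, ntree = 200)
plot(rf$err.rate[, "OOB"], type = "l",
     xlab = "Number of trees", ylab = "OOB error rate")
```

Plotting this alongside the importance trajectories would make it easy to check whether the two stabilize on the same timescale.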