Creating Reference Cycles and Leaking Memory is Safe
Rust makes a number of guarantees that we've talked about, for example that we'll never have a null value, and data races will be disallowed at compile time. Rust's memory safety guarantees make it more difficult to create memory that never gets cleaned up, which is known as a memory leak. Rust does not make memory leaks impossible, however, preventing memory leaks is not one of Rust's guarantees. In other words, memory leaks are memory safe.
By using Rc<T>
and RefCell<T>
, it is possible to create cycles of
references where items refer to each other in a cycle. This is bad because the
reference count of each item in the cycle will never reach 0, and the values
will never be dropped. Let's take a look at how that might happen and how to
prevent it.
In Listing 15-16, we're going to use another variation of the List
definition
from Listing 15-5. We're going back to storing an i32
value as the first
element in the Cons
variant. The second element in the Cons
variant is now
RefCell<Rc<List>>
: instead of being able to modify the i32
value this time,
we want to be able to modify which List
a Cons
variant is pointing to.
We've also added a tail
method to make it convenient for us to access the
second item, if we have a Cons
variant:
Filename: src/main.rs
#[derive(Debug)]
enum List {
Cons(i32, RefCell<Rc<List>>),
Nil,
}
impl List {
fn tail(&self) -> Option<&RefCell<Rc<List>>> {
match *self {
Cons(_, ref item) => Some(item),
Nil => None,
}
}
}
Next, in Listing 15-17, we're going to create a List
value in the variable
a
that initially is a list of 5, Nil
. Then we'll create a List
value in
the variable b
that is a list of the value 10 and then points to the list in
a
. Finally, we'll modify a
so that it points to b
instead of Nil
, which
will then create a cycle:
Filename: src/main.rs
# #[derive(Debug)]
# enum List {
# Cons(i32, RefCell<Rc<List>>),
# Nil,
# }
#
# impl List {
# fn tail(&self) -> Option<&RefCell<Rc<List>>> {
# match *self {
# Cons(_, ref item) => Some(item),
# Nil => None,
# }
# }
# }
#
use List::{Cons, Nil};
use std::rc::Rc;
use std::cell::RefCell;
fn main() {
let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));
println!("a initial rc count = {}", Rc::strong_count(&a));
println!("a next item = {:?}", a.tail());
let b = Rc::new(Cons(10, RefCell::new(a.clone())));
println!("a rc count after b creation = {}", Rc::strong_count(&a));
println!("b initial rc count = {}", Rc::strong_count(&b));
println!("b next item = {:?}", b.tail());
if let Some(ref link) = a.tail() {
*link.borrow_mut() = b.clone();
}
println!("b rc count after changing a = {}", Rc::strong_count(&b));
println!("a rc count after changing a = {}", Rc::strong_count(&a));
// Uncomment the next line to see that we have a cycle; it will
// overflow the stack
// println!("a next item = {:?}", a.tail());
}
We use the tail
method to get a reference to the RefCell
in a
, which we
put in the variable link
. Then we use the borrow_mut
method on the
RefCell
to change the value inside from an Rc
that holds a Nil
value to
the Rc
in b
. We've created a reference cycle that looks like Figure 15-18:
If you uncomment the last println!
, Rust will try and print this cycle out
with a
pointing to b
pointing to a
and so forth until it overflows the
stack.
Looking at the results of the println!
calls before the last one, we'll see
that the reference count of both a
and b
are 2 after we change a
to point
to b
. At the end of main
, Rust will try and drop b
first, which will
decrease the count of the Rc
by one. However, because a
is still
referencing that Rc
, its count is 1 rather than 0, so the memory the Rc
has
on the heap won't be dropped. It'll just sit there with a count of one,
forever. In this specific case, the program ends right away, so it's not a
problem, but in a more complex program that allocates lots of memory in a cycle
and holds onto it for a long time, this would be a problem. The program would
be using more memory than it needs to be, and might overwhelm the system and
cause it to run out of memory available to use.
Now, as you can see, creating reference cycles is difficult and inconvenient in
Rust. But it's not impossible: preventing memory leaks in the form of reference
cycles is not one of the guarantees Rust makes. If you have RefCell<T>
values
that contain Rc<T>
values or similar nested combinations of types with
interior mutability and reference counting, be aware that you'll have to ensure
that you don't create cycles. In the example in Listing 15-14, the solution
would probably be to not write code that could create cycles like this, since
we do want Cons
variants to own the list they point to.
With data structures like graphs, it's sometimes necessary to have references
that create cycles in order to have parent nodes point to their children and
children nodes point back in the opposite direction to their parents, for
example. If one of the directions is expressing ownership and the other isn't,
one way of being able to model the relationship of the data without creating
reference cycles and memory leaks is using Weak<T>
. Let's explore that next!
Prevent Reference Cycles: Turn an Rc<T>
into a Weak<T>
The Rust standard library provides Weak<T>
, a smart pointer type for use in
situations that have cycles of references but only one direction expresses
ownership. We've been showing how cloning an Rc<T>
increases the
strong_count
of references; Weak<T>
is a way to reference an Rc<T>
that
does not increment the strong_count
: instead it increments the weak_count
of references to an Rc
. When an Rc
goes out of scope, the inner value will
get dropped if the strong_count
is 0, even if the weak_count
is not 0. To
be able to get the value from a Weak<T>
, we first have to upgrade it to an
Option<Rc<T>>
by using the upgrade
method. The result of upgrading a
Weak<T>
will be Some
if the Rc
value has not been dropped yet, and None
if the Rc
value has been dropped. Because upgrade
returns an Option
, we
know Rust will make sure we handle both the Some
case and the None
case and
we won't be trying to use an invalid pointer.
Instead of the list in Listing 15-17 where each item knows only about the next item, let's say we want a tree where the items know about their children items and their parent items.
Let's start just with a struct named Node
that holds its own i32
value as
well as references to its children Node
values:
Filename: src/main.rs
use std::rc::Rc;
use std::cell::RefCell;
#[derive(Debug)]
struct Node {
value: i32,
children: RefCell<Vec<Rc<Node>>>,
}
We want to be able to have a Node
own its children, and we also want to be
able to have variables own each node so we can access them directly. That's why
the items in the Vec
are Rc<Node>
values. We want to be able to modify what
nodes are another node's children, so that's why we have a RefCell
in
children
around the Vec
. In Listing 15-19, let's create one instance of
Node
named leaf
with the value 3 and no children, and another instance
named branch
with the value 5 and leaf
as one of its children:
Filename: src/main.rs
fn main() {
let leaf = Rc::new(Node {
value: 3,
children: RefCell::new(vec![]),
});
let branch = Rc::new(Node {
value: 5,
children: RefCell::new(vec![leaf.clone()]),
});
}
The Node
in leaf
now has two owners: leaf
and branch
, since we clone
the Rc
in leaf
and store that in branch
. The Node
in branch
knows
it's related to leaf
since branch
has a reference to leaf
in
branch.children
. However, leaf
doesn't know that it's related to branch
,
and we'd like leaf
to know that branch
is its parent.
To do that, we're going to add a parent
field to our Node
struct
definition, but what should the type of parent
be? We know it can't contain
an Rc<T>
, since leaf.parent
would point to branch
and branch.children
contains a pointer to leaf
, which makes a reference cycle. Neither leaf
nor
branch
would get dropped since they would always refer to each other and
their reference counts would never be zero.
So instead of Rc
, we're going to make the type of parent
use Weak<T>
,
specifically a RefCell<Weak<Node>>
:
Filename: src/main.rs
use std::rc::{Rc, Weak};
use std::cell::RefCell;
#[derive(Debug)]
struct Node {
value: i32,
parent: RefCell<Weak<Node>>,
children: RefCell<Vec<Rc<Node>>>,
}
This way, a node will be able to refer to its parent node if it has one,
but it does not own its parent. A parent node will be dropped even if
it has child nodes referring to it, as long as it doesn't have a parent
node as well. Now let's update main
to look like Listing 15-20:
Filename: src/main.rs
fn main() {
let leaf = Rc::new(Node {
value: 3,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![]),
});
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
let branch = Rc::new(Node {
value: 5,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![leaf.clone()]),
});
*leaf.parent.borrow_mut() = Rc::downgrade(&branch);
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}
Creating the leaf
node looks similar; since it starts out without a parent,
we create a new Weak
reference instance. When we try to get a reference to
the parent of leaf
by using the upgrade
method, we'll get a None
value,
as shown by the first println!
that outputs:
leaf parent = None
Similarly, branch
will also have a new Weak
reference, since branch
does
not have a parent node. We still make leaf
be one of the children of
branch
. Once we have a new Node
instance in branch
, we can modify leaf
to have a Weak
reference to branch
for its parent. We use the borrow_mut
method on the RefCell
in the parent
field of leaf
, then we use the
Rc::downgrade
function to create a Weak
reference to branch
from the Rc
in branch.
When we print out the parent of leaf
again, this time we'll get a Some
variant holding branch
. Also notice we don't get a cycle printed out that
eventually ends in a stack overflow like we did in Listing 15-14: the Weak
references are just printed as (Weak)
:
leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
children: RefCell { value: [] } }] } })
The fact that we don't get infinite output (or at least until the stack
overflows) is one way we can see that we don't have a reference cycle in this
case. Another way we can tell is by looking at the values we get from calling
Rc::strong_count
and Rc::weak_count
. In Listing 15-21, let's create a new
inner scope and move the creation of branch
in there, so that we can see what
happens when branch
is created and then dropped when it goes out of scope:
Filename: src/main.rs
fn main() {
let leaf = Rc::new(Node {
value: 3,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![]),
});
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
{
let branch = Rc::new(Node {
value: 5,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![leaf.clone()]),
});
*leaf.parent.borrow_mut() = Rc::downgrade(&branch);
println!(
"branch strong = {}, weak = {}",
Rc::strong_count(&branch),
Rc::weak_count(&branch),
);
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
}
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
println!(
"leaf strong = {}, weak = {}",
Rc::strong_count(&leaf),
Rc::weak_count(&leaf),
);
}
Right after creating leaf
, its strong count is 1 (for leaf
itself) and its
weak count is 0. In the inner scope, after we create branch
and associate
leaf
and branch
, branch
will have a strong count of 1 (for branch
itself) and a weak count of 1 (for leaf.parent
pointing to branch
with a
Weak<T>
). leaf
will have a strong count of 2, since branch
now has a
clone the Rc
of leaf
stored in branch.children
. leaf
still has a weak
count of 0.
When the inner scope ends, branch
goes out of scope, and its strong count
decreases to 0, so its Node
gets dropped. The weak count of 1 from
leaf.parent
has no bearing on whether Node
gets dropped or not, so we don't
have a memory leak!
If we try to access the parent of leaf
after the end of the scope, we'll get
None
again like we did before leaf
had a parent. At the end of the program,
leaf
has a strong count of 1 and a weak count of 0, since leaf
is now the
only thing pointing to it again.
All of the logic managing the counts and whether a value should be dropped or
not was managed by Rc
and Weak
and their implementations of the Drop
trait. By specifying that the relationship from a child to its parent should be
a Weak<T>
reference in the definition of Node
, we're able to have parent
nodes point to child nodes and vice versa without creating a reference cycle
and memory leaks.
Summary
We've now covered how you can use different kinds of smart pointers to choose
different guarantees and tradeoffs than those Rust makes with regular
references. Box<T>
has a known size and points to data allocated on the heap.
Rc<T>
keeps track of the number of references to data on the heap so that
data can have multiple owners. RefCell<T>
with its interior mutability gives
us a type that can be used where we need an immutable type, and enforces the
borrowing rules at runtime instead of at compile time.
We've also discussed the Deref
and Drop
traits that enable a lot of smart
pointers' functionality. We explored how it's possible to create a reference
cycle that would cause a memory leak, and how to prevent reference cycles by
using Weak<T>
.
If this chapter has piqued your interest and you now want to implement your own smart pointers, check out The Nomicon for even more useful information.
Next, let's talk about concurrency in Rust. We'll even learn about a few new smart pointers that can help us with it.