Improving our I/O Project
In our I/O project implementing grep in the last chapter, there are some places where the code could be made clearer and more concise using iterators. Let's take a look at how iterators can improve our implementation of the Config::new function and the grep function.
Removing a clone by Using an Iterator
Back in Listing 12-8, we had this code that took a slice of String values and created an instance of the Config struct by checking for the right number of arguments, indexing into the slice, and cloning the values so that the Config struct could own those values:
impl Config {
    fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }
        let search = args[1].clone();
        let filename = args[2].clone();
        Ok(Config {
            search: search,
            filename: filename,
        })
    }
}
At the time, we said not to worry about the clone calls here, and that we could remove them in the future. Well, that time is now! So, why do we need clone here? The issue is that we have a slice with String elements in the parameter args, and the new function does not own args. In order to be able to return ownership of a Config instance, we need to clone the values that we put in the search and filename fields of Config, so that the Config instance can own its values.
Now that we know more about iterators, we can change the new function to instead take ownership of an iterator as its argument. We'll use the iterator functionality instead of having to check the length of the slice and index into specific locations. Since we've taken ownership of the iterator, and we won't be using indexing operations that borrow anymore, we can move the String values from the iterator into Config instead of calling clone and making a new allocation.
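To make the ownership difference concrete, here is a minimal sketch. The helper names first_by_clone and first_by_move are hypothetical and not part of the project; they only contrast borrowing a slice with owning an iterator.

```rust
// Borrowing a slice: the caller keeps the strings, so the only way to
// hand back an owned String is to clone the element.
fn first_by_clone(args: &[String]) -> Option<String> {
    args.get(0).cloned()
}

// Owning an iterator: each call to `next` yields an owned String,
// which is moved to the caller instead of being copied.
fn first_by_move(mut args: impl Iterator<Item = String>) -> Option<String> {
    args.next()
}
```

Calling first_by_clone(&v) leaves the vector v usable afterwards, while first_by_move(v.into_iter()) consumes it. That is the trade we are about to make in Config::new.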
First, let's take main as it was in Listing 12-6, and change it to pass the return value of env::args to Config::new, instead of calling collect and passing a slice:
fn main() {
    let config = Config::new(env::args());
    // ...snip...
If we look in the standard library documentation for the env::args function, we'll see that its return type is std::env::Args. So next we'll update the signature of the Config::new function so that the parameter args has the type std::env::Args instead of &[String]:
impl Config {
    fn new(args: std::env::Args) -> Result<Config, &'static str> {
        // ...snip...
Next, we'll fix the body of Config::new. As we can also see in the standard library documentation, std::env::Args implements the Iterator trait, so we know we can call the next method on it! Here's the new code:
struct Config {
    search: String,
    filename: String,
}

impl Config {
    fn new(mut args: std::env::Args) -> Result<Config, &'static str> {
        // The first argument is the program name; skip it.
        args.next();
        let search = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a search string"),
        };
        let filename = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a file name"),
        };
        Ok(Config {
            search: search,
            filename: filename,
        })
    }
}
Remember that the first value in the return value of env::args is the name of the program. We want to ignore that, so first we'll call next and not do anything with the return value. The second time we call next should be the value we want to put in the search field of Config. We use a match to extract the value if next returns a Some, and we return early with an Err value if there weren't enough arguments (which would cause this call to next to return None).
We do the same thing for the filename value. It's slightly unfortunate that the match expressions for search and filename are so similar. It would be nice if we could use ? directly on the Option returned from next, but ? on an Option only works inside a function that itself returns Option, and new returns a Result. Converting each Option into a Result first, for example with the ok_or method, lets us use ? here. Because next returns an owned String, the value is still moved from the iterator into Config rather than borrowed or cloned.
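One way to trim the repeated match expressions is to convert each Option into a Result and let ? do the early return. The following is a minimal sketch, not the listing used in the rest of the chapter. It assumes a compiler that supports impl Trait in argument position, and it takes any iterator of owned String values rather than std::env::Args specifically, a choice made here only so the function can be exercised without real command-line arguments.

```rust
struct Config {
    search: String,
    filename: String,
}

impl Config {
    // Assumption: generic over any iterator of owned Strings.
    // std::env::Args satisfies this bound, so env::args() still works.
    fn new(mut args: impl Iterator<Item = String>) -> Result<Config, &'static str> {
        // Skip the program name.
        args.next();

        // `ok_or` maps None to Err(message); `?` then returns early on Err.
        let search = args.next().ok_or("Didn't get a search string")?;
        let filename = args.next().ok_or("Didn't get a file name")?;

        Ok(Config { search, filename })
    }
}
```

The error messages and the move semantics are the same as in the match version: each String is taken out of the iterator and placed directly into Config.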
Making Code Clearer with Iterator Adaptors
The other bit of code where we could take advantage of iterators was in the grep function as implemented in Listing 12-15:
fn grep<'a>(search: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(search) {
            results.push(line);
        }
    }
    results
}
We can write this code in a much shorter way, and avoid needing a mutable intermediate results vector, by using iterator adaptor methods like this instead:
fn grep<'a>(search: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines()
        .filter(|line| line.contains(search))
        .collect()
}
Here, we use the filter adaptor to keep only the lines for which line.contains(search) returns true. We then gather them into a new vector with collect. Much simpler!
We can use the same technique in the grep_case_insensitive function that we defined in Listing 12-16 as follows:
fn grep_case_insensitive<'a>(search: &str, contents: &'a str) -> Vec<&'a str> {
    let search = search.to_lowercase();
    contents.lines()
        .filter(|line| {
            line.to_lowercase().contains(&search)
        }).collect()
}
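A quick way to gain confidence in a rewrite like this is to run both functions against a small input. The sketch below repeats the two iterator-based functions and adds a test module. The sample text and the test names are made up for illustration and are not claimed to match the tests from the previous chapter.

```rust
fn grep<'a>(search: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines()
        .filter(|line| line.contains(search))
        .collect()
}

fn grep_case_insensitive<'a>(search: &str, contents: &'a str) -> Vec<&'a str> {
    // Lowercase the query once, outside the closure.
    let search = search.to_lowercase();
    contents.lines()
        .filter(|line| line.to_lowercase().contains(&search))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Hypothetical sample text for this sketch.
    const CONTENTS: &str = "Rust:\nsafe, fast, productive.\nPick three.\nTrust me.";

    #[test]
    fn case_sensitive() {
        // Only one line contains the exact substring "duct".
        assert_eq!(grep("duct", CONTENTS), vec!["safe, fast, productive."]);
    }

    #[test]
    fn case_insensitive() {
        // "Rust:" and "Trust me." both contain "rust" once case is ignored.
        assert_eq!(
            grep_case_insensitive("rUsT", CONTENTS),
            vec!["Rust:", "Trust me."]
        );
    }
}
```

Running cargo test executes both checks. The returned slices borrow from CONTENTS, which is what the 'a lifetime in the signatures expresses.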
Not too bad! So which style should you choose? Most Rust programmers prefer to use the iterator style. It's a bit tougher to understand at first, but once you gain an intuition for what the various iterator adaptors do, the iterator style becomes much easier to read. Instead of fiddling with the various bits of looping and building a new vector, the code focuses on the high-level objective of the loop. This abstracts away some of the commonplace code, so it's easier to see the concepts that are unique to this particular code, like the condition used to filter each element in the iterator.
But are they truly equivalent? Surely the more low-level loop will be faster. Let's talk about performance.