The so-called algorithms in the standard library distinguish C++ from other programming languages. Every major programming language has a set of containers, but in the traditional object-oriented approach, each container defines the operations that it permits, e.g., sorting, searching, and modifying. C++ turns object-oriented programming on its head and provides a set of function templates, called algorithms, that work with iterators, and therefore with almost any container.
The advantage of the C++ approach is that the library can contain a rich set of algorithms, and each algorithm can be written once and work with (almost) any kind of container. And when you define a custom container, it automatically works with the standard algorithms (assuming you implemented the container's iterators correctly). The set of algorithms is easily extensible without touching the container classes. Another benefit is that the algorithms work with iterators, not containers, so even non-container iterators (such as the stream iterators) can participate.
C++ algorithms have one disadvantage, however. Remember that iterators, like pointers, can be unsafe. Algorithms use iterators, and therefore are equally unsafe. Pass the wrong iterator to an algorithm, and the algorithm cannot detect the error and produces undefined behavior. Fortunately, most uses of algorithms make it easy to avoid programming errors.
Most of the standard algorithms are declared in the <algorithm>
header, with some numerical
algorithms in <numeric>
. Refer
to the respective sections of Chapter
13 for details.
The generic algorithms all work in a similar fashion. They are all
function templates, and most have one or more iterators as template
parameters. Because the algorithms are templates, you can instantiate
the function with any template arguments that meet the basic
requirements. For example, for_each
is declared as follows:
template<typename InIter, typename Function> Function for_each(InIter first, InIter last, Function func);
The names of the template parameters tell you what is expected
as template arguments: InIter
must
be an input iterator, and Function
must be a function pointer or functor. The documentation for for_each
further tells you that Function
must take one argument whose type
is the value_type
of InIter
. That's all. The InIter
argument can be anything that meets
the requirements of an input iterator. Notice that no container is
mentioned in the declaration or documentation of for_each
. For example, you can use an
istream_iterator
.
For a programmer trained in traditional object-oriented programming, the flexibility of the standard algorithms might seem strange or backwards. Thinking in terms of algorithms takes some adjustment.
For example, some object-oriented container classes define
sort
as a member function or as a
function that applies only to certain kinds of objects. (For example,
Java defines sort
only on arrays
and List
objects). If you have a
new kind of container, you must duplicate the implementation of
sort
or make sure the
implementation of your container maps to one of the standard
implementations of sort
. In C++,
you can invent any kind of crazy container, and as long as it supports
a random access iterator, you can use the standard sort
function.
Whenever you need to process the contents of a container, you
should think about how the standard algorithms can help you. For
example, suppose you need to read a stream of numbers into a data array. Typically, you
would set up a while
loop to read
the input stream and, for each number read, append the number to the
array. Now rethink the problem in terms of an algorithmic solution.
What you are actually doing is copying data from an input stream to an
array, so you could use the copy
algorithm:
std::copy(std::istream_iterator<double>(stream), std::istream_iterator<double>( ), std::back_inserter(data));
The copy
algorithm copies all
the items from one range to another. The input comes from an istream_iterator
, which is an iterator
interface for reading from an istream
. The output range is a back_insert_iterator
(created by the
back_inserter
function), which is
an output iterator that pushes items onto a container.
At first glance, the algorithmic solution doesn't seem any simpler or clearer than a straightforward loop:
double x; while (stream >> x) data.push_back(x);
More complex examples demonstrate the value of the C++
algorithms. For example, all major programming languages have a type
for character strings. They typically also have a function for finding
substrings. What about the more general problem of finding a subrange
in any larger range? Suppose a researcher is looking for patterns in a
data set and wants to see if a small data pattern occurs in a larger
data set. In C++, you can use the search
algorithm:
std::vector<double> data; ... if (std::search(data.begin( ), data.end( ), pattern.begin(), pattern.end( )) != data.end( )) { // found the pattern... }
A number of algorithms take a function pointer or functor (that
is, an object that overloads operator(
)
) as one of the arguments. The algorithms call the function
and possibly use the return value. For example, count_if
counts the number of times the
function returns a true (nonzero) result when applied to each element
in a range:
bool negative(double x) { return x < 0; } std::vector<double>::iterator::difference_type neg_cnt; std::vector<double> data; ... neg_cnt = std::count_if(data.begin( ), data.end( ), negative);
In spite of the unwieldy declaration for neg_cnt
, the application of count_if
to count the number of negative
items in the data
vector is easy to
write and read.
If you don't want to write a function to be used only with an
algorithm, you might be able to use the standard functors or function
objects (which are declared in the <functional>
header). For example, the
same count of negative values can be obtained with the
following:
std::vector<double>::iterator::difference_type neg_cnt; std::vector<double> data; ... neg_cnt = std::count_if(data.begin( ), data.end( ), std::bind2nd(std::less<double>, 0.0));
The std::less
class template defines operator( )
, so it takes two arguments and
applies operator<
to those
arguments. The bind2nd
function
template takes a two-argument functor and binds a constant value (in
this case 0.0
) as the second
argument, returning a one-argument function (which is what count_if
requires). The use of standard
function objects can make the code harder to read, but also helps
avoid writing one-off custom functions. (The Boost project expands and enhances the standard
library's binders. See Appendix B
for information about Boost.)
When using function objects, be very careful if those objects maintain state or have global side effects. Some algorithms copy the function objects, and you must be sure that the state is also properly copied. The numerical algorithms do not permit function objects that have side effects.
Example 10-8 shows
one use of a function object. It accumulates statistical data for
computing the mean and variance of a data set. Pass an instance of
Statistics
to the for_each
algorithm to accumulate the
statistics. The copy that is returned from for_each
contains the desired
results.
Example 10-8. Computing statistics with a functor
#include <algorithm> #include <cstddef> #include <cmath> #include <iostream> #include <ostream> template<typename T> class Statistics { public: typedef T value_type; Statistics( ) : n_(0), sum_(0), sumsq_(0) {} void operator( )(double x) { ++n_; sum_ += x; sumsq_ += x * x; } std::size_t count( ) const { return n_; } T sum( ) const { return sum_; } T sumsq( ) const { return sumsq_; } T mean( ) const { return sum_ / n_; } T variance( ) const { return (sumsq_ - sum_*sum_ / n_) / (n_ - 1); } private: std::size_t n_; T sum_; T sumsq_; // Sum of squares }; int main( ) { using namespace std; Statistics<double> stat = for_each( istream_iterator<double>(cin), istream_iterator<double>( ), Statistics<double>( )); cout << "count=" << stat.count( ) << '\n'; cout << "mean =" << stat.mean( ) << '\n'; cout << "var =" << stat.variance( ) << '\n'; cout << "stdev=" << sqrt(stat.variance( )) << '\n'; cout << "sum =" << stat.sum( ) << '\n'; cout << "sumsq=" << stat.sumsq( ) << '\n'; }
Chapter 13 describes all the algorithms in detail. This section presents a categorized summary of the algorithms.
It is always your responsibility to ensure that the output range is large enough to accommodate the input.
If the algorithm name ends with _if
, the final argument must be a
predicate, that is, a function pointer or
function object that returns a Boolean result (a result that is
convertible to type bool
).
The following algorithms compare objects or sequences (without modifying the elements):
equal
Determines whether two ranges have equivalent contents
lexicographical_compare
Determines whether one range is considered less than another range
max
Returns the maximum of two values
max_element
Finds the maximum value in a range
min
Returns the minimum of two values
min_element
Finds the minimum value in a range
mismatch
Finds the first position where two ranges differ
The following algorithms search for a value or a subsequence in a sequence (without modifying the elements):
adjacent_find
Finds the first position where an item is equal to its neighbor
find
Finds the first occurrence of a value in a range
find_end
Finds the last occurrence of a subsequence in a range
find_first_of
Finds the first position where a value matches any one item from a range of values
find_if
Finds the first position where a predicate returns true
search
, search_n
Finds a subsequence in a range
The following algorithms apply a binary search to a sorted sequence. The sequence typically comes from a sequence container in which you have already sorted the elements. You can use an associative containers, but they provide the last three functions as member functions, which might result in better performance.
The following algorithms modify a sequence:
copy
Copies an input range to an output range
copy_backward
Copies an input range to an output range, starting at the end of the output range
fill
, fill_n
Fills a range with a value
generate
, generate_n
Fills a range with values returned from a function
iter_swap
Swaps the values that two iterators point to
random_shuffle
Shuffles a range into random order
remove
Reorders a range to prepare to erase all elements equal to a given value
remove_copy
Copies a range, removing all items equal to a given value
remove_copy_if
Copies a range, removing all items for which a predicate returns true
remove_if
Reorders a range to prepare to erase all items for which a predicate returns true
replace
Replaces items of a given value with a new value
replace_copy
Copies a range, replacing items of a given value with a new value
replace_copy_if
Copies a range, replacing items for which a predicate returns true with a new value
replace_if
Replaces items for which a predicate returns true with a new value
reverse
Reverses a range in place
reverse_copy
Copies a range in reverse order
rotate
Rotates items from one end of a range to the other end
rotate_copy
Copies a range, rotating items from one end to the other
swap_ranges
Swaps values in two ranges
transform
Modifies every value in a range by applying a transformation function
unique
Reorders a range to prepare to erase all adjacent, duplicate items
unique_copy
The following algorithms are related to sorting and partitioning. You can supply a comparison function or
functor or rely on the default, which uses the <
operator.
nth_element
Finds the item that belongs at the n
th position (if the range were
sorted) and reorders the range to partition it into items less
than the n
th item and
items greater than or equal to the n
th item.
partial_sort
Reorders a range so the first part is sorted.
partial_sort_copy
Copies a range so the first part is sorted.
partition
Reorders a range so that all items for which a predicate is true come before all items for which the predicate is false.
sort
Sorts items in ascending order.
stable_partition
Reorders a range so that all items for which a predicate is true come before all items for which the predicate is false. The relative order of items within a partition is maintained.
stable_sort
Sorts items in ascending order. The relative order of equal items is maintained.
The following algorithms apply standard set operations to sorted sequences:
includes
Determines whether one sorted range is a subset of another
set_difference
Copies the set difference of two sorted ranges to an output range
set_intersection
Copies the intersection of two sorted ranges to an output range
set_symmetric_difference
Copies the symmetric difference of two sorted ranges to an output range
set_union
Copies the union of two sorted ranges to an output range
Writing your own algorithm is easy. Some care is always needed when writing function templates (as discussed in Chapter 7), but generic algorithms do not present any special or unusual challenges. Be sure you understand the requirements of the different categories of iterators and write your algorithm to use the most general category possible. You might even want to specialize your algorithm to improve its performance with some categories.
The first generic algorithm that most programmers will probably
write is copy_if
, which was
inexplicably omitted from the standard. The copy_if
function copies an input range to an output range,
copying only the values for which a predicate returns true (nonzero).
Example 10-9 shows a simple
implementation of copy_if
.
Example 10-9. One way to implement the copy_if function
template<typename InIter, typename OutIter, typename Pred> OutIter copy_if(InIter first, InIter last, OutIter result, Pred pred) { for (; first != last; ++first) if (pred(*first)) { *result = *first; ++result; } return result; }
You can also specialize an algorithm. For example, you might be
able to implement the algorithm more efficiently for a random access
iterator. In this case, you can write helper functions and use the
iterator_category
trait to choose a
specialized implementation. (Chapter
8 has more information about traits, including an example of
how to use iterator traits to optimize a function template.)
The real trick in designing and writing algorithms is being able to generalize the problem and then find an efficient solution. Before running off to write your own solution, check the standard library. Your problem might already have a solution.
For example, I recently wanted to write an algorithm to find the
median value in a range. There is no median
algorithm, but there is nth_element
, which solves the more general
problem of finding the element at any sorted index. Writing median
became a trivial matter of making a
temporary copy of the data, calling nth_element
, and then returning an iterator
that points to the median value in the original range. Because
median
makes two passes over the
input range, a forward iterator is required, as shown in Example 10-10.
Example 10-10. Finding the median of a range
template<typename FwdIter, typename Compare> FwdIter median(FwdIter first, FwdIter last, Compare compare) { typedef typename std::iterator_traits<FwdIter>::value_type value_type; std::vector<value_type> tmp(first, last); typename std::vector<value_type>::size_type median_pos = tmp.size( ) / 2; std::nth_element(tmp.begin( ), tmp.begin( ) + median_pos, tmp.end( ), compare); return std::find(first, last, tmp[median_pos]); }