6 Tips to supercharge C++11 vector performance

Vector is like the swiss army knife of C++ STL containers. In the words of Bjarne Stroutsoup – “By default, use Vector when you need a container”. For mere mortals like us, we take this as gospel and just run with it. However, Vector is just a tool and like any tool, it can be used both effectively or ineffectively.

In this article we’ll look at 6 ways to optimize usage of vectors. We’ll look at both efficient and inefficient ways to perform the most common programming tasks using vectors, measure the performance gain we obtain by using vectors efficiently and try to understand why we’re getting the performance gain.

Infrastructure and methodology for the performance tests:

  • All the tests are performed on my Surface Book having a core i7 @2.6 Ghz processor, 8 GB RAM and VS2015 C++ Compiler running under Windows 10.

  • We’ll be using the Stopwatch made by Kjell available at https://github.com/KjellKod/Stopwatch.

  • We’ll be running each test 100 times and taking the average runtime for comparison purposes.The actual code used to run the test is available here. Feel free to download it for profiling vector performance on your system.The code snippets in the post will reflect just one iteration to keep things simple.

  • We’re using a TestStruct and a FillVector() method to populate the test vectors. They are defined below.

// Test struct to be inserted/removed from vector
struct BigTestStruct
  int iValue = 1;
  float fValue;
  long lValue;
  double dValue;
  char cNameArr[10];
  int iValArr[100];

// Helper function to populate the test vectors
void FillVector(vector<BigTestStruct>& testVector)
  for (int i = 0; i < 10000; i++)
    BigTestStruct bt;

So without further delay, here are the 6 quick recipes to optimize your usage of C++ 11 vectors.

#1 Avoid unnecessary reallocate and copy cycles by reserving the size of vector ahead of time.

Programmers like vectors because they can just add items to the container without having to worry about the size of the container ahead of time. However, just starting with a vector of capacity 0 and adding to it as elements come in can cost you quite a lot of runtime performance. If you know ahead of time, how big your vector can get, it is worth reserving the size ahead of time.

 Here’s a simple test where we’ll push 10k instances of a test struct onto a vector – first without reserving the size and then after reserving the size.

vector<BigTestStruct> testVector1;
vector<BigTestStruct> testVector2;

cout << "Time to Fill Vector Without Reservation:" << sw.ElapsedUs() << endl;

cout << "Time to Fill Vector With Reservation:" << sw.ElapsedUs() << endl;

The case where the size is not reserved ahead of time takes 5145 micro seconds (us) on my computer while reserving ahead of time takes only 1279 us . That’s a performance gain of 75.14% !!!

The reason behind this is explained best by Scott Meyers in his book Effective STL-50 Specific Ways to Improve Your Use of the Standard Template Library:

“For vector and string, growth is handled by doing the moral equivalent of a realloc whenever more space is needed. This realloc-like operation has four parts:

 1. Allocate a new block of memory that is some multiple of the container’s current capacity. In most implementations, vector and string capacities grow by a factor of between 1.5 and 2 each time.

2. Copy all the elements from the container’s old memory into its new memory.

3. Destroy the objects in the old memory.

4. Deallocate the old memory.

Given all that allocation, deallocation, copying, and destruction, it should not stun you to learn that these steps can be expensive. Naturally, you don’t want to perform them any more frequently than you have to. If that doesn’t strike you as natural, perhaps it will when you consider that each time these steps occur, all iterators, pointers, and references into the vector or string are invalidated. That means that the simple act of inserting an element into a vector or string may also require updating other data structures that use iterators, pointers, or references into the vector or string being expanded.”

#2 Use shrink_to_fit() to release memory consumed by the vector – clear() or erase() does not release memory.

Contrary to popular belief, removing the elements from a vector via the erase() or clear() methods does not release the memory allocated by the vector. Let’s run a simple experiment to prove this. We’ll add 100 elements to a vector and call clear() and erase() on the vector. Then we’ll check using the capacity() function to tell us how many elements the container can hold in the memory it has already allocated.

  size_t capacity = testVector1.capacity();
  cout << "Capacity Before Erasing Elements:" << capacity << endl;
  testVector1.erase(testVector1.begin(), testVector1.begin() + 3); //
  capacity = testVector1.capacity();
  cout << "Capacity After Erasing 3 elements Elements:" << capacity << endl;

  capacity = testVector1.capacity();
  cout << "Capacity After clearing all emements:" << capacity << endl;

  capacity = testVector1.capacity();
  cout << "Capacity After shrinking the Vector:" << capacity << endl;

The output is given below:

Capacity Before Erasing Elements:12138

Capacity After Erasing 3 elements Elements:12138

Capacity After clearing all emements:12138

Capacity After shrinking the Vector:0

As you can see from the above output, erase() or clear() does nothing to reduce the memory occupied by a vector. So once you reach a point in your code where the vector is no longer required, use the std::vector::shrink_to_fit() method to release the memory.

Please note that the shrink_to_fit() might not be implemented by all compiler vendors. In that case, use the “Swap idiom” to clear out the vector as follows:

container<T>( c ).swap( c ); // the shrink-to-fit idiom to shed excess capacity

container<T>().swap( c );    // the idiom to shed all contents and capacity

If you’re interested, you can check C++ Coding Standards: 101 Rules, Guidelines, and Best Practices, item # 82 for details of the swap idiom.


#3 When filling up or copying into a vector, prefer assignment over insert() or push_back().

There are three popular ways of filling up a vector from another vector – assigning the old vector to the new one, using the iterator based std::vector::insert() or using a loop based std::vector::push_back(). Each of the three ways are shown in the code below:

  vector<BigTestStruct> sourceVector, destinationVector;

  // Assign sourceVector to destination vector
  destinationVector = sourceVector;

  cout << "Assigning Vector :" << sw.ElapsedUs() << endl;

  //Using std::vector::insert()
  vector<BigTestStruct> sourceVector1, destinationVector1;

  cout << "Using insert() :" << sw.ElapsedUs() << endl;

  //Using push_back()
  vector<BigTestStruct> sourceVector2, destinationVector2;

  for (unsigned i = 0; i < sourceVector2.size(); ++i)
  cout << "Using push_back :" << sw.ElapsedUs() << endl;

And here’s the relative performance of each:

Assignment: 589.54 us

Insert(): 1321.27 us

Push_back(): 5354.70 us

So we can see that vector assignment is 55.38% faster than Insert() and  89% faster than push_back().

The question is Why ???

Assignment is very efficient because it knows the size of the vector it is copying, and needs to call the memory manager only once to create the assigned vector’s internal buffer.

So, to fill up a vector efficiently, try assignment, insert() with iterators from another container, and push_back(), in that order. Of course, if you have to copy from another type of container into a vector, assignment is not an option. In this case, you’d want to do an iterator based insert.

#4 While iterating through elements in a std::vector, avoid the std::vector::at() function.

There are three ways of iterating through a vector:

  1. Using an iterator
  2. Using the std::vector::at() member function
  3. Using the subscripting – [ ] notation

The usage for each is shown below:

  //Using an iterator
  vector<BigTestStruct> testVectorSum;

  int sum = 0;

  for (auto it = testVectorSum.begin(); it != testVectorSum.end(); ++it)
    sum = sum + it->iValue;
  cout << "Using Iterator:" << sw.ElapsedUs() << endl;

  //Using the at() member function
  sum = 0;

  for (unsigned i = 0; i < testVectorSum.size(); ++i)
    sum = sum + testVectorSum.at(i).iValue;

  cout << "Using at() :" << sw.ElapsedUs() << endl;

  // Using the subscript notation
  sum = 0;
  for (unsigned i = 0; i < testVectorSum.size(); ++i)
    sum = sum + testVectorSum[i].iValue;

  cout << "Using subscripting:" << sw.ElapsedUs() << endl;

The output for the program is as follows:

Using Iterator:0

Using at() :3.73

Using subscripting:0

As we can see, the std::vector::at() function is the slowest of three ways to access vector elements.

#5 Try to avoid inserting an element in front of the vector.

Any insert at the front of a vector is an O(n) operation. Inserting at the front is inefficient because every item in the vector must be copied to make room for the new entry. If you need to continually insert at the beginning of the vector, you should probably re-evaluate your overall design.

Just for fun, here’s a comparison of inserting in the front of a std::vector vs inserting in the front of a std::list.

vector<BigTestStruct> sourceVector3, pushFrontTestVector;

list<BigTestStruct> pushFrontTestList;

//Push 100k elements in front of the new vector -- this is horrible code !!! 
for (unsigned i = 1; i < sourceVector3.size(); ++i)
  pushFrontTestVector.insert(pushFrontTestVector.begin(), sourceVector3[i]);
cout << "Pushing in front of Vector :" << sw.ElapsedUs() << endl;

// push in front of a list
for (unsigned i = 0; i < sourceVector3.size(); ++i)
cout << "Pushing in front of list :" << sw.ElapsedUs() << endl;

If i run this test 10 times on avector having 1000 elements , the output is given below.

Average of Pushing in front of Vector :11999.4

Average of Pushing in front of list :20.36

Inserting at the front of a list is approximately 58836% faster than inserting at the front of a vector . No surprises there because inserting at the head of a list is a O(1) operation. Of course, the bigger the vector, the worse the performance number becomes.

#6 Prefer emplace_back() instead of push_back() while inserting into a vector.

Almost everyone who jumped onto the C++11 bandwagon unequivocally agrees that emplacement is favorable to insertion for STL containers. Theoretically, emplacement is supposed to be at least as efficient as insertion. However, for all practical purposes , sometimes the difference in performance is negligible.

Consider the code snippet below:

vector<BigTestStruct> sourceVector4, pushBackTestVector, emplaceBackTestVector;

//Test push back performance

for (unsigned i = 0; i < sourceVector4.size(); ++i)

cout << "Using push_back :" << sw.ElapsedUs() << endl;

//Test emplace_back()

for (unsigned i = 0; i < sourceVector4.size(); ++i)

cout << "Using emplace_back :" << sw.ElapsedUs() << endl;

If I run this 100 times , the following output is generated:

Average Using push_back :5431.58

Average Using emplace_back :5254.64

We can clearly see that the emplacement function is outperforming the insertion function – but only by 177 micro seconds. For all intents and purposes, they are roughly equivalent.

Emplacement functions are likely to be significantly faster only in the following cases:

  1. The value being added is constructed into the vector, not assigned.
  2. the argument type(s) passed differ from the type held by the vector. For example, if a vector contains std::string but we pass a string literal to the vector.

Even if the above two conditions don’t hold true, you’ll not loose much by using emplacement over insertion as demonstrated in this example.

For more details on emplacement vs. insertion, please check out item # 42 in Scott Meyer’s Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14 .

Final Thoughts

Just like any third party data, you should not blindly rely on the results and suggestions provided here. You can experience a lot of varience while testing on different operating systems, processor architectures, and compiler setc. Measure for yourself and go from there.

Please share if you liked the article 🙂



  • Andrew Male

    Regarding #3, would you alleviate any performance problems for the non-assignment methods if you resized your destination vector before the copy operation?

    • push_back() performance improves significantly – test results for insert() and push_back() with reservation below:

      Average of Assigning Vector :558.95
      Average of Using insert() :1225.14
      Average of Using push_back :1241.96

      • Andrew Male

        Interesting how big a difference that made, also that it still is markedly slower than assignment.

  • 5Lk9m2Q7NR

    The description of benefit of emplace_back() is partial and misleading. You should give it a better revision.

    • I rephrased the last two lines which was probably throwing people off on the recommendation. What other information would you like to see – perhaps an where the emplacement outperforms inserting – like in case of a string literal ? Anything else ?

  • You’re correct – it’s fixed now. Thanks !

  • Kurt Guntheroth

    Reserving space in advance makes push_back() approximately as fast as assign() for copying vectors, as mentioned in a comment.

    Releasing memory might improve performance when memory is critically scarce. This is almost never an issue in modern PCs or handsets. If the array is going to grow again, leaving it alone will always improve performance by eliminating allocations.

    vector::at() is slower because at() checks its argument to ensure it’s within the array and throws an exception if it isn’t. vector::operator[] doesn’t check, allowing you to get an access violation if you accidentally index off the end of the vector’s allocated memory. at() has its purposes, and so does operator[].

    emplace_back() is never slower than push_back(). But the only time it is faster is if the element class has a properly defined move-assignment operator. The move constructor and move assignment operator for std::vector’s element type must be noexcept. If they aren’t present or aren’t noexcept, the slower copy constructor is used. Even the compiler-generated move assignment may not be noexcept by default. That’s why you didn’t see a performance boost.

    Inserting in the front of a list or deque is a constant-time operation, whereas inserting in the front of a vector is takes time proportional to the number of items in the vector. On the other hand, the constant is very large compared to the constant for inserting in the back of a vector. std::list is an expensive data structure both in space and time. See boost::circular_buffer for an efficient, vector-like data structure with constant-time insertion on both ends.

    See chapter 10 of my book Optimized C++ for more performance-tuning tricks for standard library containers.


    • Thank you for sharing your thoughts Kurt. All good points – I also checked out your book recently on safari – it’s a very good reference on optimization. Also, I found this blog post of your’s very informative:http://oldhandsblog.blogspot.com/2016/09/c-optimization-bibliography.html

      Regarding “push back” with reserve being as fast as assigning, somehow my test results did not line up. Maybe there are additional optimizations that happens in assignment ? Here are my results:

      test results for insert() and push_back() with reservation below:

      Average of Assigning Vector :558.95
      Average of Using insert() :1225.14
      Average of Using push_back :1241.96

      • Kurt Guntheroth

        Thanks for the kind words. It’s awesome hearing from people who found the book helpful.

        The difference probably results from a compiler/standard library difference. My measurements were taken on a small i7 running Windows, compiled with Visual Studio 2013. If your results are from g++ or clang, they have different standard library implementations than Visual C++.

        You can never say too often that optimization results are processor- and compiler-dependent. Like “The Pirate Code*”, they’re more like guidelines, actually.

        *A cheap movie reference for non-Americans reading this.

  • Martin Moene

    Are you sure calling sw.Elapsed() inside the output streaming statement is not different from calling it directly after the to-be-measured operation? Assuming it is non-zero and significant w.r.t. the to-be-measured times, it changes the relative performance differences.

  • shaik shami

    Thanks for the information
    Latest Karnataka Govt Jobs 2017