This weekend I had a spare hour or so and made some measurements of the default allocation strategies of the Visual C++ 10 and GCC 4.5 compilers.
Update: added results for the Intel C++ compiler 12.1. They are very similar to GCC's, but two things about them are interesting:
- It seems that Intel's Large Object Heap starts at smaller object sizes.
- Sometimes the 64-bit allocations are smaller than the 32-bit ones; I am not sure how that happens.
The results were enlightening, to say the least. I always knew about the memory overhead of small allocations, but I never thought it was that large.
Just look at the tables below. To create them, I allocated 100 MB in chunks of different sizes, from 4 bytes (an int on x86) up to 40000 bytes.
The Overhead column shows the average overhead in bytes per allocation.
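For reference, here is a minimal sketch of the kind of measurement loop involved; it is not the exact program linked at the end of the post. The process_rss_bytes() helper below is Linux-specific (the real numbers also come from Windows builds, which would use the Windows API instead), and each chunk is memset so its pages actually count towards the resident size.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>
#include <unistd.h>   // sysconf; Linux/POSIX only

// Resident-set size of the current process (Linux-specific; a Windows build
// would query GetProcessMemoryInfo instead).
static std::size_t process_rss_bytes()
{
    long total_pages = 0, resident_pages = 0;
    if (std::FILE* f = std::fopen("/proc/self/statm", "r")) {
        std::fscanf(f, "%ld %ld", &total_pages, &resident_pages);
        std::fclose(f);
    }
    return static_cast<std::size_t>(resident_pages) * sysconf(_SC_PAGESIZE);
}

// Allocate ~100 MB in chunks of `chunk_size` bytes and report how much memory
// the process actually gained, per chunk.
static void measure(std::size_t chunk_size)
{
    const std::size_t payload = 100u * 1024 * 1024;   // 100 MB of useful data
    const std::size_t count   = payload / chunk_size;

    std::vector<char*> chunks;
    chunks.reserve(count);                            // keep bookkeeping out of the measurement

    const std::size_t before = process_rss_bytes();
    for (std::size_t i = 0; i < count; ++i) {
        char* p = new char[chunk_size];
        std::memset(p, 0, chunk_size);                // touch the pages so they count towards RSS
        chunks.push_back(p);
    }
    const std::size_t grown    = process_rss_bytes() - before;
    const std::size_t overhead = grown > payload ? (grown - payload) / count : 0;

    std::printf("%6zu-byte chunks: total %8zu KB, overhead %4zu bytes per chunk\n",
                chunk_size, grown / 1024, overhead);

    for (char* p : chunks)
        delete[] p;
}

int main()
{
    for (std::size_t size : {4u, 40u, 400u, 4000u, 40000u})
        measure(size);
}
```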
Visual C++, 32-bit compilation

| Block, bytes | Total size allocated (chunks + overhead), KB | Overhead, bytes per chunk |
|---|---|---|
| 4 | 782788 | 28 |
| 40 | 156568 | 24 |
| 400 | 103708 | 25 |
| 4000 | 98444 | 32 |
| 40000 | 98052 | 162 |
Visual C++, 64-bit compilation

| Block, bytes | Total size allocated (chunks + overhead), KB | Overhead, bytes per chunk |
|---|---|---|
| 4 | 1565588 | 60 |
| 40 | 234824 | 56 |
| 400 | 109600 | 49 |
| 4000 | 99044 | 57 |
| 40000 | 98100 | 182 |
GCC, 32-bit compilation

| Block, bytes | Total size allocated (chunks + overhead), KB | Overhead, bytes per chunk |
|---|---|---|
| 4 | 390592 | 12 |
| 40 | 117952 | 8 |
| 400 | 99648 | 8 |
| 4000 | 97856 | 8 |
| 40000 | 98432 | 318 |
Intel C++, 32-bit compilation

| Block, bytes | Total size allocated (chunks + overhead), KB | Overhead, bytes per chunk |
|---|---|---|
| 4 | 392220 | 12 |
| 40 | 117652 | 8 |
| 400 | 100000 | 10 |
| 4000 | 102196 | 186 |
| 40000 | 98152 | 203 |
Intel C++, 64-bit compilation

| Block, bytes | Total size allocated (chunks + overhead), KB | Overhead, bytes per chunk |
|---|---|---|
| 4 | 392828 | 12 |
| 40 | 117620 | 8 |
| 400 | 102008 | 18 |
| 4000 | 102008 | 178 |
| 40000 | 98068 | 169 |
So here are a few obvious and not-so-obvious thoughts. Of course, they apply when dealing with large data structures that impact the application's performance:
- (A very obvious one) Dynamically allocating small values or not-so-large structures one by one is an extreme waste of memory. Consider packing them into a std::vector and passing around iterators instead.
- Node-based structures like std::list, std::map, and std::set that store relatively small items are very memory-inefficient. In most cases they should be replaced by std::vector or something similar, such as the sorted vector suggested by Scott Meyers (see the sketch after this list).
- Even if your algorithm seems to require std::list for small items (e.g. because items need to be swapped around or added dynamically), start with benchmarking (a benchmark skeleton follows this list). Even when both std::vector and std::list are populated in one place (so the list is packed tightly and not badly fragmented), the processor cache may still benefit from the more localized memory footprint of std::vector.
- Always create your shared_ptrs with make_shared instead of new. As discussed in the previous post, it is not only faster but also reduces the number of small allocations (a small example follows this list).
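To illustrate the sorted-vector idea from the second point, here is a small sketch: keep the items in one contiguous buffer, sort it once after the bulk-insert phase, and use binary search for lookups. One allocation (the vector's buffer) replaces one heap node per element.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
    // Instead of std::set<int> (one heap node per element, plus the per-node
    // allocator overhead measured above), keep the values contiguously.
    std::vector<int> values = {42, 7, 19, 3, 25};

    std::sort(values.begin(), values.end());   // sort once after loading

    // Lookups are O(log n) like std::set, but cache-friendly.
    const bool found = std::binary_search(values.begin(), values.end(), 19);
    std::printf("19 %s\n", found ? "found" : "not found");
}
```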
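And a tiny benchmark skeleton for the third point. The container size and the workload (a plain sum over the elements) are illustrative assumptions; a real benchmark should of course use your own data and access pattern.

```cpp
#include <chrono>
#include <cstdio>
#include <list>
#include <numeric>
#include <vector>

// Time a full traversal of the container, in microseconds.
template <typename Container>
static long long time_sum_us(const Container& c)
{
    const auto start = std::chrono::steady_clock::now();
    const long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    const auto stop = std::chrono::steady_clock::now();
    std::printf("(sum=%lld) ", sum);   // use the result so it is not optimized away
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

int main()
{
    const int n = 1000000;             // illustrative size
    std::vector<int> v(n, 1);
    std::list<int>   l(n, 1);

    std::printf("vector: %lld us\n", time_sum_us(v));
    std::printf("list:   %lld us\n", time_sum_us(l));
}
```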
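Finally, the make_shared point in code (Widget is just a placeholder type):

```cpp
#include <memory>

struct Widget { int a = 0, b = 0; };

int main()
{
    // Two allocations: one for the Widget, one for the control block.
    std::shared_ptr<Widget> p1(new Widget());

    // One allocation: the object and the control block share a single block.
    auto p2 = std::make_shared<Widget>();

    p1->a = 1;
    p2->b = 2;
}
```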
Any other examples we should mention here? All comments are welcome!
PS: Here is the link to the source code I used to gather this data.