Darwinia+ deep and dirty Part I
My mother-in-law used to do something that really annoyed me. Okay, admittedly, being a mother-in-law there were lots of things she did that annoyed me, but this one not only annoyed but really frustrated me at the same time. Whenever you went into the kitchen, the most commonly used items would always be at the back of the cupboard. It didn’t matter if you rearranged them to be at the front; they would always find their way to the back. So if you went to make a cup of instant coffee, the granules would always be at the back, which meant unpacking the cupboard to get at said granules, and once finished, repacking it again. It turned what should have been a simple task into a logistical exercise. As somebody used to optimising processes and code this was very frustrating for me, and no matter what argument I put forward it would always be the same when we came back another day. Some things just don’t want to be optimised.
Darwinia+ is coming to the end of the project, and as usual Knuth’s law has come into effect and we have to improve the performance as much as possible. But, like my mother-in-law, there are parts of this game that just flatly refuse to be optimised. I always liken this part of the job to working in the sewers of London – you are going to get deep and dirty dealing with other people’s mess.
The first task was to pick a selection of games in the Multiwinia part of Darwinia+ and produce baseline results for comparison. These baseline files are generated by the code and consist of the current frames-per-second figure, output once a second into a file. In theory all we need to do is repeat the same test under exactly the same conditions, and we can graph the changes the optimisations make. It’s not a finely grained picture of the game, but it gives enough of an overview to see if anything is having an effect, and most importantly it is something that can be automated to a certain extent. This is precisely the reason Multiwinia was chosen, as it is easier to rig the predictive physics system to play exactly the same game over and over again given the same player input, and to emulate that we just leave the controller alone and let the game play itself.
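The logging itself is nothing clever. Something along these lines does the job (a minimal sketch of the idea rather than the actual Darwinia+ code):
Code:
#include <cstdio>

// Minimal sketch of a once-per-second FPS baseline logger (illustrative only).
class FPSBaselineLogger
{
public:
    explicit FPSBaselineLogger(const char *filename)
        : m_file(fopen(filename, "w")), m_frames(0), m_elapsed(0.0f) {}

    ~FPSBaselineLogger() { if (m_file) fclose(m_file); }

    // Call once per frame with that frame's delta time in seconds.
    void Update(float dt)
    {
        ++m_frames;
        m_elapsed += dt;
        if (m_elapsed >= 1.0f)
        {
            if (m_file) fprintf(m_file, "%d\n", m_frames);
            m_frames = 0;
            m_elapsed -= 1.0f;
        }
    }

private:
    FILE  *m_file;
    int    m_frames;
    float  m_elapsed;
};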
The second task was to pick which of the baselines was the worst performer, and in Darwinia+ that turned out to be the Multiwinia game Rocket Race on a 4-player map. This particular map is huge and the game can run for an undetermined period of time depending on how well the player or AI is doing – all the rest of the games use a time limit. For some reason this particular map was dipping to below 8 frames per second, which is well below the target of 30 frames per second.
In reality a more meaningful measure is the number of milliseconds a frame takes (at 30 frames per second each frame has a budget of roughly 33 ms, while 8 frames per second means a frame is taking around 125 ms), but that is something I will be measuring when I have to go deeper and start to look at isolated sections of the game. This is when I put on my overalls, hard-hat and spotlight and prepare to go deeper and even dirtier.
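When I do go that deep, the per-section numbers will come from something like a scoped timer. The sketch below uses std::chrono and a hypothetical UpdateBuildings() purely for illustration; the real build would use whatever high-resolution counter the platform provides:
Code:
#include <chrono>
#include <cstdio>

// Sketch of a scoped timer: reports how many milliseconds a block took.
struct ScopedTimer
{
    const char *m_label;
    std::chrono::steady_clock::time_point m_start;

    explicit ScopedTimer(const char *label)
        : m_label(label), m_start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - m_start).count();
        printf("%s: %.3f ms\n", m_label, ms);
    }
};

void UpdateBuildings()   // hypothetical section of the game loop
{
    ScopedTimer timer("UpdateBuildings");
    // ... the code being measured ...
}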
If I had had my way early on, I would have changed the entire approach to optimisation. For my entire career I have had the following line from Donald Knuth quoted to me:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
Or, more accurately and ironically, a small part of that line:
"Premature optimization is the root of all evil."
Nearly every lead developer I have worked for has used that quote as an excuse to leave the optimization of the game to the last few months when everything is complete. In part I agree with that statement but there’s also a small part of my brain nagging away, screaming:
‘But he wrote that in 1974, when CPUs and computers in general were far less complex than they are today – not to mention the amount of code they would have had to deal with!’
And you know what, I am probably going to get hanged, drawn and quartered for saying this, but that nagging scream is right: the world of coding has changed a lot since 1974. We now have processors with caches that can execute more than one instruction at a time with predictive branching, running at blistering speeds that would have made Knuth’s head spin back in ’74. Our code bases are now measured in millions of lines, more than a machine of that era would have had the storage space to keep, let alone the RAM in which to fit it all for compilation. So why do we still cling to this archaic notion? Because, having said all that, there is still some truth to it.
It’s here where I differ from most people’s interpretation of Knuth’s famous line. Rather than leave the optimisation phase to the last few months of the entire project, do it early, on a module-by-module basis. One of the great things about C++ is that it forces us to work in a very modular way (at least it should!), so what we end up with is a game made up of lots of loosely coupled modules. It should therefore be possible to treat modules as separate entities to be optimised on a case-by-case basis, with a final interoperability optimisation later on in the project lifecycle.
I can’t do this for Darwinia+ because the game has been out for quite some time in the form of Darwinia and Multiwinia for the PC. The game is modular but given the quantity of code going through each module one by one would be very costly in both time and money. So all that is left is to run the game and identify hot spots and do deep analysis to work out exactly why it is killing the game – is it CPU or GPU bound? So back to the sewer worker analogy and getting down deep and dirty…
Unlike my mother-in-law, this one WILL be optimised!
Right and Wrong
Knuth is both wrong and right to declaim premature optimization. He is really referring to the obsessive-compulsive tendency of programmers to turn this:
Code:
int something_else(int x);   // assumed to be declared elsewhere

int do_something(int x)
{
    return something_else(x + 5) * something_else(x + 5);
}
...into this:
Code:
int something_else(int x);   // assumed to be declared elsewhere

int do_something(int x)
{
    int mid = x + 5;
    return something_else(mid) * something_else(mid);
}
Putting aside the fact that most modern compilers will do this anyway, optimizations of this sort often serve not only to obfuscate code, but also to really bog down the development process in trivialities.
On the flip side, we do some incredibly complex things with computers these days, involving algorithms that Knuth could never have conceived of at the time. When we design a system from a high level, efficiency must be part of the equation, otherwise we will often write ourselves into a corner by our own aversion to "premature" optimization. I'm not just talking about hashtables vs AVL vs splay trees, I'm looking at really deep and profound differences between one design path and another. Unfortunately, the most obvious design path is not always the best for long-term efficiency and maintainability, especially in C++.
In the beginning, there was Uplink. Uplink rocked. Then there was Darwinia. Darwinia rocked. Then there was Defcon. Defcon... Hey, I think there's a trend here!
Re: Right and Wrong
djspiewak wrote: Unfortunately, the most obvious design path is not always the best for long-term efficiency and maintainability, especially in C++.
I've had a lot of reasons at work to agree with that statement lately. I seem to be doing a lot of design work for things written in C++, and we keep getting to implementation time and just sort of... splat. Coding ourselves into corners, even before somebody comes along and says 'oh, but what about this essential thing it can't do that I didn't tell you about before because you're supposed to know by telepathy'.
But it is very easy to get things wrong if you just plough ahead on functionality, and sometimes you end up having to re-do all your inter-module interactions. Maybe that's the most important thing to get right first - how the modules work inside is, as Byron says, something you can leave for later. Hurrah for encapsulation!
Premature optimisation might be bad, but so is pessimisation
The real problem with premature optimisation is that you just don't know where your performance problems are going to be until you have some real working code. If you've thought about it and implemented the code avoiding the problems you've thought of, then by definition, the remaining problems are the ones you haven't thought of.
That said, it's also quite possible to do your implementation in a bone-headed way which makes later optimisation very hard. You might choose a technique which puts a thousand 0.1% performance hits around the code, making them almost invisible to profiling (lots of dynamic memory allocation and/or copying). Or you might just choose an inflexible structure which turns out to be completely wrong (premature use of over-clever structures is a good way to get here).
Knuth didn't mean "you don't need to think about performance". He meant "don't be an idiot".
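A contrived sketch of the sort of thing I mean (hypothetical code, not from any real project): every call to the first function below copies a vector of strings, so each caller pays a small allocation-and-copy tax that no profiler will ever show as a single hot spot.
Code:
#include <string>
#include <vector>

// Pass-by-value: copies the vector and every string in it on each call.
int CountMatches(std::vector<std::string> names, std::string target)
{
    int count = 0;
    for (size_t i = 0; i < names.size(); ++i)
        if (names[i] == target) ++count;
    return count;
}

// Pass-by-const-reference: identical behaviour, no hidden allocations.
int CountMatchesByRef(const std::vector<std::string> &names,
                      const std::string &target)
{
    int count = 0;
    for (size_t i = 0; i < names.size(); ++i)
        if (names[i] == target) ++count;
    return count;
}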
Re: Premature optimisation might be bad, but so is pessimisation
jsgf wrote: The real problem with premature optimisation is that you just don't know where your performance problems are going to be until you have some real working code. If you've thought about it and implemented the code avoiding the problems you've thought of, then by definition, the remaining problems are the ones you haven't thought of.
That said, it's also quite possible to do your implementation in a bone-headed way which makes later optimisation very hard. You might choose a technique which puts a thousand 0.1% performance hits around the code, making them almost invisible to profiling (lots of dynamic memory allocation and/or copying). Or you might just choose an inflexible structure which turns out to be completely wrong (premature use of over-clever structures is a good way to get here).
Knuth didn't mean "you don't need to think about performance". He meant "don't be an idiot".
Very true. The problem with Knuth is not what he said or meant, it's other people's interpretation of it. A lot of people live by the rule that you should not attempt to optimise your code until the very last moment, which is madness on a system that runs into millions of lines of code. It's very easy to measure the impact a module has on the I-cache and D-cache, for example, and there is no reason why you can't fix those issues on a module-by-module basis.
Your example of the over-clever structures is a very good one. The biggest issue we have with modern C++ usage is what happens when virtual functions or even class layout are misused, and how much of an impact that can have on performance. As an example, and I will be covering this in greater detail in part 2, we had a tight loop searching through all the buildings for one with a specific ID. The pointers to the buildings were held in an array, and the loop dereferenced each pointer to check the ID contained within the instance of the building class. This might seem innocent, but each dereference of the pointer caused another D-cache line to be pulled in from main memory just to look at one int. The solution was to store the ID in the array alongside the pointer to the building, since the array itself was already in the D-cache. That simple change saved over 5 million wasted cycles on a processor that can execute 2 instructions per cycle. Had the programmer been aware of what the compiler produces from such code, and the impact it would have on the cache systems, things like this could have been avoided. The problem is that the programmers weren't aware, and the code base is full of such code. This is where early optimisation would have helped us.
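To make that concrete, here is a stripped-down sketch of the before and after. The names are invented for illustration and are not the actual Darwinia+ classes:
Code:
#include <cstddef>

struct Building { int m_id; /* ...many other members... */ };

// Before: each iteration dereferences a pointer, pulling a fresh D-cache
// line in from main memory just to read one int.
Building *FindBuildingSlow(Building **buildings, int count, int id)
{
    for (int i = 0; i < count; ++i)
        if (buildings[i]->m_id == id) return buildings[i];
    return NULL;
}

// After: the ID lives next to the pointer, so the search only touches the
// array (already in the D-cache) and dereferences nothing until it hits.
struct BuildingRef { int m_id; Building *m_building; };

Building *FindBuildingFast(BuildingRef *refs, int count, int id)
{
    for (int i = 0; i < count; ++i)
        if (refs[i].m_id == id) return refs[i].m_building;
    return NULL;
}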
I am currently reading the book "C++ Coding Standards" by Herb Sutter and Andrei Alexandrescu, and they talk about this exact problem. Their 8th guideline is "Don’t optimize prematurely", but their 9th guideline is "Don’t pessimize prematurely"...
Knuth's rule is no excuse to discard an efficient design, as long as that design isn't more difficult to understand than the non-optimised one and doesn't obfuscate the code. An obvious example is the use of the prefix increment operator instead of the postfix one. Avoiding algorithms with quadratic complexity is also something that should be done early.
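To illustrate the increment example with nothing more than standard iterator code:
Code:
#include <list>

void Walk(std::list<int> &items)
{
    // Postfix: it++ has to construct a temporary copy of the iterator just
    // to return the old value, which is then thrown away.
    for (std::list<int>::iterator it = items.begin(); it != items.end(); it++) { }

    // Prefix: ++it simply advances the iterator and returns it -- no temporary.
    for (std::list<int>::iterator it = items.begin(); it != items.end(); ++it) { }
}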
Re: Premature optimisation might be bad, but so is pessimisation
Byron wrote: Your example of the over-clever structures is a very good one. The biggest issue we have with modern C++ usage is what happens when virtual functions or even class layout are misused, and how much of an impact that can have on performance. As an example, and I will be covering this in greater detail in part 2, we had a tight loop searching through all the buildings for one with a specific ID. The pointers to the buildings were held in an array, and the loop dereferenced each pointer to check the ID contained within the instance of the building class. This might seem innocent, but each dereference of the pointer caused another D-cache line to be pulled in from main memory just to look at one int. The solution was to store the ID in the array alongside the pointer to the building, since the array itself was already in the D-cache. That simple change saved over 5 million wasted cycles on a processor that can execute 2 instructions per cycle. Had the programmer been aware of what the compiler produces from such code, and the impact it would have on the cache systems, things like this could have been avoided. The problem is that the programmers weren't aware, and the code base is full of such code. This is where early optimisation would have helped us.
If I understand you correctly, you're doing optimizations for D+; are any of these changes going to make it into Darwinia or Multiwinia?
Shwart!!
Re: Premature optimisation might be bad, but so is pessimisation
Shwart!! wrote: If I understand you correctly, you're doing optimizations for D+; are any of these changes going to make it into Darwinia or Multiwinia?
The optimisations I am doing are very Xbox 360-specific, and I doubt you would notice a difference in the PC version.
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
People always end the quote prematurely.
My strategy has always been to target my data structures and make sure they're as functional and the data is accessible in the ways I'm going to want to access it, *before* I write a ton of code using them in different ways. Not being funny or anything, but putting the building ID in with the pointer, and building a binary tree out of all the building IDs/pointers in the level, seems like what would have been there to begin with... or even accepting that the buildings could be limited to 1000 and having a fixed-size array sucking up memory. But really, those are my practical concerns rather than a realisation that there could be a performance issue involved.
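For the sake of argument, the 'binary tree of IDs/pointers' I have in mind is nothing more exotic than a std::map keyed on the ID (a sketch only; the real Building class is assumed and would obviously differ):
Code:
#include <cstddef>
#include <map>

struct Building;  // the game's building class, declared elsewhere (hypothetical here)

// A balanced binary tree from building ID to building pointer.
typedef std::map<int, Building *> BuildingIndex;

// O(log n) lookup with no per-element pointer chasing through building instances.
Building *FindBuilding(const BuildingIndex &index, int id)
{
    BuildingIndex::const_iterator it = index.find(id);
    return (it != index.end()) ? it->second : NULL;
}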
I'm not trying to say bad things or anything, but it has always seemed to me that IV's games are often poorly optimised/poorly implemented in some key areas. Darwinia's performance has never made sense, and from the impression I've been given, even some basic data reorganisation and rendering changes could have cut DEFCON frame times by something like a factor of 2. It seems to me like optimisation has never been a priority, and when it has been a priority, it's never been at the expense of having to rework the game in any major way...
Glad to hear D+ is coming to the end of the development cycle, though!