Darwinia+ deep and dirty Part 2

The only place you'll ever hear the truth
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Thu May 21, 2009 12:03 am

It has been a few weeks since the last entry on optimising Darwinia+ on the Xbox 360. Most of that time has been spent hunting down any possible areas to speed up the code base and basically shaving as many cycles as I can from the frame time in order to get above the threshold of 30 frames per second. It’s been a tough time but the speed-freak inside me loved every minute of it. There are many ways that Darwinia+ can be optimised but one of the most obvious is to make use of the three cores that the Xbox 360 has; currently the game is using just one of them. Rather than go gung-ho and turn everything I could find into a thread on a different core, I sat down to look for an area of the game that could be easily broken out onto a separate thread without major impact (and therefore with less risk) and would reap the most benefit.

It may not seem like it but Darwinia+ has hundreds of sound effect files. There are many variations of each type of effect so that when you are listening to the game playing you get a broad range of sound rather than the same old sound playing over and over again. As a consequence of this we can’t store all of the sounds in memory at the same time, so we employ a caching system that reads in the samples and decodes them from OGG to native format, ready to be played. As it turned out, the actual act of decoding from OGG was appearing quite high in each of the PIX trace runs. This became my top target for breaking out into a thread because it satisfied the criteria of least risk and highest reward. Thankfully the sound system was quite well designed and it took less than two hours to get the sample decoder working in its own thread on another core, and much to my surprise it worked first time.
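The idea of pushing the decode onto its own thread can be sketched roughly as below. This is not the Darwinia+ code: it uses portable standard C++ threads rather than the Xbox 360 threading API, `DecodeWorker` and `decodeSample` are hypothetical names, and the "decoder" is a placeholder that only stands in for real OGG decoding.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OGG decoder: turns each input byte into one 16-bit sample.
inline std::vector<std::int16_t> decodeSample(const std::vector<std::uint8_t>& encoded) {
    std::vector<std::int16_t> pcm;
    pcm.reserve(encoded.size());
    for (std::uint8_t b : encoded) pcm.push_back(static_cast<std::int16_t>(b * 2));
    return pcm;
}

// Runs decode jobs on one worker thread so the game thread does not pay for the decode.
class DecodeWorker {
public:
    using Pcm = std::vector<std::int16_t>;

    DecodeWorker() : m_thread([this] { run(); }) {}

    ~DecodeWorker() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_quit = true;
        }
        m_wake.notify_one();
        m_thread.join();
    }

    // Queue a sample for decoding; the future becomes ready once the worker has decoded it.
    std::future<Pcm> submit(std::vector<std::uint8_t> encoded) {
        std::packaged_task<Pcm()> task(
            [data = std::move(encoded)] { return decodeSample(data); });
        std::future<Pcm> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(task));
        }
        m_wake.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<Pcm()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_quit || !m_jobs.empty(); });
                if (m_jobs.empty()) return;  // only reached when quitting with no work left
                task = std::move(m_jobs.front());
                m_jobs.pop();
            }
            task();  // decode outside the lock so submit() is never blocked by a decode
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::packaged_task<Pcm()>> m_jobs;
    bool m_quit = false;
    std::thread m_thread;  // declared last so the members above exist before the thread starts
};
```

The game thread would call `submit()` when the cache decides a sample is needed and only call `get()` on the returned future when the sound is actually about to play, so in the common case the decode has already finished on the other core.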

Of course refinements were added over the next couple of days once numerous tests were done to make sure it didn’t kill the game. After running PIX on the outcome it became clear that this one small act that took two hours to complete had a massive impact on the speed of the game and we were much closer to the 30fps target. As well as tackling potential speed bottlenecks using multi-core techniques, a lot of analysis has been done on each of the game’s sub-systems, with each area eliminated one by one. This is a very laborious task and it’s very easy to fall into a paranoid state where you are absolutely convinced that an engineer deliberately wrote the code just to make your life hell. Of course the reality is that the next generation of coders being pumped out of university don’t appear to ever have the need or inclination to look at what the compiler produces or consider the actual impact on the CPU that a small innocent looking piece of C++ has. I was shocked to learn that at least one of our engineers had never done any assembly language – probably not needed in a Java dominated computer science course.

This brings us neatly to the education side of optimisation work. It’s fine to speed up the game but that is useless if all that is going to happen is your engineers will find even more imaginative ways to make it slow again – albeit not on purpose but it will happen. So, as a know-it-all speed freak, part of the job spec is to try and educate the masses into how not to undo all of your hard work. This is made even more fun by the various shades of ‘glazed over’ that each of your engineers can muster when you start to explain the intricacies of cache lines and why they should re-order the member variables of a class into most used and least used.
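As a rough illustration of the member re-ordering point, here is a made-up entity class before and after grouping its most used members together. The struct names and fields are hypothetical, the 128-byte cache line is the figure used later in this thread, and the offsets assume 4-byte floats with 4-byte alignment, which holds on mainstream compilers but is not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 128;  // assumed cache line size in bytes

// Before: members in the order they were thought of; hot data is separated by cold data.
struct EntityUnordered {
    float         posX;            // hot: touched every frame
    char          debugName[120];  // cold: touched rarely
    float         posY;            // hot
    std::uint32_t spawnTime;       // cold
    float         posZ;            // hot
};

// After: the most used members first, so one cache line fetch brings them all in.
struct EntityOrdered {
    float         posX, posY, posZ;  // hot: touched every frame
    std::uint32_t spawnTime;         // cold
    char          debugName[120];    // cold
};

// Number of cache lines covering bytes [firstByte, lastByte] of an object
// that starts on a cache line boundary.
constexpr std::size_t linesSpanned(std::size_t firstByte, std::size_t lastByte) {
    return lastByte / kCacheLine - firstByte / kCacheLine + 1;
}
```

Under those assumptions the three position floats of `EntityUnordered` run from offset 0 to offset 135 and so span two cache lines, while in `EntityOrdered` they occupy the first 12 bytes and fit in one.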

You have to keep trying though because the alternative is yet another round of the same thing on the next game.
elexis
level5
Posts: 1466
Joined: Fri Aug 24, 2007 6:11 am
Location: Australia

Postby elexis » Thu May 21, 2009 4:03 am

You know I've never even seen assembly language. hmm...

...





Be right back...
GrindTooth
level0
Posts: 1
Joined: Thu May 21, 2009 6:36 am
Location: Inside a DMA chain

Postby GrindTooth » Thu May 21, 2009 6:42 am

Also bad: new engineer who's using __dcbt to eliminate L2 misses from PIX but doesn't pay attention to overall function performance.
Nutter
level3
Posts: 324
Joined: Thu May 25, 2006 1:30 pm
Location: Denmark

Postby Nutter » Thu May 21, 2009 7:04 am

Byron - Champion of the FPS's :wink:
VeryLittleGravitas
level0
Posts: 1
Joined: Thu May 21, 2009 7:19 am

Re: Darwinia+ deep and dirty Part 2

Postby VeryLittleGravitas » Thu May 21, 2009 7:22 am

Byron wrote:Of course the reality is that the next generation of coders being pumped out of university don’t appear to ever have the need or inclination to look at what the compiler produces or consider the actual impact on the CPU that a small innocent looking piece of C++ has. I was shocked to learn that at least one of our engineers had never done any assembly language – probably not needed in a Java dominated computer science course.


'Joel On Software' wrote an interesting article about this here:
http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html
martin
level5
Posts: 3210
Joined: Fri Nov 19, 2004 8:37 pm

Postby martin » Thu May 21, 2009 11:52 am

I'm just coming to the end of my first year of CS, and it's one of those java schools everyone seems to hate, including me. I can see that java is a good language to introduce people to the basics, so teaching java for the first year is good - their problem is that beyond that they keep on teaching java.
I don't think their C++ module (in the 3rd year, I have to wait 3 years before they finally decide to give me an OPTIONAL course in C++) really goes into enough depth from what I've read about it.

Then again, I never had any difficulty understanding recursion, pointers or even assembly - so maybe I'm not the best person to judge ;)
GENERATION 22:The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social experiment.
Phelanpt
level5
Posts: 1837
Joined: Thu Aug 10, 2006 4:20 am
Location: Portugal

Postby Phelanpt » Thu May 21, 2009 2:19 pm

In my university, although most of the classes use java, we have a few hard C classes (the network programming class uses both), a computer architecture class that also has an Assembly project, and a few computer theory classes where we learn about Turing and finite state machines.
An information security class also taught us about the sandbox model of the JVM.

Despite not having electronics, I think they actually do a good job of teaching how things work at the lower level, if the students are willing to listen to it.
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Thu May 21, 2009 3:40 pm

Here's an example of what I mean. I just ran 'Garden' in Darwinia+ for 2 hours and left it running on its own. It crashed on a malloc call and being in release mode all is not what it seems in the debugger so we have to work out from the debugger view what the values being passed around are. So, here's the code that started the crash:

sToc->m_sync.PutData( sync, sequenceId );

How would you go about finding out what sync and sequenceId are? Bearing in mind that the debugger may not show you the correct values. The answer is to go to the asm view:

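sToc->m_sync.PutData( sync, sequenceId );
82338488 mr r5,r25 ; r5 = r25: third argument register (assuming the usual PowerPC convention of arguments in r3, r4, r5), so sequenceId lives in r25
8233848C addi r4,r1,112 ; 70h: r4 = r1 + 70h, a stack address, so sync is passed by reference
82338490 addi r3,r3,48 ; 30h: r3 = r3 + 30h, presumably sToc + 30h = &sToc->m_sync (the this pointer)
82338494 bl DArray<unsigned char>::PutData (8233ae68h)
82338498 lbz r28,70h(r1) ; next statement: loads the byte at r1 + 70h (the value of sync) into r28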
and then use that to work it out.
Marquise Fishy TGF McGraw
level1
Posts: 18
Joined: Tue May 05, 2009 5:29 pm
Location: A Bowl placed atop a hill, eyes that never blink, always watching, always watching...

Postby Marquise Fishy TGF McGraw » Thu May 21, 2009 8:21 pm

I was reading the article that VeryLittleGravitas linked, and to be honest I'm actually quite surprised. I did electronic engineering at university, and I was longing for more courses about pointers so I could have something both easy AND interesting to do... we did assembly and C (not even ++) and data structures and such, and I don't know any Java - my favourite bit of the whole assembly course (which was mostly on other more limited micros) was "push Address, rets" to zip your way around the program... (I wonder if that all means I'm much more employable in the CS world? hmm...)

I've been occasionally poking at a game project (in C++ with SDL libs) and really like the OOP and simply being able to make a new array, rather than worry about malloc etc. Is this the same sort of dumbing down? Is that bad in the scheme of things? It seems a lot easier to do that than really ever have to worry about addressing into memory you don't mean to be - I even use quite a few vector types instead of linked lists because it's just less work, and there are fewer things to go wrong.

Just get the IV guys to write an assembly program for a microchip which lights up some LEDs. I'm sure they'll improve almost instantly!
bert_the_turtle
level5
Posts: 4795
Joined: Fri Oct 13, 2006 6:11 pm
Location: Cologne

Postby bert_the_turtle » Thu May 21, 2009 8:59 pm

Marquise Fishy TGF McGraw wrote:Is that bad in the scheme of things? It seems a lot easier to do that than really ever have to worry about addressing into memory you don't mean to be - I even use quite a few vector types instead of linked lists because it's just less work, and there are fewer things to go wrong.
No, that's not bad. C++ is just as efficient as C if you don't use the extra stuff (virtual functions, mostly, and those still are relatively cheap). However, you have to be aware of your data layout. OOP usually means you put data that belongs to one object into a coherent memory segment and that causes trouble in some scenarios: for example, loading lots of data you don't actually need into the cache when all you are doing is updating your object positions.

And for vectors vs. linked list, those are simply different data structures. As long as you don't need to arbitrarily insert and remove data in the middle and keep the order of the other elements as they were and you don't need to find objects in your list, vectors are just fine.
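The data layout point can be sketched with two versions of the same position update. All names here are hypothetical, and the 104 bytes of "rarely used" data are only a placeholder for whatever else an object carries around; the two loops compute the same result but touch very different amounts of memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Object-style layout: each entity keeps all of its data together.
struct Entity {
    float x, y, z;          // position
    float vx, vy, vz;       // velocity
    char  rarelyUsed[104];  // hypothetical cold data (names, AI state, ...)
};

// Data-oriented layout: one contiguous array per field that is updated together.
struct EntityPositions {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
};

// Position update over the object-style layout: every entity visited also
// pulls its cold data into the cache alongside the six floats that are needed.
inline void integrate(std::vector<Entity>& entities, float dt) {
    for (Entity& e : entities) {
        e.x += e.vx * dt;
        e.y += e.vy * dt;
        e.z += e.vz * dt;
    }
}

// Same update over the data-oriented layout: only position and velocity
// arrays are read, and each array is walked contiguously.
inline void integrate(EntityPositions& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```

Whether the second layout is worth the loss of tidy per-object encapsulation depends on how many objects there are and how hot the loop is, so it is something to measure rather than assume.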
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Thu May 21, 2009 9:03 pm

Marquise Fishy TGF McGraw wrote:I was reading the article that VeryLittleGravitas linked, and to be honest I'm actually quite surprised. I did electronic engineering at university, and I was longing for more courses about pointers so I could have something both easy AND interesting to do... we did assembly and C (not even ++) and data structures and such, and I don't know any Java - my favourite bit of the whole assembly course (which was mostly on other more limited micros) was "push Address, rets" to zip your way around the program... (I wonder if that all means I'm much more employable in the CS world? hmm...)

I've been occasionally poking at a game project (in C++ with SDL libs) and really like the OOP and simply being able to make a new array, rather than worry about malloc etc. Is this the same sort of dumbing down? Is that bad in the scheme of things? It seems a lot easier to do that than really ever have to worry about addressing into memory you don't mean to be - I even use quite a few vector types instead of linked lists because it's just less work, and there are fewer things to go wrong.

Just get the IV guys to write an assembly program for a microchip which lights up some LEDs. I'm sure they'll improve almost instantly!


Ah, one thing to understand is that I am the only IV member apart from maybe Tom who is a low level tech-head. All the rest are comfortable at the high-level architecture which is probably a by-product of their Imperial education (for Chris and John certainly) so the overall grand scheme of the games is very well thought out. However the common trade-off with generic systems is speed. If you think about it, designing a fast system for a PC based architecture is like trying to hit a very fast moving target and what will work fast for one processor/motherboard setup may chug on another - which is exactly what happened when Darwinia was ported to the 360. Up till now this kind of thing has never been an issue for Introversion because they have only ever targeted PC based platforms.

Consoles take a very different mind-set. On the outside and from a compiler they may look the same but the reality is you have to be careful of what you do and how you do it. I have worked on all the consoles since the PS1 so I have the upper hand on experience at the low level side but maybe lack on the high-level architectural side. That's why there is a team working on it - we all bring different qualities to the development process. I have still to work out what Mark does though - answers on a postcard please ;)

As far as your own code goes - do what works for you. You can always learn the speed stuff if you need to - just look what Ashley did to speed up Defcon!
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Thu May 21, 2009 9:15 pm

bert_the_turtle wrote:
Marquise Fishy TGF McGraw wrote:Is that bad in the scheme of things? It seems a lot easier to do that than really ever have to worry about addressing into memory you don't mean to be - I even use quite a few vector types instead of linked lists because it's just less work, and there are fewer things to go wrong.
No, that's not bad. C++ is just as efficient as C if you don't use the extra stuff (virtual functions, mostly, and those still are relatively cheap). However, you have to be aware of your data layout. OOP usually means you put data that belongs to one object into a coherent memory segment and that causes trouble in some scenarios: for example, loading lots of data you don't actually need into the cache when all you are doing is updating your object positions.

And for vectors vs. linked list, those are simply different data structures. As long as you don't need to arbitrarily insert and remove data in the middle and keep the order of the other elements as they were and you don't need to find objects in your list, vectors are just fine.


In fact, linked lists are cache killers and so are virtual functions. In general both are okay to use and in some circumstances are the best choice but if you have a piece of code that is speed critical then avoid both like the plague. Vectors are good because they maintain locality of reference i.e. are mapped contiguously in memory so if you load one element then you are loading many more (to the width of a cache line). Linked lists are different. It's entirely possible that each element in the linked list is nowhere near any of the other elements in the list, so if an element is 4 bytes in size (as an extreme example) and the cache line size is 128 bytes then for each element accessed the cache has to load 128 bytes as it cannot simply load the 4 bytes for the element. It gets worse if your 4 bytes straddle 2 cache lines because that would result in 256 bytes being loaded into the cache just to access 4 bytes. Now imagine a linked list with 1000 elements in it. It gets even worse when you consider that a linked list needs at least another bit of data to hold the pointer to the next element and that data may not even reside in the same location as the element itself so now we have a minimum of 2 cache fetches - 1 to get the pointer to the next element and 1 to get the element itself.
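To put numbers on the 1000 element example, here is a small back-of-the-envelope calculation. The helper names are made up for illustration, the contiguous case assumes the array starts on a cache line boundary, and the list figures are the worst case where no two nodes share a line.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLine = 128;  // bytes, as in the example above

// Cache lines touched when walking `count` elements stored back to back,
// assuming the first element starts on a cache line boundary.
constexpr std::size_t linesForContiguous(std::size_t count, std::size_t elemSize) {
    return (count * elemSize + kCacheLine - 1) / kCacheLine;
}

// Worst case for a linked list: every node sits on its own cache line, and if
// the element is stored apart from its node, each element costs a second line.
constexpr std::size_t worstCaseLinesForList(std::size_t count, bool elementStoredSeparately) {
    return count * (elementStoredSeparately ? 2 : 1);
}
```

For 1000 elements of 4 bytes, the contiguous case touches 32 cache lines (4,096 bytes fetched for 4,000 bytes of data). The worst-case list touches 1,000 lines (128,000 bytes), or 2,000 lines (256,000 bytes) when the element lives apart from its node. Real lists usually land somewhere in between, depending on how the allocator happened to place the nodes.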

Virtual functions by their nature require extra data in the class to hold a pointer to the virtual table - if you include RTTI then it's more. So an innocent looking function call can blow the cache line - we had a case of this in the D+ code which I mentioned before. John hates the change I had to make to minimize the cache hit but it was necessary - again the trade off between a software engineering approach and a pragmatic approach.
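The extra per-object data is easy to see with `sizeof`. The two structs below are hypothetical and identical apart from the `virtual` keyword. The C++ standard does not actually require a vtable, so the comparison is an assumption about how mainstream compilers implement virtual functions rather than a language guarantee.

```cpp
#include <cassert>
#include <cstddef>

// No virtual functions: the object is just its two floats.
struct PlainUnit {
    float x, y;
    void update() { x += 1.0f; }
};

// Same data, but with virtual functions: on mainstream compilers every object
// also carries a hidden pointer to its class's virtual table.
struct VirtualUnit {
    float x, y;
    virtual ~VirtualUnit() = default;
    virtual void update() { x += 1.0f; }
};
```

On those compilers a call to `VirtualUnit::update()` through a base pointer first loads the hidden pointer from the object, then loads the function address from the table, then branches to it. The object and the table are separate memory locations, so each of those loads can be its own cache miss.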
martin
level5
Posts: 3210
Joined: Fri Nov 19, 2004 8:37 pm

Postby martin » Thu May 21, 2009 9:48 pm

Going back to what TGF said, it's not necessarily *bad* that all the new programmers only know java (and java like languages); in my opinion programming is going to move onto using managed languages a lot more. However if it doesn't I'm ok, because I taught myself C(++) a while back, whereas most of my colleagues would be doomed.
Saying that, I love managed languages, I develop games with C#/XNA and sell managed languages and their wonderfulness wherever I can ;)
ynbniar
level5
Posts: 2028
Joined: Wed Nov 08, 2006 10:36 pm
Location: Home again...

Postby ynbniar » Fri May 22, 2009 12:32 am

pff assembly...didn't that go out of fashion with the blitter chip :?: ...glbasic is where it's at... (for those of us who don't have time/can't be bothered learning C or Python or <insert fashionable programming language here>) :wink:
Marquise Fishy TGF McGraw
level1
Posts: 18
Joined: Tue May 05, 2009 5:29 pm
Location: A Bowl placed atop a hill, eyes that never blink, always watching, always watching...

Postby Marquise Fishy TGF McGraw » Fri May 22, 2009 10:13 am

(aside: every time I say function, I probably mean method, but I don't care because they're goddamn functions!)

OK well totally hypothetical problem then...

All my game objects are designed so that their specific object type can share an individual behaviour function with other classes (and if necessary, dynamically change their behaviour between example behaviours, eg an object that floats in the air, an object that bounces around, one that skeets across the floor etc). These behaviours are designed so that the base class they all inherit from has a virtual function, Process(), and they each have a Process() with their specific behaviour code within (and they all operate on the variables of their parents), so I don't need to actually care which exact class a behaviour is, I can just refer to it by the base class they're all created from, and perform whateverobject->genericallytypedbehavoirlink->Process() and it'll do the right thing without having to worry about finding out exactly which type it is right now, and type casting it all first, and manage any changes from a basic function which will handle changing the pointers around etc.

I assume that, since this is going to effectively get called for every object, every physics frame (seriously IV, separate process/render behaviour! process things in a good process order, render things in a good render order - frankly this is why NeoThermic was able to speed DEFCON up so much...) I'm probably going to have some concerns about this...

The questions are then, is this an example of something that is going to cause problems in optimisation/runspeed, and, what can I do instead that isn't a nightmare of typing when calling the function? The thing is that really, I'm not sure what I can actually *do*. I can't seem to work out how to just tell it to look at this address and do some function at a memory position relative to that address as would be the obvious solution in ASM - the compiler gets in a hissy fit, because I don't know how to actually do anything in C; I can't exactly google it since I only have the idea in my head, not the accepted name for it, and "alternatives to virtual functions" doesn't turn up much that really is very useful to me; it's mostly all high level computer crazy stuff, whereas I just want a brick with which to club my compiler around the noggin.
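The usual name for "look at this address and call whatever function is there" in C and C++ is a function pointer. A minimal sketch of the behaviour idea using one is below; `GameObject`, `BehaviourFn` and the behaviour functions are all hypothetical names, not anything from the IV code base.

```cpp
#include <cassert>

struct GameObject;                                    // hypothetical object type
using BehaviourFn = void (*)(GameObject&, float dt);  // plain function pointer type

struct GameObject {
    float y = 0.0f;
    float vy = 0.0f;
    BehaviourFn behaviour = nullptr;  // reassign this to change behaviour at runtime
};

// Each behaviour is an ordinary free function that operates on the object it is given.
void floatInAir(GameObject& o, float dt) { o.y += 1.0f * dt; }

void bounce(GameObject& o, float dt) {
    o.y += o.vy * dt;
    if (o.y < 0.0f) {  // hit the floor: reflect position and velocity
        o.y = -o.y;
        o.vy = -o.vy;
    }
}

// Call site: no casting and no need to know which behaviour is installed.
inline void process(GameObject& o, float dt) {
    if (o.behaviour) o.behaviour(o, dt);
}
```

This is still an indirect call, so it is not free. What it saves compared with a virtual call is the extra lookup through the virtual table, since the function address sits directly in the object. The larger win usually comes from processing objects grouped by behaviour so the same code and data stay hot, and either way it is worth profiling before and after rather than assuming.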
