Darwinia+ deep and dirty Part 2

The only place you'll ever hear the truth
bert_the_turtle
level5
Posts: 4795
Joined: Fri Oct 13, 2006 6:11 pm
Location: Cologne

Postby bert_the_turtle » Fri May 22, 2009 11:11 am

As long as you don't have too many objects in your world, a virtual update function is just fine. Where it stops being fine is if you use it for particle systems on a per-particle basis; there, you absolutely need large batches of particles you handle in one loop where the same code is applied to hundreds of particles at a time ('course, to be really fancy, you do this stuff on the GPU nowadays, but I digress). The same technique can be used to avoid virtual function calls in your object update code if you feel it's needed, of course: have a framework where objects are stored sorted by class and handled by a manager object; the manager object has the virtual function, which then runs the actual update code for all objects of the managed class in one go.
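A minimal sketch of the manager idea, in C++. All names here (Soldier, Tree, ManagerBase, Manager, World) are made up for illustration and are not taken from any Introversion code; it only assumes each object type has a non-virtual Update(float) member.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object types with NON-virtual update functions.
struct Soldier {
    float x = 0.0f;
    float speed = 1.0f;
    void Update(float dt) { x += speed * dt; }
};

struct Tree {
    float height = 1.0f;
    float growth = 0.25f;
    void Update(float dt) { height += growth * dt; }
};

// The only virtual function lives on the manager:
// one indirect call per class per frame, not one per object.
class ManagerBase {
public:
    virtual ~ManagerBase() = default;
    virtual void UpdateAll(float dt) = 0;
};

template <typename T>
class Manager : public ManagerBase {
public:
    void Add(const T& obj) { objects_.push_back(obj); }

    void UpdateAll(float dt) override {
        // Statically bound call; objects of one class sit contiguously.
        for (T& obj : objects_) obj.Update(dt);
    }

    const std::vector<T>& Objects() const { return objects_; }

private:
    std::vector<T> objects_;
};

class World {
public:
    // Creates the manager for type T and returns a reference to it.
    // The reference stays valid because the manager is heap-allocated.
    template <typename T>
    Manager<T>& AddManager() {
        auto manager = std::make_unique<Manager<T>>();
        Manager<T>& ref = *manager;
        managers_.push_back(std::move(manager));
        return ref;
    }

    void Update(float dt) {
        for (auto& manager : managers_) manager->UpdateAll(dt);
    }

private:
    std::vector<std::unique_ptr<ManagerBase>> managers_;
};
```

With two managers registered, World::Update makes two virtual calls per frame no matter how many soldiers and trees exist. Whether that is worth the extra plumbing depends on the object count, as the post says.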

There's something else that hurts with virtual functions, by the way, and I think it's worse than the cache misses (after all, there's one virtual function table per class which is likely to be in the cache already and the pointer to it is data in the class, so if you want to do anything with your object, you'll load it into the cache anyway): pipeline stalls. Calling a virtual function is an indirect call where the CPU doesn't know which code is supposed to be executed until it has looked up the target address, and that means it has to wait for the lookup to actually finish. Sure, there's branch prediction, but from what I've read, it doesn't work quite so well with indirect calls yet.

What Byron says about linked lists is true under one of two conditions:
1. it's a non-intrusive linked list, where you have extra node objects that carry the link pointers plus a pointer to the actual object. Don't do that if you don't want memory fragmentation.
2. you're iterating over your list for many small tasks. If you only iterate the list once per frame and do all the work on your objects in one go AND the link pointers are part of the object, well, you're loading the whole object into the cache anyway and the list pointers with it, so no harm done (unless the object is spread over several cache lines, which shouldn't matter too much if you mind your data layout).
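To make the two cases concrete, here is a small C++ sketch. The type names (Entity, ExternalNode, LinkedEntity, EntityList) and the UpdateAll function are hypothetical, picked only for this illustration.

```cpp
#include <cstddef>

// Case 1, non-intrusive: a separately allocated node points at the object,
// so every step of the iteration chases two pointers (node, then object).
struct Entity {
    float x = 0.0f;
};

struct ExternalNode {
    Entity* object = nullptr;
    ExternalNode* next = nullptr;
};

// Case 2, intrusive: the link pointer is a member of the object itself,
// so touching the object brings the link into the cache along with it.
struct LinkedEntity {
    float x = 0.0f;
    float vx = 0.0f;
    LinkedEntity* next = nullptr;
};

struct EntityList {
    LinkedEntity* head = nullptr;

    // The list does not own its elements; it only threads them together.
    void PushFront(LinkedEntity& e) {
        e.next = head;
        head = &e;
    }
};

// One pass per frame that does all the per-object work in one go.
// Returns the number of objects visited.
inline std::size_t UpdateAll(EntityList& list, float dt) {
    std::size_t visited = 0;
    for (LinkedEntity* e = list.head; e != nullptr; e = e->next) {
        e->x += e->vx * dt;
        ++visited;
    }
    return visited;
}
```

The intrusive version needs no allocation per list entry, which is where the fragmentation in case 1 comes from.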
martin
level5
Posts: 3210
Joined: Fri Nov 19, 2004 8:37 pm

Postby martin » Fri May 22, 2009 11:57 am

I wish my uni taught proper programming :(

On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go from CPU->GPU->Screen)
GENERATION 22:The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social experiment.
NeoThermic
Introversion Staff
Posts: 6256
Joined: Sat Mar 02, 2002 10:55 am
Location: ::1

Postby NeoThermic » Fri May 22, 2009 2:28 pm

Marquise Fishy TGF McGraw wrote:seriously IV, separate process/render behavior! process things in a good process order, render things in a good render order - frankly this is why NeoThermic was able to speed DEFCON up so much...


Well, to be honest, my improvements were theoretical. I never got the time to apply them to DEFCON properly. However, if you'd like to see me apply them, then ask Mark to let me by posting in support ;)

NeoThermic
Ace Rimmer
level5
Posts: 10803
Joined: Thu Dec 07, 2006 9:46 pm
Location: The Multiverse

Postby Ace Rimmer » Fri May 22, 2009 2:38 pm

/me posts in support
Smoke me a kipper, I'll be back for breakfast...
bert_the_turtle
level5
Posts: 4795
Joined: Fri Oct 13, 2006 6:11 pm
Location: Cologne

Postby bert_the_turtle » Fri May 22, 2009 2:40 pm

I'm more interested in the unrelated c2 packets, but would support performance enhancements too.
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Fri May 22, 2009 3:10 pm

martin wrote:I wish my uni taught proper programming :(

On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go from CPU->GPU->Screen)


Yes, that will cause a stall. It's been a while since I worked with a DX and GL driver team, but it used to be the case that the driver would triple buffer, so if you request data back from the GPU, it has to flush the pipeline in order to actually produce that data. As a matter of interest - why were you reading texture data back?
Byron
level2
Posts: 147
Joined: Tue May 13, 2008 3:48 pm

Postby Byron » Fri May 22, 2009 3:11 pm

bert_the_turtle wrote:As long as you don't have too many objects in your world, a virtual update function is just fine. Where it stops being fine is if you use it for particle systems on a per-particle basis; there, you absolutely need large batches of particles you handle in one loop where the same code is applied to hundreds of particles at a time ('course, to be really fancy, you do this stuff on the GPU nowadays, but I digress). The same technique can be used to avoid virtual function calls in your object update code if you feel it's needed, of course: have a framework where objects are stored sorted by class and handled by a manager object; the manager object has the virtual function, which then runs the actual update code for all objects of the managed class in one go.

There's something else that hurts with virtual functions, by the way, and I think it's worse than the cache misses (after all, there's one virtual function table per class which is likely to be in the cache already and the pointer to it is data in the class, so if you want to do anything with your object, you'll load it into the cache anyway): pipeline stalls. Calling a virtual function is an indirect call where the CPU doesn't know which code is supposed to be executed until it has looked up the target address, and that means it has to wait for the lookup to actually finish. Sure, there's branch prediction, but from what I've read, it doesn't work quite so well with indirect calls yet.

What Byron says about linked lists is true under one of two conditions:
1. it's a non-intrusive linked list, where you have extra node objects that carry the link pointers plus a pointer to the actual object. Don't do that if you don't want memory fragmentation.
2. you're iterating over your list for many small tasks. If you only iterate the list once per frame and do all the work on your objects in one go AND the link pointers are part of the object, well, you're loading the whole object into the cache anyway and the list pointers with it, so no harm done (unless the object is spread over several cache lines, which shouldn't matter too much if you mind your data layout).


Well said!
martin
level5
Posts: 3210
Joined: Fri Nov 19, 2004 8:37 pm

Postby martin » Fri May 22, 2009 5:05 pm

Byron wrote:
martin wrote:I wish my uni taught proper programming :(

On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go from CPU->GPU->Screen)


Yes, that will cause a stall. It's been a while since I worked with a DX and GL driver team, but it used to be the case that the driver would triple buffer, so if you request data back from the GPU, it has to flush the pipeline in order to actually produce that data. As a matter of interest - why were you reading texture data back?


I can't remember exactly; I think I was just fiddling with data processing on the GPU when I noticed it. Rest assured I wouldn't do it in a proper game ;)
GENERATION 22:The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social experiment.
Marquise Fishy TGF McGraw
level1
Posts: 18
Joined: Tue May 05, 2009 5:29 pm
Location: A Bowl placed atop a hill, eyes that never blink, always watching, always watching...

Postby Marquise Fishy TGF McGraw » Fri May 22, 2009 10:06 pm

Hmm, it should be little issue to do that then; the objects are stored in a vector of vectors - an unsorted pile of same-typed objects inside an unsorted pile of object types. I was already trying to rearrange things to be a bit more process friendly, but I was just doing it the wrong way around.

Obviously, they're going to need sorting some day, since they probably need sorting into objects that it's worth processing at all, then processing in an order relative to their altitude (so a collision with immovable terrain can carry itself up the chain to an object at the top of a pile, rather than the objects trying to push each other apart all the way down and realizing they can't); it'd be nice to have an efficient processing mechanism for a few thousand pixel-sized particles but I've no idea how to go about implementing that without detonating my CPU... (cmon I at least want a ball pit ;-;)
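For the particle part, one common approach is the batch loop described earlier in the thread: keep each attribute in its own contiguous array and run the same code over all particles in a single loop. The C++ sketch below is an assumption-laden illustration, not anyone's actual engine code; the Particles struct and the gravity, floorY and bounce parameters are invented here, and it only covers movement and bouncing off a flat floor, not particle-to-particle collisions or the altitude-ordered processing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle batch:
// one array per attribute, no per-particle objects, no virtual calls.
struct Particles {
    std::vector<float> x, y;    // positions
    std::vector<float> vx, vy;  // velocities

    void Spawn(float px, float py, float pvx, float pvy) {
        x.push_back(px);
        y.push_back(py);
        vx.push_back(pvx);
        vy.push_back(pvy);
    }

    std::size_t Size() const { return x.size(); }

    // Applies the same code to every particle in one tight loop.
    // gravity is an acceleration along y, floorY is a flat floor,
    // bounce is the fraction of vertical speed kept after hitting it.
    void Update(float dt, float gravity, float floorY, float bounce) {
        const std::size_t n = Size();
        for (std::size_t i = 0; i < n; ++i) {
            vy[i] += gravity * dt;
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
            if (y[i] < floorY) {
                y[i] = floorY;
                vy[i] = -vy[i] * bounce;
            }
        }
    }
};
```

A few thousand particles updated this way is a small amount of work for a CPU; the expensive part of a ball pit is the collisions between particles, which this sketch deliberately leaves out.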
