As long as you don't have too many objects in your world, a virtual update function is just fine. Where it stops being fine is when you use it for particle systems on a per-particle basis; there, you absolutely need large batches of particles that you handle in one loop, applying the same code to hundreds of particles at a time ('course, to be really fancy, you do this stuff on the GPU nowadays, but I digress). The same technique can be used to avoid virtual function calls in your object update code if you feel it's needed, of course: have a framework where objects are stored sorted by class and handled by one manager object per class. The manager has the virtual function, which then calls the actual update code for all objects of the managed class in one go.
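A minimal C++ sketch of the manager idea, under assumptions of my own: `IManager`, `Manager<T>`, `Tank`, `ParticleSystem` and `updateWorld` are hypothetical names, not from Darwinia or any engine. Each manager costs one virtual call per frame; inside it, the per-object `update()` is an ordinary call the compiler can inline. The particle system keeps its data as structure-of-arrays so a single loop applies the same code to every particle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface: one virtual call per managed class, not per object.
struct IManager {
    virtual ~IManager() = default;
    virtual void updateAll(float dt) = 0;
};

// Hypothetical game object; note that update() is NOT virtual.
struct Tank {
    float x = 0.0f;
    float speed = 2.0f;
    void update(float dt) { x += speed * dt; }
};

// Stores all objects of one class contiguously and updates them in one loop.
template <typename T>
struct Manager : IManager {
    std::vector<T> objects;
    void updateAll(float dt) override {
        for (T& obj : objects) obj.update(dt);  // direct call, can be inlined
    }
};

// Particles as structure-of-arrays: the same code runs over every particle.
struct ParticleSystem : IManager {
    std::vector<float> x, y, vx, vy;
    void add(float px, float py, float pvx, float pvy) {
        x.push_back(px);
        y.push_back(py);
        vx.push_back(pvx);
        vy.push_back(pvy);
    }
    void updateAll(float dt) override {
        const std::size_t n = x.size();
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }
};

// One virtual call per manager per frame, however many objects each one holds.
void updateWorld(const std::vector<IManager*>& managers, float dt) {
    for (IManager* m : managers) m->updateAll(dt);
}
```

The design choice is just where the dispatch happens: once per class instead of once per object. The price is that every object of a class has to live in that class's container.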
There's something else that hurts with virtual functions, by the way, and I think it's worse than the cache misses (after all, there's one virtual function table per class, which is likely to be in the cache already, and the pointer to it is stored in each object, so if you want to do anything with your object, you'll load it into the cache anyway): pipeline stalls. Calling a virtual function is an indirect call: the CPU doesn't know which code is supposed to be executed until it has looked up the target address, so it has to wait for that lookup to finish. Sure, there's branch prediction, but from what I've read, it doesn't work quite as well for indirect calls yet.
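If you want to keep the virtual calls but make them kinder to the branch predictor, one common trick (not something described in the post) is to group objects by their dynamic type, so consecutive indirect calls keep jumping to the same target. A sketch, with hypothetical `Entity`, `Tree` and `Rock` types:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <vector>

struct Entity {
    virtual ~Entity() = default;
    virtual int update() = 0;  // returns a value only so the example can be checked
};

// Hypothetical concrete types.
struct Tree : Entity { int update() override { return 1; } };
struct Rock : Entity { int update() override { return 2; } };

using EntityList = std::vector<std::unique_ptr<Entity>>;

// Dynamic type of an entity, usable as an ordered key.
std::type_index typeKey(const Entity& e) { return std::type_index(typeid(e)); }

// Groups entities by dynamic type so consecutive virtual calls hit the same target.
void sortByType(EntityList& entities) {
    std::stable_sort(entities.begin(), entities.end(),
                     [](const std::unique_ptr<Entity>& a, const std::unique_ptr<Entity>& b) {
                         return typeKey(*a) < typeKey(*b);
                     });
}

// Counts how often the call target changes between neighbours.
std::size_t targetSwitches(const EntityList& entities) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < entities.size(); ++i)
        if (typeKey(*entities[i]) != typeKey(*entities[i - 1])) ++switches;
    return switches;
}

// Still one indirect call per entity, but now with a predictable target.
int updateAll(EntityList& entities) {
    int total = 0;
    for (auto& e : entities) total += e->update();
    return total;
}
```

`targetSwitches` only exists to check the grouping. In practice you would probably keep the list grouped as objects are created rather than re-sorting every frame.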
What Byron says about linked lists is true under one of two conditions:
1. It's a non-intrusive linked list, where separate node objects hold the link pointers plus a pointer to the actual object. Don't do that if you want to avoid memory fragmentation.
2. You're iterating over your list many times for many small tasks. If you only iterate the list once per frame and do all the work on your objects in one go AND the link pointers are part of the object, well, you're loading the whole object into the cache anyway, and the list pointers with it, so no harm done (unless the object is spread over several cache lines, which shouldn't matter too much if you mind your data layout).
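To illustrate condition 2, here is a sketch of an intrusive singly linked list in C++: the `next` pointer is a member of the (hypothetical) `GameObject` itself, and all per-object work happens in one pass per frame. The damage rule is made up purely so the loop does more than one thing.

```cpp
#include <cassert>

// Hypothetical game object; the list link lives inside the object (intrusive).
struct GameObject {
    float x = 0.0f;
    float vx = 0.0f;
    int health = 100;
    GameObject* next = nullptr;
};

// Intrusive singly linked list: linking an object allocates nothing extra.
// (A non-intrusive list would allocate a separate node per entry holding
// its own next pointer plus a pointer to the object.)
struct ObjectList {
    GameObject* head = nullptr;
    void pushFront(GameObject* obj) {
        obj->next = head;
        head = obj;
    }
};

// One pass per frame doing all per-object work while the object is in cache.
// Returns the number of objects still alive.
int updateFrame(ObjectList& list, float dt) {
    int alive = 0;
    for (GameObject* obj = list.head; obj != nullptr; obj = obj->next) {
        obj->x += obj->vx * dt;                 // movement
        if (obj->x > 10.0f) obj->health -= 10;  // made-up rule: hurt outside the arena
        if (obj->health > 0) ++alive;
    }
    return alive;
}
```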
Darwinia+ deep and dirty Part 2
- bert_the_turtle
- level5

- Posts: 4795
- Joined: Fri Oct 13, 2006 6:11 pm
- Location: Cologne
I wish my uni taught proper programming
On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go CPU->GPU->Screen).
GENERATION 22:The first time you see this, copy it into your sig on any forum and add 1 to the generation. Social experiment.
- NeoThermic
- Introversion Staff

- Posts: 6256
- Joined: Sat Mar 02, 2002 10:55 am
- Location: ::1
Marquise Fishy TGF McGraw wrote: seriously IV, separate process/render behavior! Process things in a good process order, render things in a good render order; frankly, this is why NeoThermic was able to speed DEFCON up so much...
Well, to be honest, my improvements were theoretical. I never got the time to apply them to Defcon properly. However, if you'd like to see me apply them, ask Mark to let me by posting in support.
NeoThermic
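One way to read the process/render split quoted above, sketched in C++ under my own assumptions: keep a single pool of objects and maintain two index orderings over it, one grouped by update type for processing and one grouped by texture for rendering, so neither loop has to compromise for the other. `Object`, `typeId`, `textureId`, `depth` and `buildOrders` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical object record; these fields are assumptions for illustration.
struct Object {
    int typeId;     // selects which update code runs on the object
    int textureId;  // what the renderer has to bind to draw it
    float depth;    // distance from the camera
};

// Two independent orderings over the same pool of objects.
struct Orders {
    std::vector<std::size_t> process;  // indices grouped by typeId
    std::vector<std::size_t> render;   // indices grouped by textureId, nearer first
};

Orders buildOrders(const std::vector<Object>& pool) {
    Orders o;
    o.process.resize(pool.size());
    std::iota(o.process.begin(), o.process.end(), std::size_t{0});
    o.render = o.process;

    // Process order: same update code back to back.
    std::stable_sort(o.process.begin(), o.process.end(),
                     [&](std::size_t a, std::size_t b) { return pool[a].typeId < pool[b].typeId; });

    // Render order: fewer texture switches; depth only breaks ties.
    std::stable_sort(o.render.begin(), o.render.end(), [&](std::size_t a, std::size_t b) {
        if (pool[a].textureId != pool[b].textureId) return pool[a].textureId < pool[b].textureId;
        return pool[a].depth < pool[b].depth;
    });
    return o;
}
```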
- Ace Rimmer
- level5

- Posts: 10803
- Joined: Thu Dec 07, 2006 9:46 pm
- Location: The Multiverse
- bert_the_turtle
- level5

- Posts: 4795
- Joined: Fri Oct 13, 2006 6:11 pm
- Location: Cologne
martin wrote: I wish my uni taught proper programming
On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go CPU->GPU->Screen).
Yes, that will cause a stall. It's been a while since I worked with a DX and GL driver team, but it used to be the case that the driver would triple-buffer, so if you request data back from the GPU, it has to flush the pipeline in order to actually produce that data. As a matter of interest, why were you reading texture data back?
bert_the_turtle wrote: As long as you don't have too many objects in your world, a virtual update function is just fine. Where it stops being fine is when you use it for particle systems on a per-particle basis; there, you absolutely need large batches of particles that you handle in one loop, applying the same code to hundreds of particles at a time ('course, to be really fancy, you do this stuff on the GPU nowadays, but I digress). The same technique can be used to avoid virtual function calls in your object update code if you feel it's needed, of course: have a framework where objects are stored sorted by class and handled by one manager object per class. The manager has the virtual function, which then calls the actual update code for all objects of the managed class in one go.
There's something else that hurts with virtual functions, by the way, and I think it's worse than the cache misses (after all, there's one virtual function table per class, which is likely to be in the cache already, and the pointer to it is stored in each object, so if you want to do anything with your object, you'll load it into the cache anyway): pipeline stalls. Calling a virtual function is an indirect call: the CPU doesn't know which code is supposed to be executed until it has looked up the target address, so it has to wait for that lookup to finish. Sure, there's branch prediction, but from what I've read, it doesn't work quite as well for indirect calls yet.
What Byron says about linked lists is true under one of two conditions:
1. It's a non-intrusive linked list, where separate node objects hold the link pointers plus a pointer to the actual object. Don't do that if you want to avoid memory fragmentation.
2. You're iterating over your list many times for many small tasks. If you only iterate the list once per frame and do all the work on your objects in one go AND the link pointers are part of the object, well, you're loading the whole object into the cache anyway, and the list pointers with it, so no harm done (unless the object is spread over several cache lines, which shouldn't matter too much if you mind your data layout).
Well said!
Byron wrote: martin wrote: I wish my uni taught proper programming
On a side note, some graphics operations cause epic pipeline stalls in XNA; does the same apply to DirectX/OpenGL? Things like getting texture data from GPU memory back to CPU memory (i.e., data going the "wrong" way, since the graphics pipeline is designed for data to go CPU->GPU->Screen).
Yes, that will cause a stall. It's been a while since I worked with a DX and GL driver team, but it used to be the case that the driver would triple-buffer, so if you request data back from the GPU, it has to flush the pipeline in order to actually produce that data. As a matter of interest, why were you reading texture data back?
I can't remember exactly; I think I was just fiddling with data processing on the GPU when I noticed it. Rest assured, I wouldn't do it in a proper game.
- Marquise Fishy TGF McGraw
- level1

- Posts: 18
- Joined: Tue May 05, 2009 5:29 pm
- Location: A Bowl placed atop a hill, eyes that never blink, always watching, always watching...
Hmm, that should be little trouble to do, then; the objects are stored in a vector of vectors, i.e. an unsorted pile of same-typed objects inside an unsorted pile of object types. I was already trying to rearrange things to be a bit more processing-friendly, but I was just doing it the wrong way around.
Obviously, they're going to need sorting some day: first into the objects that are worth processing at all, then into an order based on their altitude (so that a collision with immovable terrain can carry itself up the chain to the object at the top of a pile, rather than the objects trying to push each other apart all the way down and realizing they can't). It'd be nice to have an efficient processing mechanism for a few thousand pixel-sized particles, but I've no idea how to go about implementing that without detonating my CPU... (c'mon, I at least want a ball pit ;-;)
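A rough sketch of the two sorting steps described above, assuming a hypothetical `Body` with an `altitude` and an `asleep` flag: `std::partition` moves the bodies worth processing to the front, then only those are sorted bottom-up, so collision resolution could start at the terrain and work its way up a pile. With the vector-of-vectors layout, you would run this on each inner vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical physics body; field names are assumptions.
struct Body {
    float altitude;  // height of the body's bottom edge
    bool asleep;     // resting bodies that needn't be simulated this frame
};

// Moves the bodies worth processing to the front and orders them bottom-up.
// Returns how many bodies are active; the asleep ones follow, in no particular order.
std::size_t prepareForProcessing(std::vector<Body>& bodies) {
    auto activeEnd = std::partition(bodies.begin(), bodies.end(),
                                    [](const Body& b) { return !b.asleep; });
    std::sort(bodies.begin(), activeEnd,
              [](const Body& a, const Body& b) { return a.altitude < b.altitude; });
    return static_cast<std::size_t>(activeEnd - bodies.begin());
}
```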