Hi... I may come across as confrontational, but I'm just trying to be as clear as possible about what I mean, so please don't be offended.
I have profiled your engine using OpenGL Profiler, and I am a bit horrified. At first glance, it is pretty obvious why everything is running so slowly. You are using immediate mode to render every object in the game, i.e. glBegin...glEnd blocks of code. This is incredibly slow, and it is the obvious problem.
You need to optimise the rendering path of your code by using vertex arrays. glBegin...glEnd style code is about 1000 times slower than an appropriate vertex array. If you don't believe me, benchmark rendering 100,000 triangles with immediate mode and then with retained mode.
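To make the vertex array suggestion concrete, here is a minimal sketch of how the quads in the trace could be packed into flat arrays and submitted in one go. The `Quad` struct, the function names and the `USE_OPENGL` build flag are hypothetical (I don't know the engine's real data layout), and it assumes the fixed-function OpenGL 1.1 client-state API, so treat it as an illustration rather than a drop-in patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sprite description; the engine's real layout is unknown. */
typedef struct {
    float x0, y0, x1, y1;  /* screen-space corners */
    float u1, v1;          /* max texcoord; min is assumed to be (0, 0) */
} Quad;

/* Fill flat position and texcoord arrays (4 vertices per quad, 2 floats
   per vertex), using the same corner order as the trace.
   Returns the number of vertices written. */
size_t build_quad_arrays(const Quad *quads, size_t count,
                         float *positions, float *texcoords,
                         size_t capacity_vertices)
{
    size_t v = 0;
    assert(count * 4 <= capacity_vertices);
    for (size_t i = 0; i < count; ++i) {
        const Quad *q = &quads[i];
        const float px[4] = { q->x0, q->x1, q->x1, q->x0 };
        const float py[4] = { q->y0, q->y0, q->y1, q->y1 };
        const float tu[4] = { 0.0f, q->u1, q->u1, 0.0f };
        const float tv[4] = { 0.0f, 0.0f, q->v1, q->v1 };
        for (int c = 0; c < 4; ++c, ++v) {
            positions[2 * v]     = px[c];
            positions[2 * v + 1] = py[c];
            texcoords[2 * v]     = tu[c];
            texcoords[2 * v + 1] = tv[c];
        }
    }
    return v;
}

#ifdef USE_OPENGL  /* hypothetical flag: lets the array code build without GL headers */
#include <GL/gl.h> /* on Mac OS X this would be <OpenGL/gl.h> */

/* Submit every quad with a handful of calls instead of 11 calls per quad. */
void draw_quad_arrays(const float *positions, const float *texcoords,
                      size_t vertex_count)
{
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, positions);
    glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
    glDrawArrays(GL_QUADS, 0, (GLsizei)vertex_count);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
}
#endif
```

All quads that share a texture can go into the same pair of arrays, so the per-quad glBegin/glEnd overhead disappears and only the array fill remains on the CPU side.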
The kind of configuration the above user has should be enough to render a few hundred thousand fully textured and blended triangles, so your explanation is not really acceptable.
Even just using display lists would be advisable as a bandage for this gushing wound...
Don't think of this as a problem, think of it as a task to increase the size of your customer base.
Immediate mode vs Retained mode:
http://www.cs.utk.edu/~huangj/CS594S06/ ... icsArc.ppt
Vertex Arrays:
http://www.opengl.org/documentation/spe ... ode21.html
Every frame you are making approximately 15,000 OpenGL calls.
Sample of the trace:
0.05 µs glColor4ubv({0, 40, 0, 255});
0.27 µs glBegin(GL_QUADS);
0.05 µs glTexCoord2f(0, 0);
0.05 µs glVertex2f(442, 533);
0.05 µs glTexCoord2f(0.8125, 0);
0.05 µs glVertex2f(494, 533);
0.05 µs glTexCoord2f(0.8125, 0.8125);
0.05 µs glVertex2f(494, 559);
0.05 µs glTexCoord2f(0, 0.8125);
0.05 µs glVertex2f(442, 559);
0.33 µs glEnd();
0.05 µs glColor4ubv({0, 40, 0, 255});
0.22 µs glBegin(GL_QUADS);
0.05 µs glTexCoord2f(0, 0);
0.05 µs glVertex2f(494, 533);
0.05 µs glTexCoord2f(0.8125, 0);
0.05 µs glVertex2f(546, 533);
0.00 µs glTexCoord2f(0.8125, 0.8125);
0.05 µs glVertex2f(546, 559);
0.00 µs glTexCoord2f(0, 0.8125);
0.05 µs glVertex2f(494, 559);
0.22 µs glEnd();
0.05 µs glColor4ubv({0, 40, 0, 255});
2.17 µs glBindTexture(GL_TEXTURE_2D, 83);
54.31 µs glBegin(GL_QUADS);
0.22 µs glTexCoord2f(0, 0);
0.22 µs glVertex2f(546, 533);
0.00 µs glTexCoord2f(0.8125, 0);
0.05 µs glVertex2f(598, 533);
0.00 µs glTexCoord2f(0.8125, 0.8125);
0.05 µs glVertex2f(598, 559);
0.05 µs glTexCoord2f(0, 0.8125);
0.05 µs glVertex2f(546, 559);
0.76 µs glEnd();
0.16 µs glColor4ubv({0, 40, 0, 255});
0.27 µs glBegin(GL_QUADS);
0.05 µs glTexCoord2f(0, 0);
0.05 µs glVertex2f(598, 533);
0.05 µs glTexCoord2f(0.8125, 0);
0.05 µs glVertex2f(650, 533);
0.05 µs glTexCoord2f(0.8125, 0.8125);
0.05 µs glVertex2f(650, 559);
0.05 µs glTexCoord2f(0, 0.8125);
0.05 µs glVertex2f(598, 559);
0.22 µs glEnd();
You could probably replace this with about 20-30 calls using retained-mode APIs such as vertex arrays... it would be so much faster. Even changing to 16-bit color shouldn't be needed; actually, I was surprised to see this. It won't make much difference on modern hardware, as it is all geared for 32-bit data anyway.
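For a rough feel of the numbers, the arithmetic can be sketched as below. The 11 calls per quad come straight from the trace (glColor4ubv, glBegin, four glTexCoord2f, four glVertex2f, glEnd). The batched cost model is my own assumption: six setup calls per frame (enabling and pointing vertex, texcoord and colour arrays) plus a glBindTexture and a glDrawArrays per texture. The function names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>

/* Per-quad cost observed in the trace:
   glColor4ubv + glBegin + 4x glTexCoord2f + 4x glVertex2f + glEnd. */
enum { IMMEDIATE_CALLS_PER_QUAD = 11 };

/* Assumed batched cost model (not measured):
   3x glEnableClientState + 3x gl*Pointer once per frame,
   then glBindTexture + glDrawArrays for each texture. */
enum { BATCH_SETUP_CALLS = 6, BATCH_CALLS_PER_TEXTURE = 2 };

/* Total GL calls when every quad is drawn with its own glBegin/glEnd. */
size_t immediate_mode_calls(size_t quads, size_t texture_binds)
{
    return quads * IMMEDIATE_CALLS_PER_QUAD + texture_binds;
}

/* Total GL calls when quads are grouped into one draw per texture. */
size_t batched_calls(size_t textures)
{
    return BATCH_SETUP_CALLS + textures * BATCH_CALLS_PER_TEXTURE;
}

/* How many immediate-mode quads fit in a given per-frame call count. */
size_t quads_for_call_budget(size_t calls)
{
    return calls / IMMEDIATE_CALLS_PER_QUAD;
}

/* Ratio of immediate-mode calls to batched calls. */
double call_reduction(size_t immediate, size_t batched)
{
    assert(batched > 0);
    return (double)immediate / (double)batched;
}
```

Under these assumptions, 15,000 calls per frame corresponds to roughly 1,363 quads, and drawing from, say, 10 distinct textures (a made-up figure) would take 26 calls in total.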
I am a competent OpenGL programmer if you need further advice.
Regards,
Samuel