Thanks Josh,
These changes did speed up the patch's performance, and combined with my other Jitter patches (needed for the rest of the localization, reactive processing, etc.) I have doubled the total frame rate from 4 fps to 8-9 fps. CPU still sits at 50%. Interestingly, the variation of jit.gl.render-grid.pat you sent, with none of the other patches running, runs at 50 fps (the movie rate) at 17% of both cores of the CPU, as reported by XP. Very fast. It just grinds to a halt with the rest of my patch loaded.
Discussing these observations with Adrian Freed and John Lazzaro makes us suspect a cache problem. The memory needed for the rendering, combined with the other processes (there are many intermediate matrices before and after the gl.render), exceeds the cache at some point, and this may be the bottleneck keeping the CPU from exceeding 50%. So, according to this theory, running two Max/Jitter instances that share the same cache would not help. More tests are needed to verify this.
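To put a rough number on the cache theory, here is a back-of-the-envelope sketch in Python. The matrix shapes are the ones in the example patch further down; the 2 MB cache size is only an assumed figure for illustration, not a measurement of my machine, and row padding is ignored.

```python
# Rough working-set estimate for the matrices in the example patch.
# ASSUMED_CACHE_BYTES is a hypothetical value, not a measured one.

BYTES_PER_TYPE = {"char": 1, "long": 4, "float32": 4, "float64": 8}

def matrix_bytes(planes, dtype, width, height):
    """Size in bytes of one dense matrix, ignoring any row padding."""
    return planes * BYTES_PER_TYPE[dtype] * width * height

# (label, planes, type, width, height), shapes taken from the patch
matrices = [
    ("movie frame", 4, "char", 320, 240),
    ("green/luma", 1, "float32", 320, 240),
    ("geom", 3, "float32", 320, 240),
    ("texcoords", 2, "float32", 320, 240),
]

ASSUMED_CACHE_BYTES = 2 * 1024 * 1024  # hypothetical 2 MB cache

total = 0
for label, planes, dtype, width, height in matrices:
    size = matrix_bytes(planes, dtype, width, height)
    total += size
    print(f"{label:12s} {size:>9d} bytes")

print(f"{'total':12s} {total:>9d} bytes")
print("exceeds assumed cache:", total > ASSUMED_CACHE_BYTES)
```

Even this small render-grid chain touches about 2 MB per frame before any of my other intermediate matrices are counted, so the theory is at least plausible.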
So this again makes me want to run the equivalent process (exemplified in the jit.gl.render-grid.pat) on the GPU. Can someone point me to the nearest shader that can get us started?
Thanks to all,
Keith
Joshua Kit Clayton <jkc@musork.com> wrote:
So, offlist, I discovered that one of Keith's bottlenecks was using a
modified version of the jit.gl.render-grid.pat example patch. There
are a few things I'd like to share with the list regarding this patch.
First off, it's an old patch (Jitter 1.0), and there are better
ways to do it now. Secondly, there are some things in the patch which
can be sped up. I've included my comments to Keith and an example
patch using newer techniques (jit.gl.texture, jit.gl.mesh, jit.matrix
exprfill), together with some optimizations. OpenGL rendering in
Jitter is single threaded, so you're not going to get any benefit
from multicore CPUs there...
1. the first thing in the jit.gl.render-grid.pat patch is that if I
run at a high resolution (like 256x256 vertices), I can get a
noticeable speedup by eliminating all the unnecessary jit.pwindows in
the patch which need to do float->char conversion, and/or downsampling.
2. I also got another 10-15% by disabling the interpolation for the
initial jit.matrix, and if you don't need to convert the green color
channel to alpha, you can save some CPU, by eliminating this stage
entirely.
3. A cheap RGB->luminance conversion is to just grab the green color
channel (green carries most of the luminance, roughly 60-70% depending
on the color standard)
4. You can save a few more cycles by eliminating the jit.op @op * for
z displacement, and instead rely on the ob3d scale method
5. with @unique 1 set on jit.qt.movie, you'll use less CPU on redundant
frames (only an issue if your movie framerate is less than the patch
framerate).
6. using jit.gl.mesh, you may get even better performance.
-Joshua
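As a rough illustration of point 3, the Python sketch below compares a weighted luma with the green-only shortcut. The weights are the standard Rec. 601 coefficients, chosen here as an assumption; nothing above says which weighting Jitter's own conversion uses. Pixel values are taken to be floats in 0..1.

```python
# Weighted luma versus the "just grab green" shortcut.
# REC601 holds the standard Rec. 601 luma weights (an assumed choice).

REC601 = (0.299, 0.587, 0.114)

def luma(r, g, b, weights=REC601):
    """Weighted sum of the three channels."""
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b

def green_only(r, g, b):
    """The cheap approximation: ignore red and blue entirely."""
    return g

samples = {
    "mid gray": (0.5, 0.5, 0.5),
    "skin-ish": (0.8, 0.6, 0.5),
    "pure red": (1.0, 0.0, 0.0),
}

for name, rgb in samples.items():
    exact = luma(*rgb)
    cheap = green_only(*rgb)
    print(f"{name:9s} luma={exact:.3f} green={cheap:.3f} error={cheap - exact:+.3f}")
```

The shortcut is exact for grays and close for low-saturation footage, but it falls apart on strongly red or blue material, so whether it is good enough depends on the source movie.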
#P toggle 390 366 15 0;
#P window setfont "Sans Serif" 9.;
#P window linecount 1;
#P message 390 389 88 196617 poly_mode \$1 \$1;
#P toggle 765 100 15 0;
#P message 765 128 44 196617 fsaa \$1;
#P toggle 706 100 15 0;
#P message 706 128 46 196617 sync \$1;
#P hidden newex 309 335 75 196617 loadmess 0.25;
#P window linecount 2;
#P comment 387 108 245 196617 use jit.gl.texture for rectangular
texture dimensions (not limited to power of two);
#P window linecount 3;
#P comment 597 55 347 196617 use unique to prevent redundant frames
\, and hence redundant rendering (should also turn down qmetro \,
just leaving high to demonstrate performance);
#P toggle 535 27 15 0;
#P window linecount 1;
#P message 535 55 53 196617 unique \$1;
#P hidden newex 445 239 48 196617 loadbang;
#P window linecount 2;
#P message 577 273 144 196617 exprfill 0 "norm[0]" \, exprfill 1 "1.-
norm[1]" \, bang;
#P window linecount 1;
#P newex 577 305 190 196617 jit.matrix texcoords 2 float32 320 240;
#P window linecount 2;
#P message 398 273 112 196617 exprfill 0 "snorm[0]" \, exprfill 1
"snorm[1]" \,;
#P window linecount 1;
#P newex 398 305 168 196617 jit.matrix geom 3 float32 320 240;
#P flonum 310 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P flonum 267 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P flonum 225 361 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 196 385 99 196617 pak scale 1. 1. 0.25;
#P window linecount 2;
#P comment 27 419 140 196617 use jit.gl.mesh and scale Z with the
scale attribute;
#P window linecount 1;
#P newex 370 139 56 196617 t b l erase;
#P newex 393 209 192 196617 jit.gl.texture render_grid @name mytex;
#P newex 171 438 379 196617 jit.gl.mesh render_grid @draw_mode
tri_grid @texture mytex @color 1. 1. 1. 1.;
#P flonum 213 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P user jit.fpsgui 57 113 60 196617 0;
#P hidden message 415 51 103 196617 read multimeter.mov;
#P newex 172 276 50 196617 t b;
#P hidden message 399 29 14 196617 1;
#N vpatcher 642 520 1118 872;
#P inlet 230 84 15 0;
#P toggle 352 214 15 0;
#P window setfont "Sans Serif" 9.;
#P message 352 233 75 196617 auto_rotate \$1;
#P message 315 233 32 196617 reset;
#P newex 103 101 27 196617 t i i;
#P flonum 261 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P message 261 233 51 196617 radius \$1;
#P flonum 194 213 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P message 194 233 60 196617 tracking \$1;
#P newex 10 254 355 196617 jit.gl.handle render_grid
@inherit_transform 1 @depth_enable 1 @tracking 8;
#P outlet 10 284 15 0;
#P newex 103 56 50 196617 select 27;
#P newex 103 34 40 196617 key;
#P newex 120 146 91 196617 prepend fullscreen;
#P newex 120 167 189 196617 jit.window render_grid @rect 10 50 200
200 @depthbuffer 1;
#P comment 10 218 178 196617 inherit_transform is important here \,
since we are controlling jit.gl.render;
#P toggle 103 81 15 0;
#P fasten 14 0 7 0 357 251 15 251;
#P fasten 10 0 7 0 266 251 15 251;
#P fasten 8 0 7 0 199 248 15 248;
#P fasten 13 0 7 0 320 251 15 251;
#P connect 7 0 6 0;
#P connect 4 0 5 0;
#P connect 5 0 0 0;
#P connect 0 0 12 0;
#P connect 12 1 3 0;
#P connect 16 0 2 0;
#P connect 3 0 2 0;
#P connect 9 0 8 0;
#P connect 11 0 10 0;
#P connect 15 0 14 0;
#P pop;
#P newobj 706 155 115 196617 p window-mouse-rotate;
#P message 361 49 42 196617 rate \$1;
#P flonum 361 29 35 9 0 0 0 3 0 0 0 221 221 221 222 222 222 0 0 0;
#P newex 172 305 211 196617 jit.pack 3 float32 320 240 @out_name geom;
#P newex 172 210 203 196617 jit.matrix 1 float32 320 240 @planemap 2;
#P message 235 49 28 196617 read;
#P message 305 49 27 196617 stop;
#P message 271 49 31 196617 start;
#P toggle 172 28 15 0;
#P newex 172 48 51 196617 qmetro 2;
#P newex 172 79 153 196617 jit.qt.movie 320 240 @unique 1;
#P newex 623 193 120 196617 jit.gl.render render_grid;
#P comment 34 213 100 196617 grab green for luma;
#P comment 508 242 349 196617 using new exprfill method to generate
geometry and texture coordinates;
#P window linecount 5;
#P comment 828 111 100 196617 disable VBL sync for max framerate and
benchmarking. turn on FSAA for full scene anti aliasing;
#P connect 5 0 18 0;
#P fasten 42 0 20 0 395 419 176 419;
#P fasten 24 0 20 0 201 413 176 413;
#P connect 11 0 20 0;
#P connect 6 0 5 0;
#P fasten 5 0 4 0 177 75 177 75;
#P fasten 33 0 4 0 540 74 177 74;
#P fasten 9 0 4 0 240 75 177 75;
#P fasten 8 0 4 0 310 75 177 75;
#P fasten 7 0 4 0 276 75 177 75;
#P fasten 13 0 4 0 366 75 177 75;
#P hidden fasten 17 0 4 0 420 72 177 72;
#P connect 22 1 10 0;
#P connect 10 0 16 0;
#P connect 16 0 11 0;
#P connect 19 0 5 1;
#P fasten 30 0 20 1 582 426 222 426;
#P connect 25 0 24 1;
#P connect 26 0 24 2;
#P connect 27 0 24 3;
#P hidden connect 37 0 27 0;
#P hidden connect 15 0 12 0;
#P connect 12 0 13 0;
#P fasten 4 0 22 0 177 102 375 102;
#P connect 10 0 11 2;
#P connect 43 0 42 0;
#P connect 22 1 21 0;
#P hidden connect 32 0 29 0;
#P connect 29 0 28 0;
#P connect 34 0 33 0;
#P hidden connect 32 0 31 0;
#P connect 31 0 30 0;
#P fasten 22 0 3 0 375 174 628 174;
#P fasten 22 2 3 0 421 168 628 168;
#P fasten 14 0 3 0 711 177 628 177;
#P connect 39 0 38 0;
#P connect 40 0 14 0;
#P connect 38 0 14 0;
#P connect 41 0 40 0;
#P window clipboard copycount 44;
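For anyone reading the patch as text, here is a hedged Python sketch of what the exprfill messages and the Z scale amount to. It assumes norm[i] runs from 0 to 1 inclusive across a dimension (cell / (dim - 1)) and that snorm[i] is the same thing remapped to -1..1; the helper names are made up for illustration. In the patch itself the Z scaling is done by the scale attribute on jit.gl.mesh rather than per vertex on the CPU, which is the point of the optimization.

```python
# Sketch of the geometry and texture coordinates the patch builds.
# norm(), snorm() and build_grid() are hypothetical helper names;
# the cell / (dim - 1) definition of norm is an assumption.

def norm(i, n):
    """Normalized coordinate in 0..1 for cell i of n cells."""
    return i / (n - 1) if n > 1 else 0.0

def snorm(i, n):
    """Signed normalized coordinate in -1..1."""
    return 2.0 * norm(i, n) - 1.0

def build_grid(luma_rows, z_scale=0.25):
    """Return (geom, texcoords) as row-major lists of tuples."""
    height = len(luma_rows)
    width = len(luma_rows[0])
    geom, texcoords = [], []
    for y in range(height):
        geom_row, tex_row = [], []
        for x in range(width):
            # x, y come from snorm[0], snorm[1]; z is the green plane
            z = luma_rows[y][x] * z_scale
            geom_row.append((snorm(x, width), snorm(y, height), z))
            # texcoords: norm[0] and 1.-norm[1], as in the exprfill message
            tex_row.append((norm(x, width), 1.0 - norm(y, height)))
        geom.append(geom_row)
        texcoords.append(tex_row)
    return geom, texcoords

# Tiny 3x2 "frame" standing in for the 320x240 green plane
geom, texcoords = build_grid([[0.0, 0.5, 1.0],
                              [1.0, 0.5, 0.0]])
print(geom[0])
print(texcoords[0])
```

Since the x, y and texture-coordinate planes never change, they only need to be filled once at load time; only the Z plane is updated per frame.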
Keith McMillen
BEAM Foundation
http://www.beamfoundation.org/
510.502.5310