Real-time ray-tracing in hardware

There are two main strategies to render a computer-generated image: rasterization, in which the polygons are projected onto the screen and shaded according to some rules, and ray-tracing, in which rays are traced from the eye through the pixels to intersect the scene and compute the colour. The first is the one commonly used for video games, as it allows, for a limited (though currently really high, on the order of millions) number of polygons, the image on the screen to be updated in real time (at least 30 frames per second) while the camera moves or a character is animated. Ray-tracing is used to compute high-quality, realistic images that typically take from minutes to several hours to compute and are later used for advertisements, commercials or movies. Since the development of high-performance graphics cards specialised in rasterization (video games), and especially since the appearance of programmable shaders and GPUs, lots of attempts have been made to accelerate the slower of the two techniques, ray-tracing.
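
To make the contrast concrete, here is a minimal, self-contained sketch of the ray-tracing loop just described: one ray per pixel, traced from the eye and intersected against the scene (reduced here to a single hard-coded sphere). Everything in it is an illustrative placeholder of the general technique, not code from any of the products discussed below.

    // Minimal sketch of the ray-tracing loop described above: one ray per
    // pixel, tested against a single hard-coded sphere. The types and the
    // fixed camera are illustrative placeholders, not any renderer's code.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Vec3 {
        float x, y, z;
        Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
        float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
    };

    struct Ray    { Vec3 origin, dir; };   // dir is assumed normalised
    struct Sphere { Vec3 center; float radius; };

    // Distance along the ray to the first intersection, or -1 on a miss.
    float intersect(const Sphere& s, const Ray& r)
    {
        Vec3 oc = r.origin - s.center;
        float b = oc.dot(r.dir);
        float c = oc.dot(oc) - s.radius * s.radius;
        float disc = b * b - c;
        return disc < 0.0f ? -1.0f : -b - std::sqrt(disc);
    }

    int main()
    {
        const int w = 64, h = 64;
        Sphere sphere{{0.0f, 0.0f, -3.0f}, 1.0f};
        std::vector<float> image(w * h);

        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                // Ray from the eye (at the origin) through the centre of pixel (x, y).
                float u = (x + 0.5f) / w * 2.0f - 1.0f;
                float v = (y + 0.5f) / h * 2.0f - 1.0f;
                float len = std::sqrt(u * u + v * v + 1.0f);
                Ray ray{{0.0f, 0.0f, 0.0f}, {u / len, v / len, -1.0f / len}};

                float t = intersect(sphere, ray);          // intersect the scene
                image[y * w + x] = t > 0.0f ? 1.0f : 0.0f; // "shade": hit or miss
            }

        std::printf("rendered %dx%d pixels\n", w, h);
        return 0;
    }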

After a few years of research attempts, it looks like a few companies have finally put their money into the commercial development of very fast ray-tracing on the GPU. But their strategies follow slightly different paths, as we will see. The next lines come from the thoughts and ideas I was writing down in my notebook during the Hot3D session at the High Performance Graphics conference, where Caustic presented CausticRT, Intel presented Larrabee as a possible platform for really fast ray-tracing, and nVidia presented OptiX. After the three talks there was a short panel discussion.

CausticRT by Caustic

CausticRT consists basically of two somewhat interdependent sub-products. On the one hand, a new API based on OpenGL, called CausticGL, is presented. It is used to set up, configure and specify the code for programmable rays that are traced into the scene, and it can be used for any computation that involves tracing huge numbers of rays in parallel. This API is quite clever and has the aim of becoming a standard. On the other hand, they present CausticOne, a graphics card that acts as an interface between the CausticGL API and the rest of the hardware (CPUs and GPUs) and allows for very efficient computation of the ray-tracing. Of course, their business model relies on selling this hardware.

One of their selling points is that the API provides an abstraction for the ray-tracing layer, so the developer no longer has to worry about the actual code for the scene acceleration structures or the ray-polygon intersection algorithms. For me, as part of a team that develops a ray-tracer, this is annoying, because what we developers want is access to that code and those algorithms, in case something is wrong or can simply be improved. Otherwise we have to rely on faith that everything will work fast and efficiently in all cases.

They have an agreement with the Brazil renderer, which has been ported to this platform, and some of their amazing demos are made using Brazil.

Larrabee by Intel

Larrabee is, above all, a bunch of important promises from Intel. It will be, hopefully at some point next year (2010), a revolutionary massively parallel processing card that aims to beat the ATI and nVidia graphics cards. Besides working for video games like any other GPU, it is designed to overcome the problems that programs not specifically written for the GPU (the workloads commonly called GPGPU) have on current GPU cards and, most importantly for a programmer like me, to avoid having to learn a new, complicated shading language, since familiar C++ code will compile for the card.

Ray-tracing is one of those problems that a GPU cannot solve directly and that fall into the world of GPGPU (even though we are talking about graphics too, we have seen above that GPUs are built for rasterization, not for ray-tracing), and if Intel gives us, via Larrabee, an easy way to program it efficiently in parallel, then for me this is a winner.
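
To show the flavour of what is being promised, here is a hedged sketch: the per-pixel ray loop from the first example, parallelised with nothing but standard C++ threads. This is not Larrabee code (neither the card nor its toolchain is available yet), and tracePixel() is a hypothetical stand-in; it only illustrates the kind of plain C++ parallelism that would make such a card attractive.

    // Not Larrabee code: just standard C++ used to parallelise an
    // embarrassingly parallel ray loop. tracePixel() is a placeholder
    // for "trace one ray through pixel (x, y) and shade the hit".
    #include <algorithm>
    #include <cstdio>
    #include <thread>
    #include <vector>

    float tracePixel(int x, int y) { return static_cast<float>((x ^ y) & 1); }

    int main()
    {
        const int w = 256, h = 256;
        std::vector<float> image(w * h);

        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;

        // Rays are independent, so each thread takes an interleaved subset
        // of the rows with no synchronisation needed while rendering.
        for (unsigned t = 0; t < n; ++t)
            workers.emplace_back([&, t] {
                for (int y = static_cast<int>(t); y < h; y += static_cast<int>(n))
                    for (int x = 0; x < w; ++x)
                        image[y * w + x] = tracePixel(x, y);
            });

        for (auto& th : workers) th.join();
        std::printf("traced %dx%d pixels on %u threads\n", w, h, n);
        return 0;
    }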

OptiX by nVidia

OptiX is, like CausticGL, an API for programming ray-tracing on the GPU. The actual code looks very similar in the way rays are set up and programmed, though at first sight the nVidia version looks a bit uglier and less clean. In any case, the fact that this API works directly with an nVidia card, instead of using a hardware interface such as CausticOne to accelerate the ray-tracing, makes the need for that new hardware a priori quite void. Mental Images has also been presenting its own hardware-accelerated ray-traced renderer, called iray. Since we know that Mental Images is owned by nVidia, it is quite clear that iray is built on the OptiX technology, and the demos are quite impressive.

In conclusion

As a developer, I have all my hopes set on Larrabee. Developing a ray-tracer for Larrabee should be several times easier than doing it for CausticOne or for an nVidia card, although the existence of their respective APIs should make life easier for those who do not need to know anything about the scene acceleration structures or the way the rays are intersected.

The clear advantage at this moment belongs to nVidia. Their cards dominate the GPU market and do not need to fight their way in, unlike a ray-tracing-specific accelerator (such as CausticOne, which, as we have said, can be made redundant by OptiX) or a product with a more or less uncertain future such as Larrabee.

One year from now we may have a clearer idea of how things are going to look in the world of hardware ray-tracing. Meanwhile, we will keep ray-tracing our images in software.

The cost of developing parallel software

On Monday morning at High Performance Graphics, Tim Sweeney from Epic Games gave a talk about the end of the GPU roadmap. I am not going to summarise the talk, but just point out an idea that was on one of his slides.

Let’s say X is the cost of creating an efficient single-threaded program. Then,

  • if you want it multi-threaded, the cost is 2X,
  • if you want to program it for the Cell architecture in a Sony PlayStation 3, the cost is 5X,
  • if you want to port it to GPGPU with CUDA, the cost is 10X.

For the average company, more than 2X the cost is usually unaffordable.

So, please, hardware designers and developers: provide hardware that is easy to program, with good and efficient compilers.

And I ask myself: is this a case for Larrabee?

The changes in the CG production pipeline

Larry Gritz, who works at Sony Imageworks, presented at High Performance Graphics in New Orleans a talk entitled “Production Perspectives in High Performance Graphics”. The talk was basically a description of how a CG production is made and the problems involved in it, with a view to how the production process can be improved and accelerated with the aid of the new generation of graphics hardware.

With impressive excerpts from the latest films he has been involved in (Cloudy with a Chance of Meatballs, 2012), he explained how things have changed in recent years.

A typical movie can have between 100 and 2,000 CG shots, and every shot consists of about 20 to 200 frames that have to be rendered at high quality, at high resolution and without artifacts. A typical scene can use about 2 to 20 GB of memory and load more than 100 GB of textures. A frame is expected to take between 4 and 10 hours to render, but for complex scenes 10 to 20 hours can be acceptable. To do this, lots of 8-core machines with 16 to 32 GB of RAM each work 24/7; the render farm is estimated to have a total of about 5,000 cores.

Given all these huge numbers, perhaps unexpectedly, the real cost of making a movie does not come from the huge rendering time, from the office space needed for these machines or from the energy to run them, but from human time, flesh and brains: the artist is the bottleneck. Any tool that helps the artist be more efficient at his job is a great money saver. Frame time matters less than pipeline time.

The rendering paradigm is also changing. Before, the REYES technique with RenderMan was used to compute several different passes per frame (beauty, reflections, refractions, highlights, ambient occlusion, fog, etc.) for a later 2D composition. Now the use of ray-tracing and global illumination has changed that, and Sony has moved its rendering engine from RenderMan to Arnold by Marcos Fajardo (who was also at the conference and with whom I had the opportunity to chat). Fewer passes are now required for the final compositing.

Among the tools that help the artists do a more efficient job are the relighting tools, which let the lighting artist tweak and adjust the illumination of the scene at high quality and at interactive rates. Before these tools, relighting was brute force: if the final image was not satisfactory, the frame, or the whole sequence, had to be recomputed. Part of the relighting could also be done in compositing, with the different lights rendered in separate passes and their intensities adjusted afterwards. The deep-buffer technique, which keeps scene data at pixel level and allows real-time adjustment of some kinds of lights, also helped with that. Then Sorbetto, based on nVidia's CUDA, appeared, but it was not interactive enough for complex scenes, as it had complexity limits. The next version of Sorbetto, named Mocha, was promising but was killed before shipping (and I ask myself why). Arnold has a basic relighting tool in which the ray-tracing is restarted for every change in a light, but it still allows interactive feedback.
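
As an aside, here is a minimal sketch of what I understand by the deep-buffer idea: keep per-pixel shading data from one expensive render, then re-evaluate simple lights on top of it every time the artist changes them. The data layout and the Lambertian-only point light are my own illustrative assumptions, not the actual tools described in the talk.

    // Illustrative deep-buffer relighting: the layout and the simple
    // point-light model are assumptions made for this example only.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Vec3 {
        float x, y, z;
        Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
        Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
        float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
    };

    // One sample per pixel, written once by the expensive offline render.
    struct DeepSample { Vec3 position, normal, albedo; };
    struct PointLight { Vec3 position, colour; float intensity; };

    // Cheap relighting pass: run again every time the artist tweaks the light.
    void relight(const std::vector<DeepSample>& buffer, const PointLight& light,
                 std::vector<Vec3>& out)
    {
        out.resize(buffer.size());
        for (std::size_t i = 0; i < buffer.size(); ++i) {
            const DeepSample& s = buffer[i];
            Vec3 toLight = light.position - s.position;
            float dist2 = toLight.dot(toLight);
            float ndotl = std::max(0.0f, s.normal.dot(toLight * (1.0f / std::sqrt(dist2))));
            float k = light.intensity * ndotl / dist2;   // inverse-square falloff
            out[i] = {s.albedo.x * light.colour.x * k,
                      s.albedo.y * light.colour.y * k,
                      s.albedo.z * light.colour.z * k};
        }
    }

    int main()
    {
        // A tiny 2x2 "frame" of deep samples standing in for a real render.
        std::vector<DeepSample> buffer = {
            {{0, 0, 0}, {0, 1, 0}, {0.8f, 0.8f, 0.8f}},
            {{1, 0, 0}, {0, 1, 0}, {0.8f, 0.2f, 0.2f}},
            {{0, 0, 1}, {0, 1, 0}, {0.2f, 0.8f, 0.2f}},
            {{1, 0, 1}, {0, 1, 0}, {0.2f, 0.2f, 0.8f}},
        };
        std::vector<Vec3> frame;
        PointLight key{{2, 3, 1}, {1, 1, 1}, 10.0f};
        relight(buffer, key, frame);   // interactive: no re-render of the scene
        std::printf("relit %zu pixels\n", frame.size());
        return 0;
    }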

In conclusion, it is not computing power that is needed, but tools that help the artist work more efficiently. These tools should be multi-platform and multi-OS, and some effort from the industry is needed to define standards. Will Shaderlight do the job?

A week in New Orleans: day 1, from London to New Orleans via Atlanta

Written on Saturday 1st August 10h25 (local time)

First I would like to apologize to my usual readers here. For a few days this blog will be in English and strictly related to my days in New Orleans. I have come here to attend, for the first time in 10 years and for the second time in my life, SIGGRAPH, the most important computer graphics conference in the world.

I wish I had had the opportunity to post proper updates of my impressions during these days at SIGGRAPH and High Performance Graphics (HPG) in New Orleans, but neither the hotel nor the HPG conference (where I am now, just before it starts) has free wi-fi to connect to the internet. I will probably pay the 5 dollars (actually it has turned out to be 14.95 USD, so now it is even less likely that I will update more often) for 24 hours of connection tonight at the hotel and publish all this, but meanwhile I am writing offline.

Yesterday was just the travel day. It was a bit of a nightmare, especially from the moment I arrived in Atlanta, where I had to take a second plane to New Orleans. The flight from London was one hour late, so the 2 hours and 15 minutes I had to pass immigration, collect the luggage, check in again, go through security and board the plane were reduced to 1 hour and 15 minutes. When it was time to fly I was still queueing to check in the luggage to New Orleans. I was lucky, though: somebody from Delta then shouted that all flights were delayed. Mine was expected at 23h30, two hours later than scheduled. In the end we actually took off at 00h30, after no fewer than 4 gate changes.

I arrived at the hotel really late and tired, I have slept 5 hours, and here I am, listening to some guy talking about BVHs…