Crunching megapixels in WebGL

I recently finished Radi 0.9. It has a major new feature (well, from my point of view at least!) which allows the user to design realtime image filters and publish them to the web. The big idea is that nothing gets pre-rendered: instead, the filter is always rendered in the browser, which allows the web page to modify and change the filters on the fly. (I’ve made some elementary demos, you can see them here.) This feature is implemented using WebGL, a 3D extension of the HTML5 Canvas interface.

This is the first time I’ve used this fairly new browser API, which could be reasonably characterized as “bleeding-edge”… Although WebGL is supported in Chrome, Firefox and Opera, even in these browsers it may be disabled on systems whose graphics hardware has not been tested and qualified. WebGL is also available in Safari, but it’s off by default and needs to be manually enabled through the special ‘Develop’ menu. To further complicate things, WebGL is a very large API that needs to interact directly with both browser internals and graphics drivers, which makes it even more of a moving target for development. Given all this, I’m somewhat amazed that WebGL works as well as it does already!

In this post, I’d like to share a few things I learned about WebGL. If you’re not a JavaScript developer, you won’t find most of this post very interesting. However, the first sections may be worth your time even if you’re simply interested in using Radi’s WebGL filters, because they explain how the filters work and what their limitations are.

My experience with WebGL is slightly different from the typical sample code and demos you’d find on the web. Most of those are built like traditional single-view OpenGL applications, in the sense that the web page contains a single large WebGL element that renders a beautiful 3D scene autonomously (without regard to the other stuff happening on the same page). In contrast, the WebGL code I wrote for Radi doesn’t really do any interesting 3D stuff at all, but on the other hand it needs to interact closely with other elements – not just to the extent of receiving a few events, but to the point of loading large amounts of image data with precise timing.

Image filters are familiar to pretty much everyone thanks to apps like Photoshop or Instagram. It’s important to note that the WebGL filters I’m talking about are realtime video filters, not just “one-shot” filters applied to a single photo. This kind of streaming image processing is a perfect candidate for hardware acceleration, and WebGL allows us to accomplish this.

The unique feature of creating filters in Radi is that it all happens visually, simply by dragging and connecting blocks. This is thanks to Radi’s built-in Conduit effect design environment. When the filters are published as part of your HTML5 web page, Radi translates the Conduit effects into a WebGL-compatible format, and also embeds a small WebGL rendering library that takes care of actually applying the filter. This library is also the interface through which other scripts on the web page can interact with the filter.

In WebGL, the filters are implemented using shaders – small programs that execute on the GPU. This ensures that the filters are rendered on the graphics hardware and can take advantage of the tremendous pixel-pushing power in modern NVIDIA and AMD GPUs. Radi generates all the JavaScript and shader code, so you don’t need to know about the details, but it’s good to understand that each filter becomes a shader program. (By the way, if you’re familiar with Adobe’s Pixel Bender that was introduced in Flash Player 10, you can think of Radi’s WebGL filters as being essentially the same thing.)
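
For the curious, a single-pass filter of this kind is just a short GLSL program carried as a string in JavaScript. Here’s a simplified sketch of what such a fragment shader might look like – an illustration only, not the code Radi actually generates, and the uniform names are made up:

    // Illustrative GLSL fragment shader source for a trivial one-pass filter.
    // Radi's generated shaders are more elaborate, but have the same shape:
    // sample the source texture, compute a new color, write it out.
    var fragmentShaderSource =
        'precision mediump float;\n' +
        'uniform sampler2D sourceImage;\n' +  // the uploaded source texture
        'uniform float brightness;\n' +       // a filter parameter
        'varying vec2 texCoord;\n' +
        'void main() {\n' +
        '    vec4 c = texture2D(sourceImage, texCoord);\n' +
        '    gl_FragColor = vec4(c.rgb * brightness, c.a);\n' +
        '}\n';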

There is actually another possible way that this kind of programmable image filtering could be introduced to the web. Adobe has proposed CSS shaders which could be applied to any element in an HTML document. Right now these are not implemented anywhere, so using WebGL is the only practical solution… But if CSS shaders were to be widely adopted, it would be trivially easy to make Radi output CSS shader code as well, because the underlying shader language is the same in WebGL and the Adobe CSS proposal.

Filter limitations

The current implementation of Radi’s WebGL filter publishing is basically “the first thing that works”. It doesn’t support everything that the native implementation of Conduit does; some features are missing due to technical complications, others simply due to lack of time.

The major limitation is that filters are currently limited to a single shader. Anything that requires multiple shader passes doesn’t get exported in its complete form. You can design that kind of filter in the Conduit Editor within Radi and it will of course render correctly within the app, but Radi will warn you when publishing.

In practice, multiple shader passes are generated when the effect contains nodes that must do complex sampling within their source image(s). Conduit contains an efficient shader compiler which is able to combine most nodes into a single shader, but some, like Gaussian Blur and 3D Transform, simply require things like intermediate render targets or vertex processing that can’t be expressed within a single shader. So, these nodes don’t get exported into WebGL. This is not impossible to fix, but it requires a WebGL-specific implementation of those nodes – I’ll need to go look at the native source code for each of those nodes and translate it into JavaScript. That’s not insurmountable, but it’s more than an afternoon’s work.

The other limitations are due more to missing user interface features than to any technical constraint. WebGL filters don’t currently support more than one input image, even though Conduit supports up to seven simultaneous video streams as inputs to one effect. One way to fix this would be to introduce “image wells” in Radi, so that you could, for example, select another Canvas element as a secondary image input to a filter. This would be useful for blending and keying effects, or anything that requires multiple inputs.

However this limitation is not quite as dire as it seems, because WebGL filters support alpha transparency. This is primarily useful for keying. If you want to build a blue or green screen keyer in Radi, you can simply let the filter render transparency in its output, and then place another element behind the filtered element.

One more limitation is that you can’t create keyframes in Radi to modify a filter’s parameters in time. For now, changing the parameters needs to happen using JavaScript. This is another case of missing UI – until now only content layers have been keyframeable in Radi, not element parameters, so there was some code missing to make this work.

So much for Radi’s limitations – let’s go under the hood to see how things actually work in the browser.

Texture sources in WebGL

To render a filter, we need a source image. In WebGL images are represented as textures, so any image that we want to manipulate needs to go into a texture.

Radi’s filters can be applied to the HTML5 Canvas and Video elements, as these are the most likely types of content that you might want to modify using a realtime effect. Luckily WebGL is designed to interact well with these elements. Textures are uploaded with the rendering context’s texImage2D() method, called while the target texture is bound. This method accepts Canvas and Video elements directly as parameters, so we don’t need to convert these elements into other types of image data before using them as textures – a conversion that would most likely have a significant performance impact.

(If you want to follow along with specific code to see how Radi does this, you can view the source for one of the Radi+WebGL demos and search for a comment containing the text ‘Radi WebGL filter module’. The object containing this comment is the filter published by Radi. It’s a self-contained WebGL engine with various internal functions and a return statement at the end which exposes a public API that the rest of the Radi library uses to access the filter. For example, the texture upload part is located in a function named uploadTextureImageToGPU, which is called from the public function renderWithSourceElement.)
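
For a generic illustration of the upload itself – not Radi’s actual code, and with variable names I’ve made up – the essential calls look roughly like this:

    // Sketch: upload the current frame of a <video> (or a <canvas>) element
    // into a WebGL texture. 'gl' is a WebGLRenderingContext and 'video' is
    // an HTMLVideoElement that is already playing.
    var texture = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, texture);
    // The element itself is a valid pixel source, so no intermediate
    // ImageData or typed array copy is needed.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);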

So far, so good. In principle the task of taking a Canvas or Video element and rendering its contents in a WebGL element is simple: create a texture, upload the source image, render a textured quad. In practice there are many details that can prevent good performance or correct rendering.

I’m not going to explain all of the texture API, as you can find that in other WebGL tutorials. Rather I’ll mention some specific details that affect this use case:

  • There is a WebGL-specific pixel storage instruction that you can use to tell WebGL that your image needs to be flipped vertically. You’ll see this commonly used in WebGL code examples:
    gl.pixelStorei(gl.UNPACK_FLIP_Y_WEBGL, true);

    This is a consequence of OpenGL’s default coordinate space having a different orientation than the traditional default in computer graphics, “origin at top left”. In a typical situation where textures are uploaded rarely – perhaps only once when the 3D scene is set up – it’s no problem to ask the API to do a flip for each upload. But for video filters, it’s better to modify your view or texture coordinate space rather than having the WebGL engine flip every incoming image. (I don’t know how the various browsers actually implement this; perhaps it can be optimized to be a very lightweight operation that is done directly by the GPU. But if the browser needs to actually hit those pixels in memory, it’s going to have an impact, and it’s reasonable to assume that some browsers will implement flipping this way.)
  • OpenGL traditionally required a texture’s dimensions to be powers of two. That is, 512*256 is an acceptable texture size, whereas 640*480 is not. Since these “non-power-of-two” (NPOT) texture sizes are so common when dealing with real-world image data like video streams, a solution is needed.
    One possibility would be to resize the image in code before uploading. In WebGL this can be accomplished by rendering into a temporary Canvas element of the desired power-of-two size, and using that as the texture source. But obviously that’s both a heavy performance hit and a loss of image fidelity due to sampling, so we need something better.
    OpenGL was extended with various vendor-specific solutions to this problem. Fortunately those outdated methods have not spilled over to WebGL. There is a standard way to use NPOT textures with the 2D texture target, with the limitation that it’s not possible to use texture wrapping modes other than “clamp to edge”. So, to trigger the NPOT-compatible code path in WebGL, don’t forget to specify these texture parameters (a fuller setup sketch follows after this list):
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
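
Putting these pieces together, a texture setup that accepts NPOT video-sized sources might look like the following sketch. One detail the snippets above don’t show: mipmapped filtering is also unavailable for NPOT textures, so the minification filter has to be set to LINEAR or NEAREST (the default is a mipmapped mode). The variable names here are illustrative:

    // Sketch: configure a texture so that NPOT sizes like 640*480 work.
    gl.bindTexture(gl.TEXTURE_2D, texture);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    // No mipmaps for NPOT textures, so don't rely on the default mipmapped
    // minification filter.
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);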

The predictable unpredictables

Now we come to the hairy underbelly of how WebGL’s implementation interacts with the browser’s larger rendering and execution framework. In my humble opinion, herein lies something of an Achilles’ heel for WebGL: its resource model is a direct mapping of OpenGL, but it has an “impedance mismatch” with the surrounding browser environment which opens the door for unexpected behavior.

OpenGL was designed as a C API in a world where applications are expected to do careful and constant manual memory management, whereas the browser is a request-oriented garbage collected environment. Especially before AJAX took off, a web browser could reasonably assume that the bulk of work happens on the initial page load and any JavaScript executed within the page will not be doing much memory-intensive stuff. (Remember all the issues in Internet Explorer 6 regarding JavaScript memory leaks with circular references and other tricky special cases? That’s what happens when a request-oriented design starts being stretched by users to accommodate long-running applications.)

WebGL looks superficially just like OpenGL, but the actual memory behavior can be very different. Even when you’re doing everything “by the book”, it’s easy to balloon your browser’s memory usage by 1-2 GB just by doing texture uploads in WebGL. The garbage collector will come by eventually and clean up the mess, but this is clearly not good for realtime performance.

Another bit of browser-specific voodoo relates to refreshing the element that is used as the source for the filter. If you take a canvas or video element and hand it over to WebGL, there’s no guarantee that it will actually contain the data that you want: the rendering may be in an incomplete state (particularly for Canvas), or it may be that the image is not being refreshed at all (particularly for Video).

To be able to reliably deliver data to WebGL, we would need two things in the canvas/video APIs: one call that requests a render update, and another call that flushes the update (or alternatively a callback that lets us know when the update is complete). Canvas has the former, but doesn’t have the latter. HTML5 video doesn’t have either – it’s basically “play and pray” for a developer that wants to get images from the video stream.

The latest release of Chrome at the time of my testing (version 19) was particularly unreliable in regard to Canvas refreshes. If you render into a canvas element and then immediately try to use the canvas as a WebGL texture source, the rendering will most likely be in an unfinished state, with some graphics objects missing. Because there’s no way to find out when Chrome has actually finished rendering in the Canvas, I was forced to special-case my rendering code so that on Chrome only, it uses a setTimeout() delay mechanism to do the texture update. As long as the delay is less than the update interval of the Canvas element, this is not a problem, but it’s a very unsatisfactory solution. With the rapid rate of Chrome’s improvement, hopefully this will remain a temporary workaround.
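
The workaround is conceptually something like the sketch below. The delay value and the helper functions are placeholders I’ve made up for the example, not the actual code:

    // Sketch of the Chrome-specific delayed texture update described above.
    // drawIntoSourceCanvas() and updateTextureFromCanvas() are hypothetical.
    drawIntoSourceCanvas();
    if (isChrome) {
        // Give Chrome a moment to finish rendering the canvas before it's
        // used as a texture source. The delay must stay below the canvas
        // update interval.
        setTimeout(updateTextureFromCanvas, 5);
    } else {
        updateTextureFromCanvas();
    }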

Safari appears to have lots of trouble in updating a video element reliably. There may be an over-zealous optimization at play here: for example, when the video element is hidden or just obscured by the WebGL element, it doesn’t get updated at the full frame rate anymore. Safari seems to assume that video elements are only used as user-visible players, and so they don’t need to be updated if they’re not directly visible. (Perhaps this could be worked around by having the video element displayed below the WebGL element, but with the video’s opacity dialed down to almost zero to make it practically invisible. That would leave an annoying apparently-empty area if the element is at the end of the page, though.)

In general, the optimizations that browsers perform based on element visibility are a mysterious can of worms. I tried a few different variations of where to place the source element: could it be out of the DOM entirely, or hidden or set to ‘display:none’? All of these resulted in missed updates, so in the end I decided to place the source element in the same location as the WebGL element but with a lower z-index and very-close-to-zero opacity.
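
In code, that placement amounts to something like this sketch (the element variables and exact values are illustrative, and both elements are assumed to be absolutely positioned):

    // Sketch: keep the source element rendering but practically invisible,
    // stacked underneath the WebGL canvas at the same position.
    sourceElement.style.position = 'absolute';
    sourceElement.style.left = webglCanvas.offsetLeft + 'px';
    sourceElement.style.top = webglCanvas.offsetTop + 'px';
    sourceElement.style.zIndex = '0';       // below the WebGL element
    webglCanvas.style.zIndex = '1';
    sourceElement.style.opacity = '0.01';   // near zero, but not display:none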

Weird transparency

The issue of premultiplied vs. straight alpha is an ancient source of frustration in computer graphics. Straight alpha feels more natural to people (because the transparency information is truly separate from the color information), but premultiplied is more efficient to process and is more amenable to hardware filtering. Hence premultiplied has pretty much won the “alpha war” in the past decade. It’s the default used by such widespread APIs as the Mac/iOS, Windows and Android 2D graphics rendering systems, and by consequence it’s often encountered in applications as well, especially those that use the GPU.

The browser graphics APIs don’t follow this “industry standard”, but instead have inherited an unfortunate mélange of opinions on this topic. The Canvas API specifies that calls such as getImageData() which give access to pixel data must return non-premultiplied alpha values. This is presumably intended to make the API easier to use by programmers who are not familiar with premultiplied alpha, but because the underlying pixel data is most likely premultiplied (since that’s what the platform APIs use), this good intention results in both a performance hit and programmer confusion: you write some pixels with putImageData(), read them back with getImageData(), but they have mysteriously changed along the way due to the premultiplying and unpremultiplying.
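
The round trip is easy to demonstrate with a plain 2D canvas context (‘someCanvas’ is a placeholder in this sketch):

    // Sketch: write one nearly-transparent red pixel and read it back.
    var ctx = someCanvas.getContext('2d');
    var img = ctx.createImageData(1, 1);
    img.data[0] = 255;  // R
    img.data[1] = 0;    // G
    img.data[2] = 0;    // B
    img.data[3] = 3;    // A: nearly transparent
    ctx.putImageData(img, 0, 0);
    var back = ctx.getImageData(0, 0, 1, 1);
    // back.data[0] is typically no longer exactly 255: the value was rounded
    // when premultiplied by alpha (3/255) and can't be recovered precisely
    // when un-premultiplied on readback.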

WebGL comes into this mess with an “alpha-agnostic” perspective that it inherits from OpenGL. In addition to the existing jungle of OpenGL calls, WebGL adds new context creation parameters and pixel store parameters that determine how a WebGL element’s alpha will be handled in relation to other APIs and WebGL’s own texturing calls. With all this, it’s become really easy to get the alpha wrong at some point. The situation would be somewhat easier if the Canvas API designers had set an example by standardizing on premultiplied alpha, but it’s too late for that now.
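
The knobs in question are the context creation attributes and a WebGL-specific pixel store parameter; for example (a sketch, not Radi’s exact settings):

    // Context attributes control how the WebGL canvas itself is composited
    // with the rest of the page: whether it has an alpha channel, and whether
    // its pixels are treated as premultiplied by the compositor.
    var attrs = { alpha: true, premultipliedAlpha: true };
    var gl = canvas.getContext('webgl', attrs) ||
             canvas.getContext('experimental-webgl', attrs);
    // This pixel store parameter controls whether texture uploads are
    // premultiplied by their alpha on the way in.
    gl.pixelStorei(gl.UNPACK_PREMULTIPLY_ALPHA_WEBGL, true);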

This is the end of my notes on image processing in WebGL. If you have any questions, please write a comment or get in touch via other means (my email and Twitter can be found on the About page).

Of course, if this got you interested in playing with WebGL filter design in Radi, don’t forget that it’s a free download – get Radi 0.9 here.
