Compiling Deep Learning Models to WebGL with TVM
Now TVM comes with a brand-new OpenGL/WebGL backend! This blog post explains what it is, and what you can achieve with it.
The OpenGL/WebGL Backend
TVM already targets a lot of backends covering a variety of platforms: CPU, GPU, mobile devices, etc… This time we are adding another backend: OpenGL/WebGL.
OpenGL/WebGL enables us to leverage the GPU on an environment which does not have CUDA installed. It is, for the time being, the only way of using the GPU inside a browser.
This new backend allows us to use OpenGL/WebGL in 3 different ways:
- Local OpenGL: We can compile a deep learning model into OpenGL and directly run it on the local machine, entirely using Python.
See here for examples of all three of them.
How is this Different from X?
So what’s unique about TVM with WebGL? The big difference is that the op kernels in TVM are automatically compiled, not handwritten. As shown in Figure 2, TVM utilizes a unified AST to define kernels, and compiles it to code on different platforms.
This means that:
- To deploy your existing model to WebGL, you don’t need to write a lot of additional code. The NNVM/TVM model definition is the same for all targets, so you just need to compile it to a new target.
- To add a new op kernel, you only need to define it in TVM once, instead of implementing it once for every target. You don’t need to know how to write GLSL code to add a new op kernel to WebGL!
Here we perform a benchmark for a typical workload: image classification using resnet18.
I’m using my 5-year-old laptop which has an 8-core Intel® Core™ i7-3610QM, and a GTX650M.
In this benchmark, we download a resnet18 model from the Gluon model zoo, and perform end-to-end classification on a cat image. We only measure the model execution time (without model/input/parameter loading), and each model is run 100 times to get an average. The results are shown in figure 3.
The benchmark is run in 4 different settings:
- CPU (LLVM): The model is compiled into LLVM IR and JIT’ed. Therefore, it is run entirely on the CPU.
- OpenCL: The model is compiled into OpenCL. There is still some glue code compiled to LLVM, which is responsible for setting up and launching OpenCL kernels. Then we run it on the local machine.
- OpenGL: Same as OpenCL, but compiled to OpenGL.
This is a first step toward automatic compilation of deep learning models into web browser. We expect more performance improvements as we bring optimizations into the TVM stack.
Show me the Code
- Checkout this complete code example.
We thank the developers of Emscripten for providing the fastcomp toolchain and the helps during the development.