Why glTF is the JPEG for the metaverse and digital twins

The JPEG file format played a crucial role in moving the internet from a text-based world to a visual experience by providing an open, efficient container for sharing images. Now the Graphics Language Transmission Format (glTF) promises to do the same for 3D objects in the metaverse and digital twins.

JPEG used various compression tricks to shrink images dramatically compared with other formats such as GIF. The latest version of glTF likewise uses compression techniques for both the geometry of 3D objects and their textures. glTF already plays a key role in e-commerce, as evidenced by Adobe’s push into the metaverse.

VentureBeat spoke with Neil Trevett, president of the Khronos Group, which stewards the glTF standard, to learn more about what glTF means for businesses. He is also vice president of developer ecosystems at Nvidia, where his job is to make it easier for developers to use GPUs. He explains how glTF complements other digital twin and metaverse formats like USD, how to use it, and where it’s headed.

VentureBeat: What is glTF and how does it fit into the ecosystem of file formats for the metaverse and digital twins?

Neil Trevett: At Khronos, we put a lot of effort into 3D APIs like OpenGL, WebGL and Vulkan. We’ve found that every application that uses 3D needs to import assets at some point. The glTF file format is widely adopted and very complementary to USD, which is becoming the standard for authoring and editing on platforms like Omniverse. USD is the place to be if you want to bring multiple tools together in sophisticated pipelines and create very high-end content, including movies. That’s why Nvidia is investing heavily in USD for the Omniverse ecosystem.

On the other hand, glTF focuses on being efficient and easy to use as a delivery format. It is a lightweight, streamlined and easy-to-process format that any platform or device can use, right down to web browsers on mobile phones. The tagline we use as an analogy is that “glTF is the JPEG of 3D.”

It also complements the file formats used in authoring tools. For example, Adobe Photoshop uses PSD files to edit images. No professional photographer would edit JPEG files because so much of the information has been lost. PSD files are more sophisticated than JPEGs and support multiple layers. However, you would not send a PSD file to my mother’s cell phone. You need JPEG to stream it to a billion devices as efficiently and quickly as possible. USD and glTF thus complement each other in the same way.

VentureBeat: How do you switch from one to the other?

Trevett: It is important to have a seamless distillation process from USD assets to glTF assets. Nvidia is investing in a glTF connector for Omniverse, so we can easily import and export glTF assets to and from Omniverse. In the Khronos glTF working group, we are pleased that USD meets the industry’s need for an authoring format, because that is a huge amount of work. The goal is for glTF to be the ideal distillation target for USD to support widespread deployment.

An authoring format and a delivery format have quite different design requirements. The design of USD is all about flexibility. It helps you compose things to make a movie or a VR environment. If you want to bring in another asset and merge it with the existing scene, you need to keep all the design information. And you want everything at full resolution and quality.

The design of a transmission format is different. For example, with glTF, the vertex information is not very flexible for re-authoring. But it is transmitted in exactly the form the GPU needs to run that geometry as efficiently as possible through a 3D API like WebGL or Vulkan. glTF also puts a lot of design effort into compression to reduce download times. For example, Google contributed its Draco 3D mesh compression technology, and Binomial contributed its Basis Universal texture compression technology. We have also started to put a lot of effort into level of detail (LOD) management so that you can download models very efficiently.
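
To make the "exactly the form the GPU needs" point concrete, here is a minimal sketch of how glTF 2.0 describes vertex data: a raw binary buffer, a bufferView that slices it, and an accessor that says how to interpret the bytes. The triangle, the embedded data URI and the helper name `read_vec3_float_accessor` are illustrative assumptions, not taken from the interview; the numeric codes (5126 for 32-bit float, 34962 for a vertex array buffer) are the ones the glTF 2.0 specification borrows from OpenGL.

```python
import base64
import json
import struct

# Three vertices of a triangle, packed as little-endian 32-bit floats (x, y, z).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
raw = b"".join(struct.pack("<3f", *p) for p in positions)

# A pared-down glTF 2.0 document with the buffer embedded as a base64 data URI.
gltf = {
    "asset": {"version": "2.0"},
    "buffers": [{
        "byteLength": len(raw),
        "uri": "data:application/octet-stream;base64,"
               + base64.b64encode(raw).decode("ascii"),
    }],
    # 34962 = ARRAY_BUFFER: these bytes can be uploaded as a vertex buffer as-is.
    "bufferViews": [{"buffer": 0, "byteOffset": 0,
                     "byteLength": len(raw), "target": 34962}],
    # 5126 = FLOAT; "VEC3" means three components per element.
    "accessors": [{"bufferView": 0, "componentType": 5126, "count": 3,
                   "type": "VEC3",
                   "min": [0.0, 0.0, 0.0], "max": [1.0, 1.0, 0.0]}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
}


def read_vec3_float_accessor(doc, index):
    """Decode a float VEC3 accessor from an embedded buffer (hypothetical helper)."""
    acc = doc["accessors"][index]
    assert acc["componentType"] == 5126 and acc["type"] == "VEC3"
    view = doc["bufferViews"][acc["bufferView"]]
    uri = doc["buffers"][view["buffer"]]["uri"]
    data = base64.b64decode(uri.split(",", 1)[1])
    start = view.get("byteOffset", 0) + acc.get("byteOffset", 0)
    stride = view.get("byteStride", 12)  # tightly packed VEC3 of floats
    return [struct.unpack_from("<3f", data, start + i * stride)
            for i in range(acc["count"])]


doc = json.loads(json.dumps(gltf))  # round-trip through JSON, as a loader would
position_accessor = doc["meshes"][0]["primitives"][0]["attributes"]["POSITION"]
decoded = read_vec3_float_accessor(doc, position_accessor)
print(decoded)
```

Nothing in the buffer needs to be rearranged before it reaches the GPU, which is why the layout is efficient to render but awkward to edit.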

Distillation is how you move from one file format to the other. A big part of it is stripping out design and authoring information that you no longer need. But you do not want to reduce the visual quality unless you really have to. With glTF, you can maintain visual fidelity, but you also have the choice of compressing things when you are targeting a low-bandwidth deployment.

VentureBeat: How much can you shrink it without losing too much fidelity?

Trevett: It’s like JPEG, where you have a dial to increase the compression with an acceptable loss of image quality, except that glTF has the same thing for both geometry and texture compression. If it is a geometry-intensive CAD model, the geometry will be the bulk of the data. But if it’s more of a consumer-oriented model, the texture data can be much larger than the geometry.

With Draco, it is reasonable to shrink the data by 5 to 10 times without any significant drop in quality. Something similar applies to textures.

Another factor is the amount of memory required, which is a precious resource in mobile phones. Before Binomial’s compression was adopted in glTF, people sent JPEGs, which is great because they are relatively small. But unpacking them into full-size textures can take hundreds of megabytes even for a simple model, which can hurt a phone’s power consumption and performance. glTF textures now let you take a JPEG-sized, super-compressed texture and immediately transcode it into a GPU-native texture, so it never grows to full size. As a result, you reduce both the data transmitted and the memory required by 5 to 10 times. That helps if you are downloading assets in a browser on a mobile phone.
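
A back-of-the-envelope calculation shows where the memory savings come from. The sketch below assumes a hypothetical model with five 4096 x 4096 texture maps and ignores mipmaps; the bits-per-pixel figures are the standard rates for uncompressed 8-bit RGBA (32) and for common GPU block formats such as BC7 or ASTC 4x4 (8) and ETC1 or BC1 (4), which are the kinds of targets a transcoder like Basis Universal produces.

```python
MIB = 1024 * 1024


def texture_bytes(width, height, bits_per_pixel):
    """Memory for a single texture level, without mipmaps."""
    return width * height * bits_per_pixel // 8


size = 4096   # hypothetical texture resolution
maps = 5      # hypothetical number of texture maps on the model

rgba8 = texture_bytes(size, size, 32)   # JPEG decoded to uncompressed RGBA
block8 = texture_bytes(size, size, 8)   # 8 bpp GPU block format
block4 = texture_bytes(size, size, 4)   # 4 bpp GPU block format

for label, nbytes in [("RGBA8", rgba8), ("8 bpp block", block8), ("4 bpp block", block4)]:
    per_map = nbytes // MIB
    print(f"{label:12s} {per_map:3d} MiB per map, {per_map * maps:3d} MiB for {maps} maps")

print("saving:", rgba8 // block8, "x to", rgba8 // block4, "x")
```

Under these assumptions the uncompressed textures alone take 320 MiB, while the GPU-native versions take 80 MiB or 40 MiB, a 4x to 8x saving that is in the same range as the figure quoted above.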

VentureBeat: How do people effectively represent the textures of 3D objects?

Trevett: Well, there are two basic classes of texture. One of the most common is simply image-based textures, such as mapping a logo image onto a t-shirt. The second is procedural textures, where you generate a pattern, such as marble, wood, or stone, simply by running an algorithm.

There are several algorithms you can use. For example, Allegorithmic, which Adobe recently acquired, pioneered an interesting technique for generating textures that is now used in Adobe Substance Designer. You often bake such a texture into an image because that is easier to process on client devices.
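
As a toy illustration of the procedural idea (this is not the Substance technique, just a sine-based stripe pattern chosen for brevity), the sketch below evaluates a function once per texel and bakes the result into a greyscale image. The function names and the 64 x 64 size are assumptions made for the example.

```python
import math


def marble(u, v, stripes=6.0, wobble=2.0):
    """Grey level in [0, 1] for texture coordinates u, v in [0, 1]."""
    # Bend straight stripes with a second sine so they look loosely veined.
    phase = stripes * u + wobble * math.sin(2.0 * math.pi * v)
    return 0.5 + 0.5 * math.sin(2.0 * math.pi * phase)


def bake(width, height):
    """Evaluate the procedure at each texel centre and store 8-bit grey values."""
    return [
        [round(255 * marble((x + 0.5) / width, (y + 0.5) / height))
         for x in range(width)]
        for y in range(height)
    ]


def to_pgm(pixels):
    """Serialize the baked grid as a binary PGM (portable graymap) image."""
    height, width = len(pixels), len(pixels[0])
    header = f"P5 {width} {height} 255\n".encode("ascii")
    return header + bytes(value for row in pixels for value in row)


pixels = bake(64, 64)
image = to_pgm(pixels)
print(len(image), "bytes")
```

Once baked, the pattern is just another image that a client device can sample, with no need to run the generating algorithm itself.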

Once you have a texture, you can do more than just stick it on the model like a piece of wrapping paper. You can use these texture images to drive a more sophisticated material appearance. For example, physically based rendering (PBR) materials are where you try to go as far as you can in mimicking the properties of real-world materials. Is it metallic, which makes it look shiny? Is it translucent? Does it refract light? Some of the more sophisticated PBR algorithms can use up to five or six different texture maps feeding in parameters that characterize how shiny or translucent it is.
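
For a sense of what those texture maps look like in a file, here is a sketch of a core glTF 2.0 material with all five of its texture slots filled. The property names come from the glTF 2.0 specification; the material name, the factor values and the texture indices are made up for illustration, and effects such as transmission are handled by separate material extensions that are not shown.

```python
import json

# Texture indices 0..4 are hypothetical; a real asset defines them under "textures".
material = {
    "name": "brushed_metal",  # illustrative name
    "pbrMetallicRoughness": {
        "baseColorTexture": {"index": 0},          # surface colour
        "metallicRoughnessTexture": {"index": 1},  # metalness and roughness
        "metallicFactor": 1.0,
        "roughnessFactor": 0.4,
    },
    "normalTexture": {"index": 2},     # fine surface bumps
    "occlusionTexture": {"index": 3},  # baked ambient occlusion
    "emissiveTexture": {"index": 4},   # self-illumination
}


def texture_slots(mat):
    """List the texture-map slots used by a core glTF 2.0 material."""
    slots = [key for key in mat if key.endswith("Texture")]
    slots += [key for key in mat.get("pbrMetallicRoughness", {})
              if key.endswith("Texture")]
    return sorted(slots)


slots = texture_slots(material)
print(json.dumps({"materials": [material]}, indent=2))
print(slots)
```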

VentureBeat: How has glTF advanced on the scene graph side to represent relationships within objects, such as how car wheels can turn or how multiple things connect?

Trevett: This is an area where USD has a leg up on glTF. To date, most glTF use cases have been satisfied by a single asset in a single asset file. 3D commerce is a leading use case, where you want to bring up a chair and drop it into your living room, as Ikea does. That is a single glTF asset, and many use cases have been satisfied with that. As we move into the metaverse and VR and AR, people want to create scenes with multiple assets for deployment. An active area of discussion in the working group is how best to implement multi-glTF scenes and assets and how to link them. It will not be as sophisticated as USD, because the emphasis is on transmission and delivery rather than authoring. But glTF will have something to enable multi-asset composition and linking in the next 12 to 18 months.
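
Within a single asset, glTF 2.0 already has a simple scene graph: a list of nodes, each with an optional transform and a list of child node indices. The sketch below builds a hypothetical car with two wheel nodes and walks the hierarchy to find where each node ends up. For brevity it handles translation only and ignores rotation and scale, and the helper name `world_positions` is an assumption for the example.

```python
gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],  # root nodes of the default scene
    "nodes": [
        {"name": "car", "translation": [10.0, 0.0, 0.0], "children": [1, 2]},
        {"name": "front_wheel", "translation": [1.5, 0.25, 0.0]},
        {"name": "rear_wheel", "translation": [-1.5, 0.25, 0.0]},
    ],
}


def world_positions(doc):
    """Accumulate translations down the node hierarchy (rotation and scale ignored)."""
    result = {}

    def visit(index, parent_position):
        node = doc["nodes"][index]
        local = node.get("translation", [0.0, 0.0, 0.0])
        position = [p + t for p, t in zip(parent_position, local)]
        result[node["name"]] = position
        for child in node.get("children", []):
            visit(child, position)

    for root in doc["scenes"][doc["scene"]]["nodes"]:
        visit(root, [0.0, 0.0, 0.0])
    return result


positions = world_positions(gltf)
print(positions)
```

Moving the car node moves both wheels with it, which is the kind of relationship a scene graph captures. What the format does not yet define, as described above, is a standard way to link several such files into one scene.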

VentureBeat: How will glTF evolve to support more metaverse and digital twin use cases?

Trevett: We need to start bringing in things beyond physical appearance. Today we have geometry, textures and animations in glTF 2.0. The current glTF says nothing about physical properties, sounds, or interactions. I think many of the next generation of glTF extensions will add these kinds of behaviors and properties.

The industry is deciding right now that USD and glTF are the formats going forward. Although there are older formats like OBJ, they are starting to show their age. There are popular formats like FBX, which are proprietary. USD is an open source project and glTF is an open standard. People can participate in both ecosystems and help evolve them to meet the needs of their customers and the market. I think the two formats will evolve side by side. Now the goal is to keep them aligned and keep the distillation process between them efficient.
