5 MIN READ, JAN 28, 2020
Who better to tell us about AR and VR than someone with over 20 years of practice in the field? Ville Miettinen has solid experience in the computer graphics and computer vision industries. For the last 25 years, he has been founding, building and exiting graphics technology companies in Moomin Valley, a.k.a. Finland. Some of them, such as Hybrid Graphics and drawElements, achieved global success and were acquired by NVIDIA and Google respectively. Currently, Ville works as Principal Algorithm Developer and Co-Founder at Varjo. During his talk on the BUILD stage at EMERGE 2019, Ville discussed the differences between AR and VR and illustrated them with demo examples from Varjo.
Ville Miettinen at EMERGE 2019
Finland used to be famous for mobile phones, but unfortunately Silicon Valley killed Moomin Valley. These days Finland is back in the game, with a pretty competitive gaming industry that includes famous companies such as Supercell and Rovio, along with hundreds of other studios. Computer games always go hand in hand with computer graphics.

For the last 3 years Ville has been building Varjo, an AR/VR headset company of which he is one of the founders. It was established in late 2016 with 5 people on board and has since grown to 145 people from 17 different countries. The development team is located in Finland, the sales team is in Silicon Valley, and the manufacturing team is in China. The staff are mostly ex-Nokia and ex-Microsoft, and many others come from the games industry. At Varjo they build Mixed Reality headsets.
'Varjo Technologies' from Ville's presentation


During his fascinating presentation, Ville went through the confusing terminology of VR, AR and XR and covered their differences. VR stands for Virtual Reality, AR — Augmented Reality and XR — Mixed or Extended Reality.

With Virtual Reality you wear a fully immersive headset and you're not really aware of your physical surroundings, because you're fully transported into a virtual world. To keep you from bumping into physical things, sensors and visual aids called 'chaperones' help you navigate.
'Virtual Reality' from Ville's presentation
Augmented Reality can be divided into two main segments: mobile AR and wearable AR. Let's start with mobile AR. A good example of one of the first mobile AR applications is Word Lens, which was later acquired by Google and incorporated into Google Translate. It is basically an app through which you view the world, and wherever there is text, it is automatically translated into the target language.
'Word Lens AR app' from Ville's presentation
The other type is wearable AR, where you wear a heavy headset that is typically untethered, meaning there is no cable attached. In wearable AR, the graphics processing runs somewhere on the headset, which is able to display graphics on top of the real world. However, wearable AR faces some fundamental challenges, mainly because the displays are limited to a narrow field-of-view. Whatever objects you're seeing, they're just in the middle of your vision. And typically the waveguide technology darkens the real world quite a bit.
'Wearable AR' from Ville's presentation
What's Mixed Reality? The term XR is not very well defined, but the idea is that it represents the next generation of Augmented Reality, where we pick the good parts of VR and the good parts of AR. The end goal of mixed reality is to create an experience where you cannot distinguish between the real and virtual worlds. Computer vision developers are able to take virtual objects and virtual experiences and combine them seamlessly with the real world. Of course, this is a very complex problem, because to do it properly you need to 3D capture everything that surrounds you. You also have to solve the entire light transport problem: where light is coming from, what it hits and where it bounces. All of this helps you reconstruct what's happening between the virtual and real worlds. Eventually, what we want is a virtual object that reflects the real world and vice versa: virtual objects cast shadows on real-world surfaces, virtual light sources light real-world environments, and so forth.

It's worth highlighting that the whole AR/VR/XR industry is completely driven by the consumer market, as the main applications are games.

Three problems with AR technology

Computer graphics developers face three problems when working with AR technology.

1. When working with AR, we need to deal with display resolution. Existing headsets have fairly low display resolutions, so anything virtual rendered on them is not going to look very real.
2. Existing AR technologies also suffer from a narrow field-of-view. Because of that, all the objects are located only in the center of the view.
3. Objects displayed in AR are hazy. Because AR displays work by adding an image on top of the real world, objects don't look realistic.
'Three problems with AR' from Ville's presentation
However, there are ways to tackle the issues listed above. Jason Paul, who runs VR strategy at NVIDIA, says that it would take us about 20 years to reach human-eye resolution. The question is how to get there. The human eye doesn't have a fixed resolution; it varies. But if it were a single resolution, it would be roughly 100 megapixels. It might seem that we can just build a 100-megapixel display and the problem is solved. But it's not.
First of all, the highest-resolution displays in existence, in laboratory conditions, are about 16 megapixels. Also, building high-res displays is very expensive. A couple of big display manufacturers in AR/VR land, such as Microsoft and Magic Leap, have each spent more than a billion dollars on display technology. Everyone else, from the other big companies to small players like Varjo, just takes displays created for mobile phones, which are big enough to be used in a creative way. However, in the mobile phone world, the maximum useful display resolution has already been reached. You can already watch Full HD videos on mobile phones, so there are no real drivers to bump the resolution.

Also, virtual reality needs much higher frame rates, and, unfortunately, mobile phone displays are not very suitable for that. Let's imagine for a moment that someone offered to build us a mega display with 100 megapixels that we could put in our headsets. Again, there is an issue. We would still need to render 100 megapixels per frame, and that's not going to happen, even with the latest generation of displays. Most of the rendered pixels would go to peripheral vision, where you cannot really see any detail. It means that the bigger the field-of-view is, the more pixels are spent outside the area you can see sharply.
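To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. The 100-megapixel figure and Full HD come from the talk; the frame rates (60 for phone video, 90 for a VR headset) are assumptions chosen only for illustration, and the helper name is made up.

```python
# Back-of-the-envelope pixel throughput. All frame rates here are assumptions.

def pixels_per_second(pixels_per_frame: int, frames_per_second: int) -> int:
    """Pixels that must be rendered each second at a given frame rate."""
    return pixels_per_frame * frames_per_second

FULL_HD_PIXELS = 1920 * 1080      # about 2 megapixels
HUMAN_EYE_PIXELS = 100_000_000    # "roughly 100 megapixels" from the talk

ASSUMED_PHONE_FPS = 60  # assumption: typical phone video playback rate
ASSUMED_VR_FPS = 90     # assumption: a common VR headset refresh rate

phone_load = pixels_per_second(FULL_HD_PIXELS, ASSUMED_PHONE_FPS)
headset_load = pixels_per_second(HUMAN_EYE_PIXELS, ASSUMED_VR_FPS)
load_ratio = headset_load / phone_load

print(f"Full HD phone: {phone_load / 1e6:.0f} million pixels per second")
print(f"100 MP headset: {headset_load / 1e9:.0f} billion pixels per second")
print(f"Ratio: about {load_ratio:.0f}x")
```

Under these assumptions the headset would have to fill tens of times more pixels every second than a phone playing Full HD video, which is why rendering every pixel at full quality is not realistic.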

It's possible to do variable rate rendering, meaning that we do not render at the same rate in all directions but focus on the areas where the human eye can actually see well. For example, if you hold two fingers up in front of you, the area they cover is your foveal vision, where you can actually see accurately. When you move your hand off to the side, it enters your peripheral vision, where you can still see, but things are not sharp anymore: you can detect motion there, but you cannot make out any details. Rendering algorithms that try to take advantage of this are called foveated rendering.
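The idea can be illustrated with a small sketch. It is not how any particular headset or GPU implements foveated rendering; the tier boundaries (5 and 15 degrees), the block sizes, and the 100-degree field-of-view are assumptions made up for the example.

```python
import math

# Hypothetical tiers: (maximum eccentricity in degrees, block size in pixels).
# A block size of n means one shaded sample is reused for an n x n pixel block.
ASSUMED_TIERS = [(5.0, 1), (15.0, 2), (math.inf, 4)]

def block_size(eccentricity_deg: float) -> int:
    """Pick a shading block size from the angular distance to the gaze point."""
    for max_deg, size in ASSUMED_TIERS:
        if eccentricity_deg <= max_deg:
            return size
    raise ValueError("eccentricity must be a number")

def shaded_fraction(fov_deg: float, gaze_x_deg: float, gaze_y_deg: float,
                    step_deg: float = 1.0) -> float:
    """Approximate shading work relative to rendering every pixel.

    The view is treated as a flat square grid of angular cells, and the
    eccentricity is the Euclidean distance in degrees from the gaze point.
    This is a simplification, not a real projection model.
    """
    half = fov_deg / 2
    cells = int(fov_deg / step_deg)
    work = 0.0
    for i in range(cells):
        for j in range(cells):
            x = -half + (i + 0.5) * step_deg
            y = -half + (j + 0.5) * step_deg
            eccentricity = math.hypot(x - gaze_x_deg, y - gaze_y_deg)
            # An n x n block costs one sample instead of n * n.
            work += 1.0 / block_size(eccentricity) ** 2
    return work / (cells * cells)

# Assumed 100-degree field-of-view with the user looking straight ahead.
fraction = shaded_fraction(100.0, 0.0, 0.0)
print(f"Shading work compared to full rate: {fraction:.1%}")
```

Even with these coarse, made-up tiers, most of the view falls into the cheapest tier, so the total shading work drops to a small fraction of full-rate rendering while the area around the gaze point keeps full detail.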
'A quote by Jason Paul, GM of VR Strategy at NVIDIA ' from Ville's presentation
Let's say someone builds us a mega display with nice foveated rendering technology, and we finally get our human-eye resolution. It still doesn't work properly, because pixels consume power. Power consumption is a massive issue in anything that you wear. All the headsets use several watts of power, and the displays sit right in front of your eyes. Even the current generation of displays is unbearably hot.

The VR industry is not working the classical way. Whenever a new technology is brought to the market, it usually comes first to the high-end professional segment, where the initial device is tested with prosumers. After that, the product gets simplified and becomes easier to use and much cheaper for ordinary consumers. In the VR market, it went the other way round. Decades ago, the industry started with consumers and fairly crappy products, and now it is trying to quickly fix itself by pushing products to consumers that are not technologically up there. Ville said that Varjo started the traditional way, with pro products for ultra-high-end users, and hopes to introduce headsets to the consumer market quite soon.

When we're talking about AR/VR/XR, there is room for disruption. For Ville, disruption is about moving a decimal point. If you move any decimal point to the left or to the right, for example by cutting the price by 10X or making something 10X faster or 10X more beautiful, you have proper disruption. What computer vision developers and graphics designers should do is disrupt resolution. At Varjo they built a foveal screen by putting two displays in front of each eye: a small micro-LED display and a background display. The two displays are overlaid on top of each other using an optical combiner. That sounds like a really simple idea, but the devil is in the details. When you have two different devices from different manufacturers running at different resolutions, combining those images seamlessly is a pretty nasty engineering challenge, as you need to blur them properly. But once you've managed to do that, the end result is a display that roughly matches the resolution of the human eye.
'Varjo foveal screen' from Ville's presentation
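One generic way to hide the seam between a sharp focus image and a softer background image is to cross-fade them over a transition band. The sketch below shows that idea only; it is not Varjo's method, and the function names and radii are assumptions. With an optical combiner the two images add up optically, so in practice such weights would more likely be applied as brightness masks on each display.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Smooth 0-to-1 ramp between edge0 and edge1, clamped outside that range."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

def focus_weight(distance: float, inner_radius: float, outer_radius: float) -> float:
    """Weight of the focus display: 1 inside inner_radius, 0 beyond outer_radius."""
    return 1.0 - smoothstep(inner_radius, outer_radius, distance)

def blend(focus_value: float, context_value: float, distance: float,
          inner_radius: float = 0.6, outer_radius: float = 1.0) -> float:
    """Cross-fade one pixel value between the focus and the background image.

    distance is the pixel's distance from the centre of the focus area, in
    units where 1.0 is the edge of the focus display (an assumed convention).
    """
    w = focus_weight(distance, inner_radius, outer_radius)
    return w * focus_value + (1.0 - w) * context_value

# A bright focus pixel (1.0) over a dark background pixel (0.0):
for d in (0.0, 0.7, 0.8, 0.9, 1.2):
    print(f"distance {d:.1f}: blended value {blend(1.0, 0.0, d):.3f}")
```

The smooth ramp avoids a visible hard edge where the focus display ends, which is the kind of seam the paragraph above describes as hard to get right.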
To push their technology forward, Varjo decided to build its own custom eye-tracking solution. They placed high-speed infrared cameras inside the headset to track eye movements. This eye-tracking technology achieves an accuracy of 0.2−0.3 degrees.
'Eye-tracker in Varjo headset' from Ville's presentation
Varjo has been building demos and use cases with eye-tracking to explore further features for AR/VR. It turned out that eye-tracking is useful not only in UI and UX but also in lots of different rendering algorithms, for example computing the levels of detail of textures or geometry, procedural rendering, and streaming of 3D content. There's a prediction that eye-tracking will become a standard component in AR/VR technology.
'Varjo use cases' from Ville's presentation
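As an illustration of how gaze data could drive level of detail, the sketch below picks a coarser model the further an object is from the gaze direction. The thresholds, the function names and the three-level scheme are hypothetical and are not taken from the talk or from Varjo's software.

```python
import math

# Hypothetical thresholds in degrees: up to 5 degrees from the gaze direction
# use level 0 (most detailed), up to 20 degrees use level 1, beyond that level 2.
ASSUMED_LOD_LIMITS_DEG = [5.0, 20.0]

def angle_between_deg(a, b) -> float:
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))

def select_lod(gaze_direction, direction_to_object) -> int:
    """Return a level-of-detail index; 0 is the most detailed level."""
    eccentricity = angle_between_deg(gaze_direction, direction_to_object)
    for lod, limit in enumerate(ASSUMED_LOD_LIMITS_DEG):
        if eccentricity <= limit:
            return lod
    return len(ASSUMED_LOD_LIMITS_DEG)

gaze = (0.0, 0.0, -1.0)  # user looking straight ahead
print(select_lod(gaze, (0.0, 0.0, -1.0)))  # object exactly at the gaze point
print(select_lod(gaze, (0.2, 0.0, -1.0)))  # object slightly off to the side
print(select_lod(gaze, (1.0, 0.0, -1.0)))  # object far out in the periphery
```

The same eccentricity value could in principle also drive texture resolution or the priority of streamed 3D content, which are the other uses mentioned above.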

Varjo VR & XR headsets

Ville wrapped up his talk by presenting the products Varjo has been working on over the last few years. The first headset they produced was VR-2, which was launched in February 2019 and is now available for online purchase.
'Varjo VR-2' from Ville's presentation
Later, in May 2019, they launched their second product, called XR-1. It's the same headset that solves the resolution problem, but with cameras added on the front. These cameras capture images of the real world, which can then be manipulated at will. To get an idea of how XR-1 works, we suggest watching the video below.
'Varjo XR-1' from Ville's presentation
'Varjo XR-1 test shoot' from Ville's presentation
The EMERGE team hopes you found this article insightful.
Stay tuned for further updates!
Contributing Author, EMERGE
Researcher in travel tech and travel enthusiast. Ilona is an advocate of women in science and tech. Addicted to coffee.