How Mixed Reality Headsets work (basic overview with 10 illustrations)


(Inside the HoloLens. Image courtesy: Microsoft)

Whether in the comfort of your home, in a restaurant, or on a bus or the subway, seeing people in what I call the praying position is a very familiar sight: heads bowed in deep concentration, hands cupped in their laps and a soft glow on their faces. The smartphone is a readily available window to the online and virtual worlds and an easily accessible escape from the more mundane aspects of our daily lives.

Mixed reality headsets such as Magic Leap and HoloLens are at the very cutting edge of consumer media technology. They promise an exciting future in which our virtual and real worlds truly merge, a future where we need not suffer the tiresome physical discomforts and repetitive strain injuries that have become so commonplace with the burgeoning use of the mobile phone.

These technological marvels really do seem like magic. How do they work? How do they place virtual objects right in front of our eyes?

As the featured image of this article (above) shows, a lot of diverse technology goes into making these headsets work. Understanding it all would take a lifetime and several advanced degrees in a variety of disciplines. In this article I cover some of the basic concepts needed to understand mixed reality.


1. The human eye is a very complex camera



To begin to understand how mixed reality works, we need a basic understanding of how human vision works. Pardon the blatantly obvious statement, but the human eye really is a very complex system, far more complex than any camera we are able to design.

Describing the parts of the eye in camera terms requires some very gross oversimplification. The cornea is like an outer refractive lens. The sclera is like the body of the camera: it forms the tough outer casing of the eye, and the muscles that rotate the eye attach to it. The retina is like a highly sensitive image sensor covered in photoreceptors, and the fovea (with the foveola at its centre) is the small region that sees the finest detail. The iris works like an adjustable aperture diaphragm, and the pupil is the aperture it forms. The lens changes shape as the ciliary muscle contracts or relaxes, a process called accommodation, which lets the eye focus at different distances. All of these working together create our highly sensitive visual perception. Read more about ‘how we see’ here.
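To put a number on accommodation, here is a minimal sketch using a simplified thin-lens model. It assumes that a relaxed eye is focused at infinity. Under that assumption, the extra optical power the lens must add to focus on an object d metres away is 1/d diopters, so an object half a metre away needs about 2 diopters. The function name is purely illustrative.

```python
def accommodation_demand_diopters(distance_m: float) -> float:
    """Extra optical power (diopters) the eye's lens must add, relative to
    focusing at infinity, to bring an object `distance_m` metres away into focus.
    Simplified thin-lens model; assumes a relaxed eye is focused at infinity."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


for d in (6.0, 1.0, 0.5, 0.25):
    print(f"object at {d} m -> {accommodation_demand_diopters(d):.2f} D")
```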

If you go to some of the premier Mixed Reality conferences or discussions, you are likely to find optometrists and optical engineers taking centre stage. This is because some of the biggest breakthroughs in Mixed Reality come from optics. In the most fascinating of ways, the magic of Mixed Reality literally happens before our eyes.

2. Binocular disparity allows us to perceive depth.


Another obvious statement is that human beings normally have two functioning eyes. The eyes are set in a horizontal line about 50–75 mm apart (the interpupillary distance, measured between the two pupils). This means that each eye sees the world from a slightly different angle. The brain then merges these two discrete inputs into a cohesive understanding of the world. One of the neat tricks the brain performs is estimating the distance to an object from the small differences between the two views (the binocular disparity), combined with other subtle visual cues such as geometry, shadows and shading.

Read more about binocular disparity here
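One way to see why the two-eye difference works as a depth cue is to compute the vergence angle, which is the angle between the two lines of sight when both eyes fixate the same point straight ahead. The sketch below uses simple trigonometry. It assumes an interpupillary distance of 63 mm, a commonly quoted adult average that falls inside the 50–75 mm range above. The angle is large for near objects and shrinks quickly with distance.

```python
import math


def vergence_angle_deg(ipd_mm: float, distance_m: float) -> float:
    """Angle (degrees) between the two eyes' lines of sight when both fixate
    a point straight ahead at `distance_m`, for pupils `ipd_mm` apart."""
    half_baseline_m = (ipd_mm / 1000.0) / 2.0
    return math.degrees(2.0 * math.atan(half_baseline_m / distance_m))


IPD_MM = 63.0  # assumed typical adult interpupillary distance

for d in (0.5, 2.0, 10.0):
    print(f"{d:>4} m: {vergence_angle_deg(IPD_MM, d):.2f} deg")
```

Because the angle changes much faster up close than far away, binocular cues are most informative for nearby objects.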

3. Mixed Reality uses a specialised image for each eye

Mixed reality (and virtual reality) headsets essentially work like two TV screens, one for each eye. Each screen shows a carefully adjusted perspective of the virtual object through optics in front of the user’s eye, and the two slightly different views create the illusion of an object with depth. Consider the screenshot below. It’s from a VR simulation of an F1 race (see more of this project here). If you look carefully, you can see that the two images render the scene from slightly different angles. When viewed in a VR headset, the simulation appears to have 3D depth.
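The following is a minimal sketch of rendering one point per eye. It assumes a simplified pinhole camera for each eye, with both eyes looking straight ahead along +z and offset horizontally by half of an assumed 63 mm interpupillary distance. The focal length of 1000 pixels is hypothetical. The horizontal difference between the two projections (the disparity) is what the brain reads as depth. Real rendering engines instead use the full per-eye view and projection matrices supplied by the headset runtime.

```python
from dataclasses import dataclass


@dataclass
class Point3D:
    x: float  # metres, positive to the right
    y: float  # metres, positive up
    z: float  # metres, positive straight ahead


def project_for_eye(p: Point3D, eye_x: float, focal_px: float) -> tuple[float, float]:
    """Pinhole projection of p for an eye at (eye_x, 0, 0) looking along +z.
    Returns pixel coordinates relative to that eye's image centre."""
    if p.z <= 0:
        raise ValueError("point must be in front of the viewer")
    u = focal_px * (p.x - eye_x) / p.z
    v = focal_px * p.y / p.z
    return u, v


def stereo_pair(p: Point3D, ipd_m: float = 0.063, focal_px: float = 1000.0):
    """Project the same point once per eye, with the eyes offset by -/+ ipd/2."""
    left = project_for_eye(p, -ipd_m / 2, focal_px)
    right = project_for_eye(p, +ipd_m / 2, focal_px)
    return left, right


corner = Point3D(0.1, 0.0, 1.5)  # hypothetical corner of a virtual cube
(lu, lv), (ru, rv) = stereo_pair(corner)
print(f"left u={lu:.1f}px, right u={ru:.1f}px, disparity={lu - ru:.1f}px")
```

In this model the disparity works out to focal length × IPD / depth, so nearer objects get a larger left/right shift.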


4. Microdisplays can be built with several different technologies

Several different display technologies can be used to create what is essentially a micro ‘projector’. Current headsets use technologies such as HTPS LCD, LCoS, micro-OLED, DLP and laser MEMS, which are also found in televisions, computer monitors, smartphones and other displays.

If you were to shrink your normal conference room projector down to a small enough size, you would still see a hazy grey box around the virtual object that is trying to blend into the real world. This happens because in most projectors the light source (the backlight or lamp) is separate from the element that forms the image. The light source illuminates the whole frame, and even the ‘black’ pixels leak some of that light.

Because OLED microdisplays are emissive (each pixel produces its own light, and a black pixel is simply switched off), they are better suited for see-through/AR smart glasses. Read more about it here.
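Here is a toy model of why the grey box appears. It assumes an optical see-through display, which can only add its own light on top of the real scene. All brightness figures and the 1000:1 contrast ratio are hypothetical and illustrative, not measurements of any real device. A "black" pixel on a backlit panel still emits a little light, while an emissive pixel that is off emits none.

```python
def emitted_nits(pixel_level: float, peak_nits: float, black_nits: float) -> float:
    """Light a display pixel emits for a drive level between 0 (black) and 1 (white)."""
    return black_nits + pixel_level * (peak_nits - black_nits)


def seen_through_glasses(background_nits: float, pixel_level: float,
                         peak_nits: float, black_nits: float) -> float:
    """An optical see-through display can only ADD its light to the real scene."""
    return background_nits + emitted_nits(pixel_level, peak_nits, black_nits)


BACKGROUND_NITS = 5.0  # hypothetical: a dim wall behind the virtual object
PEAK_NITS = 500.0      # hypothetical display peak brightness

backlit_black = PEAK_NITS / 1000.0  # assumed 1000:1 contrast, so "black" still glows
emissive_black = 0.0                # an emissive pixel that is off emits nothing

# A "black" pixel (meant to look transparent) right next to the virtual object:
print(seen_through_glasses(BACKGROUND_NITS, 0.0, PEAK_NITS, backlit_black))   # 5.5
print(seen_through_glasses(BACKGROUND_NITS, 0.0, PEAK_NITS, emissive_black))  # 5.0
```

In this simple model, the backlit panel makes the whole "transparent" area 10% brighter than the real wall, which produces the hazy box. The emissive panel leaves the wall untouched.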

5. The lenses of See-through Head Mounted Displays (HMD) use specialised optics


Image courtesy: Ubicomp

Also called holographic optics, the lenses used for mixed-reality near-eye projection are specially designed to reflect specific wavelengths of light back to the viewer while letting external light pass through, so that the two are blended. The projected image enters the glass of the lens and travels inside it by total internal reflection until it reaches the area in front of the eye, where it is directed out towards the viewer. This makes the holograms seem like they are in the real space rather than pasted on. Read more about how these lenses work here.
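Total internal reflection follows directly from Snell's law. Light travelling inside the glass is trapped whenever it strikes the glass/air surface at more than the critical angle, measured from the surface normal: θc = arcsin(n_outside / n_glass). The sketch below models only this trapping condition, not the elements that couple light into and out of the lens. The refractive index of 1.5 is an assumed value for ordinary optical glass.

```python
import math


def critical_angle_deg(n_core: float, n_outside: float = 1.0) -> float:
    """Smallest angle of incidence (from the surface normal) at which light
    inside a medium of index n_core is totally internally reflected at its
    boundary with a medium of index n_outside (Snell's law)."""
    if n_core <= n_outside:
        raise ValueError("TIR needs the light to travel in the denser medium")
    return math.degrees(math.asin(n_outside / n_core))


def stays_in_waveguide(incidence_deg: float, n_core: float, n_outside: float = 1.0) -> bool:
    """True if a ray hitting the glass/air surface at this angle stays trapped."""
    return incidence_deg > critical_angle_deg(n_core, n_outside)


N_GLASS = 1.5  # assumed refractive index of ordinary optical glass
print(f"critical angle: {critical_angle_deg(N_GLASS):.1f} deg")  # 41.8
print(stays_in_waveguide(60.0, N_GLASS))  # True: bounces along inside the glass
print(stays_in_waveguide(30.0, N_GLASS))  # False: escapes the glass
```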

6. MR Headsets estimate where the user is looking by tracking the head.


Most current-generation HMDs (Mixed and Virtual Reality) estimate where the user is looking by tracking the user’s head rather than their eyes. The result, often referred to as the gaze vector, is an imaginary line pointing straight ahead from between the user’s eyes. The head is tracked using inertial sensors such as gyroscopes and accelerometers fitted inside the HMD (similar to the sensors that let your phone know whether it is held in portrait or landscape), often combined with cameras. These sensors inform the virtual simulation about the user’s current pose, with inputs such as position, orientation, velocity and acceleration. Read more about how it is done here.
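Here is a minimal sketch of turning a tracked head orientation into a gaze vector, and then using that vector for gaze selection (checking whether a hologram lies inside a small cone around it). The axis convention (x right, y up, z forward), the 1.6 m eye height and the 5-degree cone are all assumptions for illustration. The function names are hypothetical and are not any headset SDK's API.

```python
import math

Vec3 = tuple[float, float, float]


def gaze_vector(yaw_deg: float, pitch_deg: float) -> Vec3:
    """Unit vector pointing straight ahead from the head.
    Assumed convention: x right, y up, z forward; positive yaw turns right,
    positive pitch tilts up, and yaw = pitch = 0 looks along +z."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def angle_to_target_deg(head_pos: Vec3, yaw_deg: float, pitch_deg: float,
                        target: Vec3) -> float:
    """Angle between the gaze vector and the direction from the head to a target."""
    g = gaze_vector(yaw_deg, pitch_deg)
    d = [t - h for t, h in zip(target, head_pos)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0:
        raise ValueError("target coincides with the head position")
    cos_a = sum(gc * dc for gc, dc in zip(g, d)) / length
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))  # clamp rounding error


def is_gazed_at(head_pos: Vec3, yaw_deg: float, pitch_deg: float,
                target: Vec3, cone_deg: float = 5.0) -> bool:
    """Hypothetical gaze selection: is the target inside a small cone around the gaze?"""
    return angle_to_target_deg(head_pos, yaw_deg, pitch_deg, target) <= cone_deg


head = (0.0, 1.6, 0.0)      # assumed standing eye height of 1.6 m
hologram = (2.0, 1.6, 2.0)  # 45 degrees to the user's right
print(is_gazed_at(head, 0.0, 0.0, hologram))   # False: looking straight ahead
print(is_gazed_at(head, 45.0, 0.0, hologram))  # True: head turned towards it
```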

7. Advanced MR HMDs will track the eyes to see where the user is looking


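As cool as MR is today, it’s going to get even cooler. The perceived sharpness of MR displays is soon going to get a massive boost with the introduction of eye tracking. Eye tracking will allow MR headsets to use foveated rendering. Because only the small central region of the retina (the fovea) sees fine detail, the headset can render full detail only where the user is actually looking and reduce it in the periphery. This improves performance, and the processing power saved can be spent on improving fidelity where it matters.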

Read more about how eye-tracking works with HMDs here and here.
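The following sketch estimates how much rendering work foveated rendering can save. It picks a resolution for each region based on its angular distance (eccentricity) from the tracked gaze point. The zone boundaries (5 and 20 degrees), the resolution steps and the 80 × 60 degree field of view are hypothetical values chosen for illustration. The field of view is treated as a flat, one-sample-per-degree grid, which is an approximation. Real systems implement this with GPU features such as variable-rate shading. This sketch covers only the cost model.

```python
import math


def shading_rate(eccentricity_deg: float) -> float:
    """Fraction of full resolution (per axis) to render at a given angle from
    the gaze point. Zone boundaries are hypothetical, for illustration only."""
    if eccentricity_deg <= 5.0:   # inner zone around the gaze point: full detail
        return 1.0
    if eccentricity_deg <= 20.0:  # near periphery: half resolution per axis
        return 0.5
    return 0.25                   # far periphery: quarter resolution per axis


def relative_cost(width_deg: int, height_deg: int, gaze_deg: tuple[float, float]) -> float:
    """Rendering cost relative to full resolution everywhere, sampled on a
    one-sample-per-degree grid that treats the field of view as flat."""
    gx, gy = gaze_deg
    total = 0.0
    for y in range(height_deg):
        for x in range(width_deg):
            ecc = math.hypot(x + 0.5 - gx, y + 0.5 - gy)
            rate = shading_rate(ecc)
            total += rate * rate  # rate is per axis, so pixel count scales with its square
    return total / (width_deg * height_deg)


# Hypothetical 80 x 60 degree field of view, with the user looking at the centre.
print(f"relative cost: {relative_cost(80, 60, (40.0, 30.0)):.2f}")
```

Because the region of full detail is tiny compared with the whole field of view, most of the screen is rendered at reduced resolution.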

8. MR uses markerless tracking to place virtual objects in the real world


Since the virtual object is rendered on a screen that is essentially mounted on the user’s head, the object will move along with the head whenever the user moves. This instantly breaks any illusion that the virtual object is present in the real world. To prevent this, most MR HMDs detect and track surfaces and planes in the real world and tether (or anchor) the virtual object to them. That way, whenever the user’s head moves, the object can be re-rendered at a position that compensates for the movement, so it appears to stay fixed in the real world.
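The compensation described above can be sketched as a coordinate transform. The anchored object keeps fixed world coordinates, and each frame its position relative to the head is recomputed from the tracked head pose. This sketch assumes an x-right, y-up, z-forward convention and handles only head position and yaw (turning left/right). A real headset uses the full six-degree-of-freedom pose, and the function name is hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def world_to_view(point: Vec3, head_pos: Vec3, head_yaw_deg: float) -> Vec3:
    """Express a world-anchored point relative to the user's head.
    Assumed convention: x right, y up, z forward; positive yaw turns the head
    to the right about the vertical (y) axis. Pitch and roll are left out to
    keep the sketch short."""
    dx, dy, dz = (p - h for p, h in zip(point, head_pos))
    yaw = math.radians(head_yaw_deg)
    # Project the offset onto the head's right (x) and forward (z) axes,
    # which undoes the head's rotation.
    x = math.cos(yaw) * dx - math.sin(yaw) * dz
    z = math.sin(yaw) * dx + math.cos(yaw) * dz
    return (x, dy, z)


anchor = (0.0, 0.0, 2.0)  # virtual object anchored 2 m in front of the start pose
for yaw in (0, 30, 90):
    x, _, z = world_to_view(anchor, (0.0, 0.0, 0.0), yaw)
    print(f"head yaw {yaw:>2} deg -> object at x={x:+.2f} m, z={z:+.2f} m in view space")
```

The anchor's world coordinates never change. When the head turns right, the object's head-relative position shifts to the left, which is exactly what a real object would do.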

The simplest way to do this is with a recognisable physical marker (such as a QR code or a known pattern) placed in the real world. More advanced displays achieve a similar effect with markerless tracking, which relies on natural features of the environment instead and is more convenient and more believable. Read more about markerless tracking here.

9. MR uses spatial mapping to understand the world around the user


Image courtesy: Occipital Bridge

To realistically blend the ‘virtual world’ with the ‘real world’, the simulation first has to understand the ‘real world’. This presents a problem, since the real world is infinitely varied. For Mixed Reality (or indeed even Augmented Reality) to work convincingly, the system has to scan the real world in real time. In advanced Mixed Reality HMDs this is done with multiple cameras (usually two and sometimes more) working in tandem to create a spatial map using depth perception (see point 2 on how binocular disparity can be used to perceive depth).
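With two cameras, depth can be recovered from disparity in the same way point 2 describes for our eyes. The sketch below uses the standard triangulation formula for two parallel, rectified cameras: Z = f · B / d. Here f is the focal length in pixels, B is the distance between the cameras, and d is how far a point shifts horizontally between the two images. The camera figures are hypothetical. A real pipeline also needs calibration, rectification and feature matching, which are not shown here.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Distance (m) to a point seen by two parallel, rectified cameras: Z = f * B / d.
    f: focal length in pixels, B: distance between the cameras in metres,
    d: horizontal shift of the point between the left and right images in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


FOCAL_PX = 500.0  # hypothetical camera focal length, in pixels
BASELINE_M = 0.1  # hypothetical 10 cm spacing between the two cameras

for d in (50.0, 25.0, 5.0):
    print(f"disparity {d:>4} px -> depth {depth_from_disparity(d, FOCAL_PX, BASELINE_M):.2f} m")
```

Depth precision falls off with distance. With these numbers, a one-pixel error at 50 px moves the estimate from 1.00 m to about 1.02 m, but at 5 px it moves the estimate from 10 m to 12.5 m.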

Once the system has a spatial map of the area immediately around the user, it can use this map for virtual object placement and for computing the physics that allows the virtual object to interact with the real world in a realistic manner. Read more about how spatial mapping works here.
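A common way to use the spatial map for placement is to cast a ray, such as the gaze vector, and put the virtual object where the ray first hits a real surface. The sketch below simplifies the map to a single detected floor plane and computes a ray-plane intersection. Real spatial maps are usually meshes with many surfaces. The eye height, gaze direction and function names are hypothetical.

```python
from typing import Optional

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def ray_plane_hit(origin: Vec3, direction: Vec3,
                  plane_point: Vec3, plane_normal: Vec3) -> Optional[Vec3]:
    """Where a ray (e.g. the gaze vector) meets a plane, or None if it never does."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray runs parallel to the plane
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # the plane is behind the user
    return tuple(o + t * d for o, d in zip(origin, direction))


head = (0.0, 1.6, 0.0)   # assumed eye height of 1.6 m
gaze = (0.0, -0.6, 0.8)  # unit vector looking ahead and down at the floor
floor_point, floor_normal = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)  # detected floor plane y = 0

spot = ray_plane_hit(head, gaze, floor_point, floor_normal)
if spot is not None:
    print(f"place the object at about x={spot[0]:.2f} m, z={spot[2]:.2f} m on the floor")
```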

10. Real-time occlusion is used to make the virtual objects fit into the real world


Even with all the cool tricks listed above, the virtual object in the MR display will still appear to be in front of every object in the real world. This happens because the virtual object is always first in the user’s line of sight (since the screen it’s rendered on is literally mounted on the user’s head).

To make the virtual object appear in the correct depth order in the real world, a process called occlusion is used. As illustrated above, the trick is to make the virtual bananas seem like they are placed in the real bowl. To achieve this, the lower portion of the banana bunch, which would normally be hidden behind the nearer brim of the bowl, is not rendered.

Read more about how real time occlusion works here.
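The banana example can be sketched as a per-pixel depth test. For each pixel, the distance to the virtual object is compared with the distance to the nearest real surface from the spatial map. The virtual pixel is drawn only when the virtual object is nearer. The single row of depth values below is hypothetical. This sketch assumes an optical see-through display, where an undrawn (black) pixel adds no light, so the real bowl shows through.

```python
from typing import Optional, Sequence


def composite_row(virtual_depth: Sequence[Optional[float]],
                  real_depth: Sequence[float],
                  virtual_colour: Sequence[str]) -> list[Optional[str]]:
    """Per-pixel occlusion test for one row of the display.
    virtual_depth: distance to the virtual object at each pixel (None = no object there)
    real_depth: distance to the nearest real surface, e.g. from the spatial map
    Returns the colour to draw, or None to leave the pixel black (see-through)."""
    out: list[Optional[str]] = []
    for v, r, c in zip(virtual_depth, real_depth, virtual_colour):
        if v is not None and v < r:
            out.append(c)     # virtual surface is nearer than the real one: draw it
        else:
            out.append(None)  # nothing virtual here, or a real surface hides it
    return out


# Hypothetical row of pixels across the bananas and the bowl's front rim.
bananas = [None, 1.0, 1.0, 1.0, None]  # bananas sit 1.0 m away
real = [2.0, 2.0, 0.9, 0.9, 2.0]       # rim is 0.9 m away, back wall 2.0 m
colours = ["yellow"] * 5
print(composite_row(bananas, real, colours))  # [None, 'yellow', None, None, None]
```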


Are you excited about Mixed Reality? Which Mixed Reality applications or technologies interest you the most? Let us know in the comments below.


About the Author:

Sid Jain

Siddharth is the Creative Director for Playware Studios, a Singapore serious games developer. He develops games for military, healthcare, airline, corporate and government training, and for mainstream education. He has taught game design in various college programs at NTU, SIM, NUS and IAL in Singapore and is the author and proponent of the Case Method 2.0 GBL pedagogy.




