Dr. Henney Oh, co-founder and CEO of spatial audio specialist G’Audio Lab, talks us through the processes of capturing, mixing and rendering sound for virtual reality and 360-degree video applications.
The premise of VR and 360-degree video is to simulate an alternate reality. For the experience to be truly immersive, it needs convincing sound to match the visuals. Humans rely heavily on sound cues to understand their environment, which is why immersive graphics need equally immersive 3D audio that replicates the natural listening experience. Sound also solves a problem specific to the medium: when there is imagery in every direction, it is hard to draw the viewer’s attention to a particular point, and a well-placed sound cue can do exactly that.
The key to creating realistic audio for VR is to update the sound in real time according to the user’s head orientation and viewing direction. This mimics how human hearing actually works, which makes the listening experience more realistic. Producing truly immersive sound takes three steps: first capture the audio signals, then mix them, and finally render the sound for the listener.
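Updating the sound with head orientation is easiest to see with an Ambisonics signal. The following is a minimal Python sketch, not G’Audio Lab’s renderer. It assumes a first-order Ambisonics signal in ACN channel order (W, Y, Z, X) and a head tracker that reports yaw in degrees, measured anticlockwise so that turning left is positive. A turn of the head is then compensated by rotating the sound field the opposite way. The function name `rotate_foa_yaw` is hypothetical.

```python
import math

def rotate_foa_yaw(w, y, z, x, head_yaw_deg):
    """Counter-rotate one first-order Ambisonics sample frame for a head yaw.

    Assumes ACN channel order (W, Y, Z, X) and azimuth measured anticlockwise
    from straight ahead, so a positive yaw means the listener turned left.
    """
    psi = math.radians(head_yaw_deg)
    c, s = math.cos(psi), math.sin(psi)
    # W (omnidirectional) and Z (up/down) do not change under a rotation
    # about the vertical axis; only the horizontal X/Y pair is rotated.
    x_rot = c * x + s * y
    y_rot = c * y - s * x
    return w, y_rot, z, x_rot
```

For example, `rotate_foa_yaw(1.0, 1.0, 0.0, 0.0, 90.0)` moves a source that was hard left (Y = 1) to straight ahead (X ≈ 1, Y ≈ 0), which is what the listener should hear after turning left to face it. Only yaw is handled here; pitch and roll need a full 3D rotation, and higher Ambisonics orders need larger rotation matrices. A real renderer would apply the rotation per audio buffer, smoothing between tracker updates, typically before a binaural decode for headphones.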
To replicate the natural listening experience, two types of audio signal are essential: Ambisonics and object-based audio.
Ambisonics is a technique that employs a spherical microphone to capture a sound field in all directions, including above and below the listener. This requires placing a soundfield microphone (also known as an Ambisonics or 360 microphone) close to the intended listening position. Keep in mind that these microphones record a full sphere of sound at the position of the microphone, so be strategic about where you place them. It is also important that the microphone does not appear in the shot, so we recommend placing it directly below the 360 camera.
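What a soundfield microphone delivers can be illustrated by the encoding equations for an ideal first-order Ambisonics (B-format) signal. The sketch below assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation) and a single distant source. Real microphones output capsule (A-format) signals that are converted to B-format, and their pickup is not ideal. The function name `encode_foa` is hypothetical.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample arriving from one direction into first-order Ambisonics.

    Assumes ACN channel order (W, Y, Z, X), SN3D normalisation (AmbiX) and
    azimuth measured anticlockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional pressure
    y = sample * math.sin(az) * math.cos(el)  # left/right figure-of-eight
    z = sample * math.sin(el)                 # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)  # front/back figure-of-eight
    return w, y, z, x
```

Four channels are enough to describe sources above and below the listener as well as around them. For instance, `encode_foa(1.0, 0.0, 90.0)` puts all of the directional energy in Z, with X and Y approximately zero.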
In addition to capturing audio with a soundfield microphone, content creators also need to record each individual sound source as a separate mono track. This lets you attach higher-fidelity sound to objects as they move through the scene, giving added control and flexibility. With this object-based technique, you can control the sound attributed to each object in the scene and adjust it depending on the user’s view.
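Object-based audio pairs each mono track with position metadata that follows the object through the scene. As a minimal sketch, assume positions are stored as time-stamped azimuth/elevation keyframes and interpolated linearly. The class `AudioObject` and its methods are hypothetical, not a format defined by G’Audio Lab or by any standard.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """Hypothetical object: a named mono track plus time-stamped direction keyframes."""
    name: str
    # (time_s, azimuth_deg, elevation_deg) tuples, kept sorted by time
    keyframes: list = field(default_factory=list)

    def add_keyframe(self, time_s, azimuth_deg, elevation_deg):
        bisect.insort(self.keyframes, (time_s, azimuth_deg, elevation_deg))

    def position_at(self, time_s):
        """Return (azimuth_deg, elevation_deg) at time_s by linear interpolation."""
        if not self.keyframes:
            raise ValueError("object has no position metadata")
        times = [k[0] for k in self.keyframes]
        i = bisect.bisect_right(times, time_s)
        if i == 0:                      # before the first keyframe: hold it
            return self.keyframes[0][1:]
        if i == len(self.keyframes):    # after the last keyframe: hold it
            return self.keyframes[-1][1:]
        t0, az0, el0 = self.keyframes[i - 1]
        t1, az1, el1 = self.keyframes[i]
        frac = (time_s - t0) / (t1 - t0)
        # Interpolate azimuth the short way round, so 170 -> -170 passes behind.
        d_az = (az1 - az0 + 180.0) % 360.0 - 180.0
        az = (az0 + frac * d_az + 180.0) % 360.0 - 180.0
        el = el0 + frac * (el1 - el0)
        return az, el
```

For example, an object keyframed at azimuth 0° at t = 0 s and 90° at t = 2 s is reported at (45.0, 0.0) at t = 1 s. At playback, the renderer would compare this direction with the listener’s current head orientation before spatialising the track. A production format would typically also carry distance, gain and other parameters.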
Capturing mono sound can also be tricky, because the traditional boom microphone does not work in VR. A 360 camera sees in every direction, so there is nowhere off-camera to hold a boom. Instead, it helps to place a lavalier microphone directly on the performer, hidden underneath their clothing.
Previously, a sound mix was typically tied to its target loudspeaker layout, but today’s object-based audio techniques allow individual on-screen objects, such as a dinosaur, to be independent of the playback layout, the user’s listening position and even the acoustic space. This is possible because all of the object tracks can be sent to the player side and rendered there. As with traditional mixing, you may need to add Foley, ADR and background music tracks to complete the sonic scene.
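Because the player receives the objects rather than a finished loudspeaker mix, it can render one object position to whatever layout is present at playback time. The sketch below uses two-dimensional vector base amplitude panning (VBAP), a well-known panning method chosen here purely for illustration. It is not necessarily what any given player uses, and for headphone playback a binaural (HRTF-based) renderer would take its place. Azimuths are in degrees, anticlockwise from the front, and `vbap_2d` is a hypothetical name.

```python
import math

def vbap_2d(source_az_deg, speaker_azs_deg):
    """Pan one source onto a horizontal loudspeaker ring with 2D VBAP.

    Azimuths are in degrees, anticlockwise from the front. Returns one
    power-normalised gain per loudspeaker, in the order given.
    """
    n = len(speaker_azs_deg)
    unit = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in speaker_azs_deg]
    px, py = math.cos(math.radians(source_az_deg)), math.sin(math.radians(source_az_deg))
    # Walk round the ring so that each speaker is paired with its anticlockwise neighbour.
    order = sorted(range(n), key=lambda i: speaker_azs_deg[i] % 360.0)
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        (ax, ay), (bx, by) = unit[a], unit[b]
        det = ax * by - ay * bx
        if abs(det) < 1e-9:
            continue  # coincident or opposite speakers cannot form a panning pair
        # Solve p = g_a * l_a + g_b * l_b for the two gains (Cramer's rule).
        g_a = (px * by - py * bx) / det
        g_b = (ax * py - ay * px) / det
        if g_a >= -1e-9 and g_b >= -1e-9:
            g_a, g_b = max(g_a, 0.0), max(g_b, 0.0)
            norm = math.hypot(g_a, g_b)
            gains[a], gains[b] = g_a / norm, g_b / norm
            return gains
    # Sketch-only fallback: a direction outside every pair (e.g. behind a
    # stereo pair) is sent to the nearest loudspeaker.
    nearest = min(range(n), key=lambda i: abs((speaker_azs_deg[i] - source_az_deg + 180.0) % 360.0 - 180.0))
    gains[nearest] = 1.0
    return gains
```

The same object at azimuth 0° gets equal gains of about 0.707 on a ±30° stereo pair, while on a quad layout at ±45° and ±135° an object at 45° lands entirely on the front-left speaker. Nothing in the object’s metadata had to change between the two layouts.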
Combining and balancing object, Ambisonics and channel-based signals (such as traditional 2.0 stereo, if needed) plays an important role in mixing and mastering 3D audio. If you recorded the objects and the Ambisonics at the same time, remember that the Ambisonics signal already contains those objects. You may need an additional step to remove or rebalance the object tracks so that they are not counted twice.
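A back-of-the-envelope calculation shows why the double counting matters. Assume, as an idealisation, that the same source reaches the mix at equal amplitude through the Ambisonics signal and through its object track, and that the two copies are perfectly time-aligned and in phase. Their amplitudes then add, raising the level by:

```latex
\Delta L = 20\log_{10}\!\left(\frac{1 + 1}{1}\right) = 20\log_{10} 2 \approx 6.02\ \text{dB}
```

In practice the lavalier and the soundfield microphone sit at different distances from the source, so the two copies arrive with a small time offset and combine with comb filtering rather than a clean 6 dB boost. Either way, the source’s level and tone change unless one of the paths is removed or rebalanced.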
Traditionally, you only needed to synchronise sound with picture in the time domain, which is referred to as lip sync. For example, when producing traditional cinematic audio, you only need to watch an actor’s mouth and align the dialogue with its movements. With cinematic VR and 360 video, however, you also need to synchronise sound and image spatially.
With VR and 360 video content, you need to consider not only the actor’s mouth movements but also where the actor appears on the 360 screen, and place the sound accordingly. This requires a new, more specialised sound mastering tool: one that lets you edit as you watch, so that you can match the sounds to the visuals in both space and time.
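Spatial synchronisation starts with knowing where the actor is in the 360 frame. Assume the picture is stored in the common equirectangular projection, with the centre of the frame straight ahead and the left and right edges meeting directly behind the viewer. A position in the frame then maps to a direction as shown below. The function name and the conventions are assumptions for illustration.

```python
def pixel_to_direction(x, y, width, height):
    """Convert a position in an equirectangular 360 frame to a direction.

    Assumptions (for illustration only): (x, y) is measured in pixels from the
    top-left corner, the centre of the frame is straight ahead, the left and
    right edges meet directly behind the viewer, and azimuth is measured
    anticlockwise (towards the viewer's left), as is usual in Ambisonics.
    Returns (azimuth_deg, elevation_deg).
    """
    u = x / width   # 0.0 at the left edge, 1.0 at the right edge
    v = y / height  # 0.0 at the top edge, 1.0 at the bottom edge
    azimuth = (0.5 - u) * 360.0    # moving right in the frame turns towards the viewer's right
    elevation = (0.5 - v) * 180.0  # top of the frame is +90 (up), bottom is -90 (down)
    return azimuth, elevation
```

For a 3840 × 1920 frame, the centre (1920, 960) maps to (0.0, 0.0), straight ahead, and x = 2880 maps to an azimuth of -90°, the viewer’s right. The resulting direction could then be written into an object’s position metadata or fed to a panner or Ambisonics encoder.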
All of these processes come on top of the conventional mixing workflow, so a dedicated authoring tool is needed to carry them out properly and conveniently.