Below is a list of research projects currently ongoing or completed in the CAPE.
Current Projects
360° virtual reality audio
Capturing and rendering audio for 360° virtual reality
Audio for 3D immersive reproduction
Capturing and rendering audio for 3D immersive reproduction
CityTones
CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes
Phantom image elevation effect and VHAP
Phantom image elevation effect & Virtual Hemispherical Amplitude Panning (VHAP)
Objective measurement of perceptual audio
Towards a framework for the objective measurement of perceptual audio attributes
Prediction of auditory image position model
Development of a perceptual model for the trade-off between interaural time and level differences for the prediction of auditory image position
Quantifying Factors of Immersion in VR
Quantifying Factors of Immersion in Virtual Reality
Room acoustics in VR/Augmented reality
Perception of room acoustics in virtual/augmented reality in the context of 6 degrees of freedom
Sound source dependency of elevation localisation
Investigation into the sound source dependency of elevation localisation in multichannel audio systems
Soundscape evaluation in virtual reality
Investigations into the recording and reproduction methods for soundscape evaluation in virtual reality
New user interface design for music production
New user interface design for music production
3D Audio Toolbox (3DAT)
3D Audio Toolbox (3DAT)
HULTI-GEN
Huddersfield Universal Listening Test Interface Generator (HULTI-GEN)
HAART
Huddersfield Acoustical Analysis Research Toolbox (HAART)
Completed Projects
Perceptual Optimisation of Virtual Acoustics
Perceptual Optimisation of Virtual Acoustics (POVA)
Perceptual Band Allocation (PBA)
Perceptual Band Allocation (PBA) for rendering vertical image spread
Pinna related transfer function attributes
The perceptual contribution of pinna related transfer function attributes in the median plane
Audio Dynamics
Audio Dynamics - Towards a Perceptual Model of Punch
Vertical interchannel decorrelation in 3D sound
Investigations into the perception of vertical interchannel decorrelation in 3D surround sound reproduction
Frequency thresholds and interchannel crosstalk
The analysis of frequency dependent localisation thresholds and the perceptual effects of vertical interchannel crosstalk
Non-Linear Sonic Signatures
An Investigation into Non-Linear Sonic Signatures with a Focus on Dynamic Range Compression and the 1176 FET Compressor
Effects of a vertical reflection
The effects of a vertical reflection on the relationship between listener preference and timbral and spatial attributes
Contemporary Metal Music Production
Contemporary Metal Music Production
Perception of echo thresholds
The effect of sound source and reflection angle on the perception of echo thresholds
Loudness perception
An investigation into the changes of loudness perception in relation to changes in crest factor for octave bands
Capturing and rendering audio for 360° virtual reality
Researchers: Dr Hyunkook Lee, Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: This project investigates recording and reproduction methods for 360° audio in virtual reality applications. Currently, the most popular method for capturing 360° audio for VR is arguably first-order Ambisonics (FOA). FOA microphone systems are typically compact, and therefore convenient for location recording, and offer stable localisation characteristics and flexible sound field rotation. However, FOA is limited in terms of perceived spaciousness and the size of the sweet spot in loudspeaker reproduction, due to the high level of interchannel correlation. On the other hand, a near-coincident microphone array, which uses directional microphones that are spaced apart and angled outwards, can provide a better balance between spaciousness and localisability than a purely coincident array. The current project investigates the localisation accuracy and spatial attributes of the Equal Segment Microphone Array (ESMA) for music and urban soundscape VR applications, in both visual and non-visual conditions. Different recording and reproduction techniques are also perceptually evaluated in terms of their low-level spatial attributes, and the optimal use scenarios for each technique are determined depending on the sound source, acoustic conditions and environmental context. Below is a summary of some of the key findings so far.
- The correct spacing between microphones for a quadraphonic ESMA using cardioid microphones is 50cm. This was proposed based on a novel interchannel level and time trade-off model called MARRS (Lee et al. 2017) and verified through subjective listening tests (Lee 2019). According to the model, the spacing becomes smaller for microphones with higher directionality (e.g. 37cm for supercardioids and 25cm for hypercardioids). A geometric sketch of these interchannel cues follows this list.
- ESMA produces better results than FOA in terms of environmental width, listener envelopment and overall spatial quality, whereas FOA tends to provide slightly greater environmental depth and source distance (Millns and Lee 2018). This difference depends on the positional arrangement of the sound sources rather than on the type of sound source.
- ESMA-3D, a vertical extension of ESMA, outperforms FOA for sport event recording and reproduction in both 9.1 and 5.1 formats in terms of presence, robustness, envelopment and overall quality of experience (Moulson and Lee 2019), regardless of the presence of a visual scene.
- In Ambisonic binaural rendering, the timbral and spatial quality of ESMA-3D recordings falls in the “Excellent” to “Good” categories for complex musical ensemble recordings when the “magnitude least squares” decoding method (IEM binaural renderer) is used (Lee et al. 2019). Conventional cube virtual-loudspeaker decoding produces quality in the “Good” to “Fair” range.
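The interchannel time and level differences that a spaced, outward-angled microphone pair produces can be sketched from simple geometry. The sketch below is illustrative only: it assumes an ideal first-order polar pattern, a distant (plane-wave) source and a symmetric pair, and it is not the MARRS model itself, which maps these cues to perceived image position using empirically derived trade-off data.

import numpy as np

C = 343.0  # speed of sound (m/s)

def polar_gain(source_az_deg, mic_axis_deg, a=0.5):
    # First-order microphone gain: a + (1 - a) * cos(theta).
    # a = 0.5 is a cardioid; smaller 'a' means higher directionality.
    theta = np.radians(source_az_deg - mic_axis_deg)
    return a + (1.0 - a) * np.cos(theta)

def interchannel_cues(source_az_deg, spacing_m, mic_angle_deg, a=0.5):
    # ICTD (ms) and ICLD (dB) for a symmetric near-coincident pair:
    # capsules spaced spacing_m apart and angled outwards by
    # +/- mic_angle_deg, with a distant (plane-wave) source.
    az = np.radians(source_az_deg)
    ictd_ms = 1000.0 * spacing_m * np.sin(az) / C
    g_left = polar_gain(source_az_deg, -mic_angle_deg, a)
    g_right = polar_gain(source_az_deg, +mic_angle_deg, a)
    icld_db = 20.0 * np.log10(g_left / g_right)
    return ictd_ms, icld_db

# One 90-degree segment of a quadraphonic ESMA: cardioids angled +/-45
# degrees, with the 50 cm spacing reported by Lee (2019).
for az in (0, 15, 30, 45):
    ictd, icld = interchannel_cues(az, spacing_m=0.5, mic_angle_deg=45)
    print(f"source {az:2d} deg: ICTD {ictd:+.2f} ms, ICLD {icld:+.2f} dB")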
Publications:
- Lee, H., Frank, M., and Zotter, F. (2019). Spatial and Timbral Fidelities of Binaural Ambisonics Decoders for Main Microphone Array Recordings. In AES International Conference on Immersive and Interactive Audio, York.
- Lee, H. (2019). Capturing 360° Audio using an Equal Segment Microphone Array (ESMA). AES: Journal of the Audio Engineering Society, 67(1/2), 13-26. https://doi.org/10.17743/jaes.2018.0068
- Millns, C., & Lee, H. (2018). An Investigation into Spatial Attributes of 360° Microphone Techniques for Virtual Reality. In Proceedings of 144th AES International Convention [Convention Paper 10005] Audio Engineering Society.
- Lee, H., Johnson, D., & Mironovs, M. (2017). An Interactive and Intelligent Tool for Microphone Array Design. In Audio Engineering Society Convention 143. Audio Engineering Society.
Capturing and rendering audio for 3D immersive reproduction
Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis, Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: Recently proposed multichannel audio formats such as Dolby Atmos, Auro-3D and NHK 22.2 employ height channels to provide the auditory sensation of a “three-dimensional (3D)” space. This project, funded by the EPSRC (EP/L019906/1), aims to provide fundamental psychoacoustic principles for the perception, recording and reproduction of the height dimension in 3D reproduction. Below is a summary of some of the main findings and outcomes from this project so far.
- The effect of vertical microphone spacing in a main microphone array on perceived spatial impression in 3D reproduction is not significant (Lee and Gribben 2014). This led to the design of a 3D microphone array called PCMA-3D, which is horizontally spaced but vertically coincident. The finding has also been adopted in Schoeps' ORTF-3D microphone array, a good example of innovation through academic research.
- To avoid unwanted upward shifting of a source image in 3D reproduction, direct sound captured or reproduced by a height channel (i.e. vertical interchannel crosstalk) should be attenuated by at least 7dB relative to the same sound in the main channel (Lee 2012; Wallis and Lee 2016; Wallis and Lee 2017). This implies that a directional microphone serving a height channel should be angled sufficiently far upwards to reduce the amount of direct sound it captures (a worked example follows this list).
- Interchannel time difference is not a reliable cue for vertical phantom imaging (Wallis and Lee 2015). In other words, vertical summing localisation using the time delay cue does not work. Furthermore, the precedence effect does not operate in the vertical plane in the strict sense. Some localisation dominance towards the earlier source can be observed depending on the frequency band, but the perceived image position is never shifted fully to the earlier source.
- The effect of vertical interchannel decorrelation has been found to be minimal compared to that of horizontal decorrelation (Gribben and Lee 2017). The effect of vertical decorrelation is significant, albeit with a small effect size, only above around 500Hz (Gribben and Lee 2018).
- A large-scale library of impulse responses, captured for 13 source positions and 40 different microphone array configurations from stereo to 3D, has been established (Lee and Millns 2017). It is available for free download at www.hud.ac.uk/apl/resources.
- A Pure Audio Blu-ray album by the choir Siglo de Oro has been produced and released in Dolby Atmos, Auro-3D 9.1 and DTS 5.1 formats: http://delphianrecords.co.uk/product-group/hieronymus-praetorius-missa-tulerunt-dominum-meum-blu-ray/
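As a worked example of the 7dB crosstalk criterion above, the sketch below computes how far off-axis an ideal cardioid must point for its direct-sound pickup to drop by 7dB. It assumes an ideal free-field cardioid and ignores level differences due to path length.

import numpy as np

def cardioid_gain_db(off_axis_deg):
    # Ideal cardioid sensitivity relative to on-axis, in dB.
    g = 0.5 + 0.5 * np.cos(np.radians(off_axis_deg))
    return 20.0 * np.log10(g)

# Smallest off-axis angle at which an ideal cardioid attenuates the
# direct sound by at least 7 dB (the localisation threshold above).
angles = np.arange(0.0, 180.0, 0.1)
tilt = angles[cardioid_gain_db(angles) <= -7.0][0]
print(f"~{tilt:.0f} degrees off-axis needed for 7 dB attenuation")
# Roughly 96 degrees: with a source at ear level in front of the array,
# a height-layer cardioid must point slightly past vertical to meet 7 dB.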
Publications:
- Gribben, C., & Lee, H. (2018). The Frequency and Loudspeaker-Azimuth Dependencies of Vertical Interchannel Decorrelation on the Vertical Spread of an Auditory Image. AES: Journal of the Audio Engineering Society, 66(7-8), 537-555. https://doi.org/10.17743/jaes.2018.0040
- Gribben, C., & Lee, H. (2017). A Comparison between Horizontal and Vertical Interchannel Decorrelation. Applied Sciences, 7(11), [1202]. https://doi.org/10.3390/app7111202
- Lee, H., & Millns, C. (2017). Microphone Array Impulse Response (MAIR) Library for Spatial Audio Research. In Audio Engineering Society Convention 143. Audio Engineering Society.
- Wallis, R., & Lee, H. (2017). The Reduction of Vertical Interchannel Crosstalk: The Analysis of Localisation Thresholds for Natural Sound Sources. Applied Sciences, 7(3), [278]. https://doi.org/10.3390/app7030278
- Wallis, R., & Lee, H. (2016). Vertical stereophonic localization in the presence of interchannel crosstalk: The analysis of frequency-dependent localization thresholds. AES: Journal of the Audio Engineering Society, 64(10), 762-770. https://doi.org/10.17743/jaes.2016.0039
- Wallis, R., & Lee, H. (2015). The effect of interchannel time difference on localization in vertical stereophony. AES: Journal of the Audio Engineering Society, 63(10), 767-776. https://doi.org/10.17743/jaes.2015.0069
- Lee, H., & Gribben, C. (2014). Effect of vertical microphone layer spacing for a 3D microphone array. AES: Journal of the Audio Engineering Society, 62(12), 870-884. https://doi.org/10.17743/jaes.2014.0045
- Lee, H. (2012). The relationship between interchannel time and level differences in vertical sound localisation and masking. In 131st Audio Engineering Society Convention 2011 (Vol. 1, pp. 592-604)
CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes
Researchers: Prof Agnieszka Roginska (NYU), Dr Hyunkook Lee, Ana Elisa Mendez Mendez (NYU), Scott Murakami (NYU), Andrea Genovese (NYU)
Supervisors: Prof Agnieszka Roginska, Dr Hyunkook Lee
Project summary: CityTones is a collaborative open-source repository initiated and administered by New York University (NYU) Steinhardt and the University of Huddersfield. CityTones invites sound recordists around the world to contribute soundscapes captured with 360-degree audio and visual methods. Each entry includes descriptors covering the technical details of the recording, physical information, subjective quality attributes and sound content information. Recordings are verified after submission and made publicly available for download. Applications include the simulation of environments, sound design, and research in areas such as audio engineering, human-computer interaction and machine listening: the recordings can be used to study immersive recording techniques, and the crowdsourced annotations can be used to train machine-listening models for sound source identification.
The microphone system used for the audio recording must be compatible with 360° audio rendering in the 1st-order B-format. If a Higher Order Ambisonics (HOA) system is used, the recording must be converted to the 1st-order B-format for submission. A multichannel spaced microphone array designed for 360° audio capture (e.g. ESMA-3D, Schoeps ORTF-3D) can also be used, provided the captured signals are encoded in the 1st-order B-format. Audio must be recorded digitally in the PCM wave format at a sampling rate of 48 kHz with a bit depth of 24 bits. A spherical video recording, or at least a panoramic picture, must accompany the audio in order to provide all-around visual information about the recording location. Recordings must be at least 3 minutes long, with no maximum limit; around 5 minutes per recording is recommended.
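As an illustration of the submission format, the sketch below checks a candidate WAV file against the specification above. It is a hypothetical helper (check_citytones_wav is not an official CityTones tool), assumes a 4-channel first-order B-format file, and uses the soundfile Python library.

import soundfile as sf  # pip install soundfile

MIN_DURATION_S = 3 * 60      # at least 3 minutes
EXPECTED_RATE = 48000        # 48 kHz
EXPECTED_SUBTYPE = "PCM_24"  # 24-bit PCM
EXPECTED_CHANNELS = 4        # 1st-order B-format (W, X, Y, Z)

def check_citytones_wav(path):
    # Return a list of problems with a candidate file (empty if it passes).
    info = sf.info(path)
    problems = []
    if info.samplerate != EXPECTED_RATE:
        problems.append(f"sample rate {info.samplerate}, expected 48000")
    if info.subtype != EXPECTED_SUBTYPE:
        problems.append(f"subtype {info.subtype}, expected PCM_24")
    if info.channels != EXPECTED_CHANNELS:
        problems.append(f"{info.channels} channels, expected 4 (B-format)")
    if info.duration < MIN_DURATION_S:
        problems.append(f"duration {info.duration:.0f} s, minimum is 180 s")
    return problems

# Usage: for issue in check_citytones_wav("my_soundscape.wav"): print(issue)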
The audio/video recordings for CityTones will be submitted through a portal on the NYU Immersive Audio Group website https://wp.nyu.edu/immersiveaudiogroup/citytones/. The submission process involves a Google survey and submission of the recording. Through the survey, submitters provide descriptive information including physical and technical details, and subjective quality attributes.
Publications:
- Roginska, A., Lee, H., Mendez, A. E., Murakami, S. (2019). CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes, Audio Engineering Society 146th International Convention, Dublin.
Phantom image elevation effect & Virtual Hemispherical Amplitude Panning (VHAP)
Researchers: Dr Hyunkook Lee, Dr Dale Johnson and Maksims Mironovs
Supervisor: Dr Hyunkook Lee
Project summary: Early studies reported that, when two identical signals are reproduced simultaneously from a pair of loudspeakers placed at ear level and arranged symmetrically about the listener, the resulting phantom centre image is perceived to be elevated in the median plane. These studies also confirmed that the degree of perceived elevation increases as the loudspeaker base angle increases from 0° to 180°; at a base angle of 180°, the image is perceived almost directly above the listener's head.
This project investigates this psychoacoustic effect further, providing more systematic subjective data and theoretical explanations, and also develops a new virtual 3D panning method called VHAP (virtual hemispherical amplitude panning) based on the effect. The main findings and outcomes so far are as follows:
- The strength of the effect significantly depends on the type of sound source: sources with a flatter frequency spectrum and a more transient nature are perceived as more elevated (Lee 2018).
- A new theory has been proposed: in addition to the conventional explanation based on spectral energy balance at high frequencies, it has been proposed and verified that the phantom image elevation effect is associated with a spectral notch below 1 kHz caused by the combination of ipsilateral (direct) and contralateral (interaural crosstalk) signals from the loudspeakers; e.g. for the 180° base angle, the spectral notch for a phantom centre image is around 640Hz, which matches the notch frequency for a real source elevated at 90° in the median plane (Lee 2017).
- VHAP (virtual hemispherical amplitude panning) creates an elevated phantom source on a virtual upper hemisphere using only four ear-height loudspeakers, based on the phantom image elevation effect. A set of constant-power gain coefficients is applied to the loudspeakers at ±90° and 0° for panning to a target azimuth and elevation in the front region, and to those at ±90° and 180° for panning in the back region (Lee et al. 2018); a simplified sketch of the constant-power constraint follows this list.
- Listening tests in loudspeaker reproduction show that VHAP can locate a phantom image at various spherical coordinates in the upper hemisphere with some limitations in accuracy and resolution (Lee et al 2018).
- Listening tests in binaural headphone reproduction indicate that the binaural rendering of VHAP is able to externalise elevated phantom images in various degrees of perceived distance (Lee et al 2019).
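The published VHAP gain coefficients are derived in Lee et al. (2018) and are not reproduced here; the sketch below only illustrates the constant-power constraint on a front-region triplet (loudspeakers at 0° and ±90°), with a deliberately hypothetical raw-gain assignment. Note that in VHAP the elevation percept arises from the phantom image elevation effect itself rather than from elevated loudspeakers.

import numpy as np

def unit_vector(az_deg, el_deg):
    # Cartesian direction for an azimuth/elevation pair in degrees.
    az, el = np.radians(az_deg), np.radians(el_deg)
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

# Front-region triplet: ear-height loudspeakers at 0 and +/-90 degrees.
SPEAKERS = [unit_vector(0, 0), unit_vector(90, 0), unit_vector(-90, 0)]

def constant_power_gains(target_az_deg, target_el_deg):
    # Hypothetical raw gains (positive part of the cosine of the angle
    # between target and loudspeaker), then normalisation so that
    # sum(g**2) == 1, i.e. the constant-power constraint named above.
    t = unit_vector(target_az_deg, target_el_deg)
    raw = np.array([max(0.0, float(np.dot(t, s))) for s in SPEAKERS])
    return raw / np.sqrt(np.sum(raw ** 2))

g = constant_power_gains(30, 45)  # target: 30 deg azimuth, 45 deg elevation
print("gains:", np.round(g, 3))
print("power:", round(float(np.sum(g ** 2)), 6))  # always 1.0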
Publications:
- Lee, H., Mironovs, M., & Johnson, D. (2019). Binaural Rendering of Virtual Elevation using the VHAP Plugin. In Proceedings of 146th AES International Convention.
- Lee, H., Johnson, D., & Mironovs, M. (2018). Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated Loudspeakers. In Proceedings of 144th AES International Convention [Convention Paper 9965]
- Lee, H. (2017). Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect. AES: Journal of the Audio Engineering Society, 65(9), 733-748. https://doi.org/10.17743/jaes.2017.0028
- Lee, H. (2016). Phantom image elevation explained. In 141st Audio Engineering Society International Convention 2016, AES 2016 [9664] Audio Engineering Society.
Towards a framework for the objective measurement of perceptual audio attributes
Researcher: Andrew Parker
Supervisors: Dr Steve Fenton, Dr Hyunkook Lee
Project summary: A real-time system for the objective measurement of perceived 'punch' in a music signal has been developed based on previous work. The system's output shows a strong correlation with perceptual scores obtained through subjective listening tests, with Pearson and Spearman coefficients of r=0.840 (p<0.001) and rho=0.937 (p<0.001) respectively. Further validation of the system is planned using subjective data from a large-scale listening test. The current research focus is 'clarity': defining a perceptually motivated model of the attribute so that it can be measured objectively.
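For reference, correlations of this kind between an objective meter output and subjective scores are computed as below. The data here are illustrative placeholders, not the project's results.

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Illustrative values only: objective meter output vs mean subjective
# punch scores for a handful of programme items.
meter_output = np.array([0.21, 0.35, 0.48, 0.62, 0.71, 0.88])
listener_scores = np.array([18.0, 30.0, 41.0, 55.0, 70.0, 83.0])

r, p_r = pearsonr(meter_output, listener_scores)
rho, p_rho = spearmanr(meter_output, listener_scores)
print(f"Pearson r = {r:.3f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")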
Publications:
- Parker, A., Fenton, S., & Lee, H. (2018). Real-time System for the Measurement of Perceived Punch. In A. Andreopoulou, & B. Boren (Eds.), 145th AES Convention Proceedings: AES Convention Papers [10043] Audio Engineering Society. https://doi.org/10.17743/aesconv.2018.978-1-942220-25-1
- Parker, A., Fenton, S., & Lee, H. (2018). Development of a Real-time Punch Meter Plugin. In Proceedings of the 4th Workshop on Intelligent Music Production. University of Huddersfield.
Development of a perceptual model for the trade-off between interaural time and level differences for the prediction of auditory image position
Researcher: Nikita Goddard
Supervisor: Dr Hyunkook Lee
Project summary: To predict the position of a phantom auditory source in stereophonic audio production, it is typical to use a perceptual model of the trade-off between “interchannel” time and level differences. Such models are also widely used in software tools for designing stereo and surround microphone arrays, including the MARRS app (Microphone Array Recording and Reproduction Simulator) developed by the Applied Psychoacoustics Lab (APL) at the University of Huddersfield. However, an interchannel-based model is limited to two-channel stereo and cannot accurately predict perceived auditory image position for multichannel arrays, as recently confirmed by Goddard in her final-year project. A more general way of predicting image position is to model the “interaural” time and level difference relationship instead, since an interaural model is not tied to any specific loudspeaker configuration, in contrast with the interchannel model. This project will therefore conduct a series of listening tests to model the trade-off between interaural time and level differences in perceived auditory position, and apply the results to improve the MARRS tool so that it can be used to design multichannel microphone configurations.
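A model of the interaural trade-off first needs ITD and ILD estimates from the ear signals. The sketch below shows one simple broadband estimate (peak of the interaural cross-correlation for the ITD, RMS level ratio for the ILD); this is a simplification, as a full model would operate per frequency band and on empirically measured trade-off data.

import numpy as np
from scipy.signal import correlate, correlation_lags

def itd_ild(left, right, fs, max_itd_ms=1.0):
    # ITD (ms): lag of the interaural cross-correlation peak, searched
    # within a physiologically plausible window of about +/-1 ms.
    # ILD (dB): broadband RMS level ratio between the ears.
    xcorr = correlate(left, right, mode="full")
    lags = correlation_lags(len(left), len(right), mode="full")
    window = np.abs(lags) <= max_itd_ms * 1e-3 * fs
    peak_lag = lags[window][np.argmax(xcorr[window])]
    itd_ms = 1000.0 * peak_lag / fs
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    ild_db = 20.0 * np.log10(rms(left) / rms(right))
    return itd_ms, ild_db

# Example: synthetic ear signals with ~0.29 ms delay and +4 dB offset.
fs = 48000
sig = np.random.default_rng(0).standard_normal(fs)
delay = int(0.0003 * fs)  # 14 samples
left = np.concatenate([np.zeros(delay), sig]) * 10 ** (4 / 20)
right = np.concatenate([sig, np.zeros(delay)])
print(itd_ild(left, right, fs))  # ~(0.29, 4.0)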
Quantifying Factors of Immersion in Virtual Reality
Researcher: Callum Eaton
Supervisors: Dr Hyunkook Lee, Braham Hughes
Project summary: This project quantifies the impact of height-channel reproduction on the perception of auditory immersion in virtual reality, and compares a number of common loudspeaker arrangements to determine which is perceived as the most immersive.
Publications:
- Eaton, C., & Lee, H. (2019). Quantifying factors of auditory immersion in virtual reality. In AES International Conference on Immersive and Interactive Audio, York.
Perception of room acoustics in virtual/augmented reality in the context of 6 degrees of freedom
Researcher: Bogdan Bacila
Supervisors: Dr Hyunkook Lee, Dr Steve Fenton
Project summary: This project aims to advance the understanding of how different auditory spatial attributes are perceived in a 6-degrees-of-freedom situation, where a person can move freely in a room. Understanding these attributes will support the development of new psychoacoustic models, which in turn will enable more accurate VR/AR immersive sound experiences.
Publications:
- Bacila, B., & Lee, H. (2019). Binaural Room Impulse Response (BRIR) Database for 6DOF Spatial Perception Research. In Audio Engineering Society Convention 146. Audio Engineering Society.
Investigation into the sound source dependency of elevation localisation in multichannel audio systems
Researcher: Maksims Mironovs
Supervisor: Dr Hyunkook Lee
Project summary: Over the last decade, spatial audio systems have received increased attention in cinema, home and car audio. The audio quality of such systems must be of a high standard and as close to the real environment as possible, with sound localisation being one of the main criteria for realistic spatial audio. The goal of this research is to provide perceptually based data that can be used to improve the localisation accuracy of current panning methods in spatial audio systems, and to explain the perceptual mechanism of vertical panning theoretically. To achieve this, subjective and objective investigations into the perceptual mechanism of 3D sound panning will be conducted, incorporating practical loudspeaker positions and stimuli, as previous research has been limited to laboratory conditions.
Publications:
- Lee, H., Johnson, D., & Mironovs, M. (2016, May). A New Response Method for Auditory Localization and Spread Tests. In Audio Engineering Society Convention 140. Audio Engineering Society.
- Mironovs, M., & Lee, H. (2016). Vertical amplitude panning for various types of sound sources. In: Interactive Audio Systems Symposium 2016, 23rd September 2016, University of York.
- Mironovs, M., & Lee, H. (2017, May). The influence of source spectrum and loudspeaker azimuth on vertical amplitude panning. In Audio Engineering Society Convention 142. Audio Engineering Society.
- Lee, H., Johnson, D., & Mironovs, M. (2017, October). An Interactive and Intelligent Tool for Microphone Array Design. In Audio Engineering Society Convention 143. Audio Engineering Society.
- Mironovs, M., & Lee, H. (2018, May). On the Accuracy and Consistency of Sound Localization at Various Azimuth and Elevation Angles. In Audio Engineering Society Convention 144. Audio Engineering Society.
- Lee, H., Johnson, D., & Mironovs, M. (2018). Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated Loudspeakers. In Proceedings of 144th AES International Convention [Convention Paper 9965]
Investigations into the recording and reproduction methods for soundscape evaluation in virtual reality
Researcher: Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: This project investigates optimal recording and reproduction methods for the evaluation of soundscape quality in virtual reality. An extensive set of soundscape recordings made using various microphone techniques is currently being established. Descriptors for the perceptual differences between techniques will be established through a focus-group elicitation and discussion experiment, and the magnitude of difference for each descriptor will be rated. The influence of the 360-degree visual scene on the perception of different recording and reproduction techniques will also be investigated. Using the optimal techniques identified, differences between in-situ and VR lab experiments in soundscape quality evaluation will be examined.
Publications:
- Millns, C., Mironovs, M., & Lee, H. (2019). Vertical localisation accuracy of binauralised First Order Ambisonics across multiple horizontal positions. In 146th Audio Engineering Society Convention 2019 (pp. 1–7). Dublin, Republic of Ireland.
- Lee, H., & Millns, C. (2017). Microphone array impulse response (MAIR) library for spatial audio research. In 143rd Audio Engineering Society Convention 2017 (pp. 1–5). New York, USA.
- Millns, C., & Lee, H. (2018). An investigation into spatial attributes of 360° microphone techniques for virtual reality. In 144th Audio Engineering Society Convention 2018 (pp. 1–9). Milan, Italy.
New user interface design for music production
Researcher: Christopher Dewey
Supervisor: Dr Jonathan Wakefield
Project summary:
- Eliciting from users which mix parameter information they need visualised:
Existing Audio Mixing Interfaces (AMIs) have focussed primarily on track level and pan, and related visualisations. This study places the user at the start of the AMI design process by reconsidering which aspects of an AMI's visual feedback are most important from a user's perspective, and which parameters users adjust most frequently. An experiment was conducted with a novel AMI which, in one mode, provides the user with no visual feedback. This enabled the qualitative elicitation of the most desired visual feedback from test subjects, while logging of user interactions enabled quantitative analysis of the time spent on different mix parameters. Results with music technology undergraduate students suggest that AMIs should concentrate on compression and EQ visualisation.
- Exploring the potential of holographically projecting a data visualisation of frequency, manipulated via hand gestures:
This work presents the first stage in the design and evaluation of a novel container-metaphor interface for equalisation control. The current prototype harnesses the Pepper's Ghost illusion to project in mid-air a holographic data visualisation of an audio track's long-term average and real-time frequency content, as a deformable shape manipulated directly via hand gestures. The system uses HTML5, JavaScript and the Web Audio API in conjunction with a Leap Motion controller and a bespoke low-budget projection system. During subjective evaluation, users commented that the novel system was simpler and more intuitive to use than commercially established equalisation interface paradigms, and most suited to creative, expressive and explorative equalisation tasks. https://youtu.be/ntPdjE9WYdc
Publications:
- Dewey, C., & Wakefield, J. (2018). Elicitation and Quantitative Analysis of User Requirements for Audio Mixing Interface. In 144th Audio Engineering Society European Convention [9935] Audio Engineering Society.
- Wakefield, J., Dewey, C., & Gale, W. (2018). Grid-based Stage Paradigm with Equalisation Extension for ‘Flat’ Mix Production. In 144th Audio Engineering Society European Convention [9930] Audio Engineering Society.
- Dewey, C., Wakefield, J., & Tindall, M. (2018). MIDI Keyboard Defined DJ Performance System. In NIME 2018: New Interfaces For Musical Expression
- Dewey, C., & Wakefield, J. (2017). Formal usability evaluation of audio track widget graphical representation for two-dimensional stage audio mixing interface. In 142nd Audio Engineering Society International Convention 2017, AES 2017 [9798] Audio Engineering Society.
- Dewey, C., & Wakefield, J. P. (2015). Evaluation of an algorithm for the automatic detection of salient frequencies in individual tracks of multi-track musical recordings. In 138th Audio Engineering Society Convention, AES 2015 (Vol. 2, pp. 1057-1061). Audio Engineering Society.
- Dewey, C., & Wakefield, J. P. (2014). A guide to the design and evaluation of new user interfaces for the audio industry. In 136th Audio Engineering Society Convention 2014 (pp. 250-259). Audio Engineering Society.
- Dewey, C., & Wakefield, J. (2013). Novel designs for the parametric peaking EQ user interface for single channel corrective EQ tasks. In 134th Audio Engineering Society Convention 2013 (pp. 453-462)
3D Audio Toolbox (3DAT)
Project summary: The 3D Audio Toolbox (3DAT) is an open-source software package primarily designed for real-time rendering, simulation, analysis and development of spatial audio methods. It performs both real-time and offline processing, and provides common objective analysis parameters such as the interaural cross-correlation coefficient (IACC), interaural time difference (ITD) and interaural level difference (ILD). These parameters are integrated into perceptual models for the prediction of quality attributes. The package is programmed in Cycling '74's Max and, because of its open-source, “sandbox” nature, allows researchers to write and analyse their own custom algorithms. A sketch of the IACC measure follows below.
Researcher: Dr Dale Johnson
Supervisor: Dr Hyunkook Lee
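As a reference for one of the measures named above, the sketch below computes IACC in Python rather than in Max: the maximum of the normalised cross-correlation between the two ear signals within a physiologically plausible lag window. This mirrors the common room-acoustics definition (ISO 3382-1) and is not 3DAT's own code.

import numpy as np
from scipy.signal import correlate, correlation_lags

def iacc(left, right, fs, max_lag_ms=1.0):
    # Maximum of the normalised interaural cross-correlation
    # within +/-1 ms of lag.
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    xcorr = correlate(left, right, mode="full") / norm
    lags = correlation_lags(len(left), len(right), mode="full")
    window = np.abs(lags) <= max_lag_ms * 1e-3 * fs
    return float(np.max(np.abs(xcorr[window])))

rng = np.random.default_rng(1)
a = rng.standard_normal(48000)
b = rng.standard_normal(48000)
print(iacc(a, a, 48000))  # 1.0: fully correlated
print(iacc(a, b, 48000))  # near 0: decorrelated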
Publications:
- Johnson, D., & Lee, H. (2019). A new SOFA object collection for Max. In AES International Conference on Immersive and Interactive Audio, York.
Huddersfield Universal Listening Test Interface Generator (HULTI-GEN)
Project summary: HULTI-GEN (Huddersfield Universal Listening Test Interface Generator) is a Cycling '74 Max-based tool. It is a user-customisable environment which takes user-defined parameters (e.g. the number of trials, stimuli and scale settings) and automatically constructs an interface for comparing auditory stimuli, whilst also randomising the stimulus and trial order. To assist the user, templates based on ITU-R recommended methods are included; since the recommended methods are often adjusted for different test requirements, HULTI-GEN also supports flexible editing of these presets. The accompanying engineering brief also summarises some existing test techniques, including their restrictions and how they might be altered using HULTI-GEN. A finalised version of HULTI-GEN is to be made freely available online at: https://research.hud.ac.uk/institutes-centres/apl/resources/
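The randomisation behaviour described above can be sketched as follows. This is not the Max implementation, just an illustration of randomising the trial order and, within each trial, the stimulus order.

import random

def build_test_plan(trial_stimuli, seed=None):
    # Randomise the trial order and, within each trial, the stimulus
    # order, as a blind listening-test interface would present them.
    # trial_stimuli: dict mapping trial name -> list of stimulus labels.
    rng = random.Random(seed)
    trials = list(trial_stimuli)
    rng.shuffle(trials)
    plan = []
    for trial in trials:
        order = list(trial_stimuli[trial])
        rng.shuffle(order)
        plan.append((trial, order))
    return plan

# Example: three trials, each comparing a reference and two systems.
plan = build_test_plan({
    "itemA": ["ref", "sysX", "sysY"],
    "itemB": ["ref", "sysX", "sysY"],
    "itemC": ["ref", "sysX", "sysY"],
}, seed=42)
for trial, order in plan:
    print(trial, order)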
Huddersfield Acoustical Analysis Research Toolbox (HAART)
Project summary: HAART (Huddersfield Acoustical Analysis Research Toolbox) is an open-source program designed to simplify the measurement and analysis of multichannel impulse responses (IRs). The code library comprises a set of objects that form a prototype program in Max. The program performs the acquisition, manipulation and analysis of IRs using subjective and objective measures described in the acoustics literature. HAART can also convolve IRs with audio material and, most importantly, binaurally synthesise virtual multichannel loudspeaker arrays over headphones, negating the need for physical multichannel setups in the field. The project was completed in 2015, and the code library is freely available from: https://research.hud.ac.uk/institutes-centres/apl/resources/
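The virtual-loudspeaker binauralisation that HAART performs can be sketched as follows: each loudspeaker feed is convolved with the HRIR pair measured for that loudspeaker's direction, and the results are summed per ear. This is a minimal Python illustration of the principle, not HAART's Max code; the random HRIRs are placeholders.

import numpy as np
from scipy.signal import fftconvolve

def binaural_virtual_array(speaker_feeds, hrirs):
    # speaker_feeds: (n_speakers, n_samples) loudspeaker signals.
    # hrirs: (n_speakers, 2, hrir_len), one HRIR pair per loudspeaker
    # direction. Each feed is convolved with its left/right HRIR and
    # the results are summed per ear.
    n_spk, n_samp = speaker_feeds.shape
    hrir_len = hrirs.shape[2]
    out = np.zeros((2, n_samp + hrir_len - 1))
    for s in range(n_spk):
        for ear in range(2):
            out[ear] += fftconvolve(speaker_feeds[s], hrirs[s, ear])
    return out

# Example: four virtual loudspeakers, 1 s of noise, dummy 256-tap HRIRs.
rng = np.random.default_rng(0)
feeds = rng.standard_normal((4, 48000))
hrirs = rng.standard_normal((4, 2, 256)) * 0.05  # placeholders, not real HRIRs
ears = binaural_virtual_array(feeds, hrirs)
print(ears.shape)  # (2, 48255)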
Perceptual Optimisation of Virtual Acoustics (POVA)
Researcher: Dr Dale Johnson
Supervisor: Dr Hyunkook Lee
Abstract:
In virtual reality, it is important that the user is immersed, and that both the visual and listening experiences are pleasant and plausible. Whilst it is now possible to accurately model room acoustics using available scene geometry, the perceptual attributes may not always be optimal. Previous research has examined high-level control methods over such attributes, yet these have only been applied to algorithmic reverberators and not to geometric types, which can model the acoustics of a virtual scene more accurately. The present thesis investigates methods of perceptual control over apparent source width and tonal colouration in virtual room acoustics, and is an important step towards an intelligent optimisation method for dynamically improving the listening experience.
A review of the psychoacoustic mechanisms of spatial impression and tonal colouration was performed, with consideration given to the effects of early reflections on these two attributes so that they could be exploited. Existing artificial reverb methods (mainly algorithmic, wave-based and geometric types) were reviewed. A geometric type was found to be the most suitable, and so a virtual acoustics program that gives access to each reflection and its metadata was developed, allowing perceptual control methods to exploit the reflection metadata.
Experiments were performed to find novel directional regions for sorting and grouping reflections by how they contribute to an attribute. The first was a region in the horizontal plane: any reflection arriving within it produces maximum perceived apparent source width (ASW). Another experiment discovered two regions of unacceptable colouration, in front of and behind the listener: any reflection arriving within these produces unacceptable colouration. Level adjustment of reflections within either region should manipulate the corresponding attribute, forming the basis of the control methods.
An investigation was performed in which the methods were applied to binaural room impulse responses generated by the custom program in two different virtual rooms at three source-receiver distances. An elicitation test using speech, guitar and orchestral sources identified the perceptual differences caused by the control methods; the largest differences were in ASW, loudness, distance and phasiness. Further investigation found that level adjustment of lateral reflections was fairly effective for controlling the degree of ASW without affecting tonal colouration, and that level adjustment of front-back reflections can affect ASW yet had little effect on colouration. The final experiment compared both methods and also investigated their effect on source loudness and distance. Again, level adjustment in both regions had a significant effect on ASW yet little effect on phasiness; it also significantly affected loudness and distance. Analysis suggested that the changes in ASW may be linked to the changes in loudness and distance.
Publications:
- Johnson, D., Lee, H. (2018). Perceptually optimised virtual acoustics. In Proceedings of the 4th workshop on intelligent music production.
- Johnson, D., Lee, H. (2017). Just noticeable difference in apparent source width depending on the direction of a single reflection. In Audio engineering society convention 142.
- Johnson, D., Lee, H. (2016a). Investigation into the perceptual effects of image source method order. In Audio engineering society convention 140.
- Johnson, D., Lee, H. (2016b). Taking advantage of geometric acoustics modeling using metadata. In Interactive audio systems symposium 2016.
Perceptual Band Allocation (PBA) for rendering vertical image spread
Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis
Supervisor: Dr Hyunkook Lee
Project summary: This project was funded by the EPSRC (EP/L019906/1). Conventional surround sound systems such as 5.1 or 7.1 are limited in that they can only produce a two-dimensional (2D) impression of auditory width and depth. Next-generation surround sound systems introduced over recent years employ height-channel loudspeakers in order to give the listener the impression of a three-dimensional (3D) sound field. Although new methods to position (pan) the sound image in the vertical plane have been investigated, there is currently a lack of research into methods to render the perceived vertical width of the image. Vertical width rendering is particularly important for creating the impression of fully immersive 3D ambient sound in applications such as the production of original 3D music/broadcast content and the 3D upmixing of 2D content. This project aims to provide a fundamental understanding of the perception and control of vertically oriented image width for 3D multichannel audio. Three objectives were formulated to achieve this aim: (i) to determine the frequency-dependent perceptual resolution of interchannel decorrelation for vertical image widening; (ii) to determine the effectiveness of 'Perceptual Band Allocation (PBA)', a novel method proposed for vertical image widening; (iii) to evaluate the two methods in real-world 2D-to-3D upmixing scenarios. These objectives were pursued through relevant signal processing techniques and subjective listening tests focusing on perceived spatial and tonal qualities, with the data analysed using robust statistical methods in order to model the relationship between perceptual patterns and the relevant parameters. The results of this project provide researchers and engineers with academic references for the development of new 3D audio rendering algorithms, and will ultimately enable the general public to experience fully immersive surround sound in home cinema, car and mobile environments.
The key findings from this project are as follows.
- The perceptual mechanism of the so-called Pitch-Height effect for virtual auditory images has been revealed. Formal experimental data on the perceived vertical positions of octave-band filtered virtual images have been provided for different azimuth angles. It has been found that the nature of virtual source elevation localisation is significantly different from that of real source elevation localisation.
- It has been shown that the aforementioned vertical image position data can be successfully exploited to render different degrees of vertical image spread. The method has been tested for 2D-to-3D upmixing of ambient sound, and was subjectively preferred to conventional methods (a simplified band-routing sketch follows this list).
- The association between loudspeaker base angle and perceived image elevation has been investigated in depth. It was generally shown that the perceived image elevates from the front towards above the listener as the loudspeaker base angle increases from 0 to 180 degrees. It was newly found that the effect significantly depends on the spectral and temporal characteristics of the sound source: sources with a broad frequency spectrum and a transient nature are perceived as more elevated, and frequency bands centred around 500Hz and 8kHz were found to produce the strongest elevation effect. These findings have important implications for practical applications such as 3D sound rendering, upmixing and downmixing.
- A novel theory that explains the cause of the virtual image elevation effect has been established. Whilst the conventional theory based on the psychophysics of pinna spectral distortion is limited to explaining the effect at high frequencies, the proposed theory, based on the brain's cognitive interpretation of ear-input signals, is able to explain the effect at low frequencies as well.
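The band-routing principle behind PBA can be sketched as below: the input is split into frequency bands and each band is routed to either the ear-level layer or the height layer. The band-to-layer mapping used here is a placeholder; the published mappings are based on the subjective band-elevation data in Lee (2016).

import numpy as np
from scipy.signal import butter, sosfilt

def pba_two_layer(x, fs, band_edges_hz, to_height):
    # Split x into bands and route each band to the ear-level layer or
    # the height layer. 'to_height' flags one choice per band; the
    # mapping below is a placeholder, not the published PBA mapping.
    main = np.zeros_like(x)
    height = np.zeros_like(x)
    for i, send_up in enumerate(to_height):
        lo, hi = band_edges_hz[i], band_edges_hz[i + 1]
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        if send_up:
            height += band
        else:
            main += band
    return main, height

# Example: six octave bands; route the upper three to the height layer.
fs = 48000
x = np.random.default_rng(0).standard_normal(fs)
edges = [125, 250, 500, 1000, 2000, 4000, 8000]
main_layer, height_layer = pba_two_layer(x, fs, edges, [0, 0, 0, 1, 1, 1])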
Publications:
- Lee, H. (2017). Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect. AES: Journal of the Audio Engineering Society, 65(9), 733-748. https://doi.org/10.17743/jaes.2017.0028
- Lee, H., Johnson, D., & Mironovs, M. (2018). Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated Loudspeakers. In Proceedings of 144th AES International Convention [Convention Paper 9965]
- Lee, H. (2016). Perceptual band allocation (PBA) for the rendering of vertical image spread with a vertical 2D loudspeaker array. AES: Journal of the Audio Engineering Society, 64(12), 1003-1013. https://doi.org/10.17743/jaes.2016.0052
- Lee, H. (2016). Perceptually motivated 3D diffuse field upmixing. In Proceedings of the 2016 AES International Conference on Sound Field Control (Vol. 2016-July). Audio Engineering Society.
- Lee, H. (2015). 2D-to-3D ambience upmixing based on perceptual band allocation. AES: Journal of the Audio Engineering Society, 63(10), 811-821. https://doi.org/10.17743/jaes.2015.0075
- Gribben, C., & Lee, H. (2018). The Frequency and Loudspeaker-Azimuth Dependencies of Vertical Interchannel Decorrelation on the Vertical Spread of an Auditory Image. AES: Journal of the Audio Engineering Society, 66(7-8), 537-555. https://doi.org/10.17743/jaes.2018.0040
- Gribben, C., & Lee, H. (2017). A Comparison between Horizontal and Vertical Interchannel Decorrelation. Applied Sciences, 7(11), [1202]. https://doi.org/10.3390/app7111202
- Wallis, R., & Lee, H. (2017). The Reduction of Vertical Interchannel Crosstalk: The Analysis of Localisation Thresholds for Natural Sound Sources. Applied Sciences, 7(3), [278]. https://doi.org/10.3390/app7030278
- Wallis, R., & Lee, H. (2016). Vertical stereophonic localization in the presence of interchannel crosstalk: The analysis of frequency-dependent localization thresholds. AES: Journal of the Audio Engineering Society, 64(10), 762-770. https://doi.org/10.17743/jaes.2016.0039
- Wallis, R., & Lee, H. (2015). The effect of interchannel time difference on localization in vertical stereophony. AES: Journal of the Audio Engineering Society, 63(10), 767-776. https://doi.org/10.17743/jaes.2015.0069
- Lee, H., & Gribben, C. (2014). Effect of vertical microphone layer spacing for a 3D microphone array. AES: Journal of the Audio Engineering Society, 62(12), 870-884. https://doi.org/10.17743/jaes.2014.0045
The perceptual contribution of pinna related transfer function attributes in the median plane
Researcher: Jade Raine Clarke
Supervisor: Dr Hyunkook Lee
Project summary: This project was carried out to investigate the perceptual effects of pinna notches in median-plane sound localisation. Literature regarding sound localisation and the effects of the pinnae is outlined, before a thorough description of the measurement procedure used to obtain individualised HRTFs (head-related transfer functions) is given. HRTFs of three subjects were recorded at seven positions in the median plane (0°, 30°, 60°, 90°, 120°, 150° and 180°). Two experiments were carried out using the measurements. The first consisted of reducing the magnitude of, and removing, pinna-related notches in the HRTFs to identify the perceptual effects of notch manipulation in both virtual reverberant and pseudo-anechoic conditions. Results show a great deal of variation between subjects, although pinna-notch filling was most detrimental to median-plane localisation in the BRIR condition, often resulting in hemispheric reversals and localisation inconsistencies. The second experiment compared localisation of binaurally presented sound sources in reverberant conditions with that in pseudo-anechoic binaural conditions, using real-room loudspeaker localisation as a reference. The results show that virtual localisation in the median plane is better in the presence of reverberation, and that subjects' experience of the listening room may influence this result. A simplified sketch of the notch-filling manipulation follows.
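The notch-filling manipulation can be sketched as a magnitude-only spectral edit, as below. This is a simplified stand-in for the thesis procedure: it bridges the log-magnitude across an assumed notch region while keeping the original phase, whereas the actual study manipulated measured individualised HRTFs.

import numpy as np

def fill_notch(hrir, fs, f_lo, f_hi):
    # Flatten a pinna notch between f_lo and f_hi by linearly bridging
    # the log-magnitude spectrum across that region while keeping the
    # original phase, then return the modified impulse response.
    H = np.fft.rfft(hrir)
    freqs = np.fft.rfftfreq(len(hrir), 1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(H) + 1e-12)
    region = (freqs >= f_lo) & (freqs <= f_hi)
    mag_db[region] = np.interp(freqs[region], [f_lo, f_hi],
                               [mag_db[region][0], mag_db[region][-1]])
    H_mod = 10.0 ** (mag_db / 20.0) * np.exp(1j * np.angle(H))
    return np.fft.irfft(H_mod, n=len(hrir))

# Example: fill an assumed notch between 6 and 9 kHz in a 512-tap HRIR.
# filled = fill_notch(hrir, 48000, 6000.0, 9000.0)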
Audio Dynamics - Towards a Perceptual Model of Punch
Researcher: Dr Steve Fenton
Supervisors: Dr Jonathan Wakefield, Dr Hyunkook Lee
Abstract:
This thesis discusses research conducted towards the development of an objective model that predicts punch in musical signals. Punch is a term often used by engineers and producers to describe a particular perceptual sensation in produced music. Listeners often characterise music as more or less 'punchy', yet the term is subjective, both in its meaning and in the subsequent auditory effect on the listener. An objective model of punch would therefore prove useful both for music classification and as a further metric in music production and mastering metering tools.
The literature reviewed within this body of work encompasses both subjective and objective audio evaluation methods, in addition to low-level signal extraction and measurement techniques. The review concludes that, whilst there has been a great deal of work on semantic description and audio quality measurement, low-level analysis with respect to the perception of punch remains largely unexplored.
The project was completed in a number of phases, each designed to investigate the perceptual effects of manipulating test stimuli. The rationale behind this testing was to establish the key low-level descriptors relating to the punch attribute, with the aim of producing a final objective, perceptually based model. The listening tests in each phase were conducted according to the ITU-R BS.1534-1 recommendation. Listener perception of the attribute shows a strong correlation with signal onset time, octave frequency band, signal duration and dynamic range. The punch measure obtained using the model is named PM95, where 95 indicates the upper percentile used in the measurement. Secondary measures were also obtained as a result of the iterative approach adopted: Inter-Band Ratio (IBR), Transient-to-Steady-state Ratio (TSR) and Transient-to-Steady-state Ratio+Residual (TSR+R). These measures are useful for quantifying overall audio quality with respect to dynamic range across frequency bands, and provide a more reliable metric for characterising the overall compression applied to a piece of music; the latter two may also be useful in highlighting perceptual masking artefacts.
The completed perceptual punch model was validated using scores obtained from a large-scale, independently conducted forced pairwise comparison test with expert listeners and a wide range of musical stimuli. The PM95 measure showed a 'very strong' positive correlation with listener punch perception, with both r and rho coefficients (0.849 and 0.833) significant at the 0.01 level (2-tailed). The PM95M measure, which is the PM95 measure divided by the mean value of the punch frames, also correlated well with the perceptual punch scale, with r and rho coefficients (0.707 and -0.750) significant at the 0.05 level (2-tailed). A real-time implementation of the punch model (and the other measures proposed in this thesis) could extend the metrics currently used in Music Information Retrieval.
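The aggregation step described above, from per-frame punch estimates to PM95 and PM95M, can be sketched as below. How each frame value is computed (from onset times, octave band, duration and dynamic range) is the substance of the thesis and is not reproduced here; the frame values in the example are illustrative.

import numpy as np

def pm95(punch_frames):
    # PM95: the 95th percentile of the per-frame punch values.
    # PM95M: PM95 divided by the mean of the punch frames.
    frames = np.asarray(punch_frames, dtype=float)
    pm = np.percentile(frames, 95)
    return pm, pm / frames.mean()

# Illustrative frame values only:
frames = np.abs(np.random.default_rng(0).standard_normal(500))
print(pm95(frames))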
Publications:
- Fenton, S., Lee, H., & Wakefield, J. (2016). Evaluation of a perceptually based model of 'Punch' with music material. In 141st Audio Engineering Society International Convention 2016, AES 2016 Audio Engineering Society.
- Fenton, S., Lee, H., & Wakefield, J. (2015). Hybrid multiresolution analysis of 'punch' in musical signals. In 138th Audio Engineering Society Convention, AES 2015 (Vol. 1, pp. 79-88). Audio Engineering Society.
- Fenton, S., & Lee, H. (2015). Towards a Perceptual Model Of 'Punch' In Musical Signals. In 139th Audio Engineering Society International Convention, AES 2015 Audio Engineering Society.
- Fenton, S., Lee, H., & Wakefield, J. (2014). Elicitation and objective grading of 'Punch' within produced music. In 136th Audio Engineering Society Convention 2014 (pp. 189-196). Audio Engineering Society.
- Fenton, S., & Wakefield, J. (2012). Objective profiling of perceived punch and clarity in produced music. In 132nd Audio Engineering Society Convention 2012 (pp. 587-601)
- Fenton, S., Fazenda, B., & Wakefield, J. (2011). Objective measurement of music quality using Inter-Band Relationship analysis. In 130th Audio Engineering Society Convention 2011 (Vol. 2, pp. 760-769)
- Fenton, S. M., Fazenda, B. M., & Wakefield, J. P. (2009). Objective quality measurement of audio using multiband dynamic range analysis. In 25th Reproduced Sound Conference 2009, REPRODUCED SOUND 2009: The Audio Explosion - Proceedings of the Institute of Acoustics (Vol. 31, pp. 6-18)
Investigations into the perception of vertical interchannel decorrelation in 3D surround sound reproduction
Researcher: Dr Christopher Gribben
Supervisor: Dr Hyunkook Lee
Project summary: The use of three-dimensional (3D) surround sound systems has seen a rapid increase over recent years. In two-dimensional (2D) loudspeaker formats (i.e. two-channel stereophony (stereo) and 5.1 Surround), horizontal interchannel decorrelation is a well-established technique for controlling the horizontal spread of a phantom image. Use of interchannel decorrelation can also be found within established two-to-five channel upmixing methods (stereo to 5.1). More recently, proprietary algorithms have been developed that perform 2D-to-3D upmixing, which presumably make use of interchannel decorrelation as well; however, it is not currently known how interchannel decorrelation is perceived in the vertical domain. From this, it is considered that formal investigations into the perception of vertical interchannel decorrelation are necessary. Findings from such experiments may contribute to the improved control of a sound source within 3D surround systems (i.e. the vertical spread), in addition to aiding the optimisation of 2D-to-3D upmixing algorithms.
The current thesis presents a series of experiments that systematically assess vertical interchannel decorrelation under various conditions. Firstly, a comparison is made between horizontal and vertical interchannel decorrelation, where it is found that vertical decorrelation is weaker than horizontal decorrelation. However, it is also seen that vertical decorrelation can generate a significant increase of vertical image spread (VIS) for some conditions. Following this, vertical decorrelation is assessed for octave-band pink noise stimuli at various azimuth angles to the listener. The results demonstrate that vertical decorrelation is dependent on both frequency and presentation angle – a general relationship between the interchannel cross-correlation (ICC) and VIS is observed for the 500 Hz octave-band and above, and strongest for the 8 kHz octave-band. Objective analysis of these stimuli signals determined that spectral changes at higher frequencies appear to be associated with VIS perception – at 0° azimuth, the 8 and 16 kHz octave-bands demonstrate potential spectral cues, at ±30°, similar cues are seen in the 4, 8 and 16 kHz bands, and from ±110°, cues are featured in the 2, 4, 8 and 16 kHz bands. In the case of the 8 kHz octave-band, it seems that vertical decorrelation causes a ‘filling in’ of vertical localisation notch cues, potentially resulting in ambiguous perception of vertical extent. In contrast, the objective analysis suggests that VIS perception of the 500 Hz and 1 kHz bands may have been related to early reflections in the listening room.
From the experiments above, it is demonstrated that the perception of VIS from vertical interchannel decorrelation is frequency-dependent, with high frequencies playing a particularly important role. A following experiment explores the vertical decorrelation of high frequencies only, where it is seen that decorrelation of the 500 Hz octave-band and above produces a similar perception of VIS to broadband decorrelation, whilst improving tonal quality. The results also indicate that decorrelation of the 8 kHz octave-band and above alone can significantly increase VIS, provided the source signal has sufficient high-frequency energy. The final experimental chapter of the present thesis aims to provide a controlled assessment of 2D-to-3D upmixing, taking into account the findings of the previous experiments. In general, 2D-to-3D upmixing by vertical interchannel decorrelation had little impact on listener envelopment (LEV) when compared against a level-matched 2D 5.1 reference. Furthermore, amplitude-based decorrelation appeared to be marginally more effective, and 'high-pass decorrelation' resulted in slightly better tonal quality for sources featuring greater low-frequency energy.
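The 'high-pass decorrelation' described above can be sketched as below: the band below the split frequency is passed unchanged while the band above is decorrelated with a random-phase all-pass FIR, and the resulting interchannel cross-correlation (ICC) is measured. The filter choices (4th-order Butterworth crossover, 512-tap decorrelator, 500 Hz split) are illustrative, not those of the thesis.

import numpy as np
from scipy.signal import butter, sosfilt, correlate, fftconvolve

def random_phase_allpass(n_taps, rng):
    # FIR decorrelation filter: unit magnitude, random phase.
    phase = rng.uniform(-np.pi, np.pi, n_taps // 2 + 1)
    phase[0] = phase[-1] = 0.0  # keep DC and Nyquist bins real
    return np.fft.irfft(np.exp(1j * phase), n=n_taps)

def highpass_decorrelate(x, fs, split_hz=500.0, seed=0):
    # Pass the band below split_hz unchanged; decorrelate the band above.
    sos_lo = butter(4, split_hz, btype="lowpass", fs=fs, output="sos")
    sos_hi = butter(4, split_hz, btype="highpass", fs=fs, output="sos")
    lo, hi = sosfilt(sos_lo, x), sosfilt(sos_hi, x)
    fir = random_phase_allpass(512, np.random.default_rng(seed))
    return lo + fftconvolve(hi, fir)[:len(x)]

def icc(a, b):
    # Interchannel cross-correlation coefficient: maximum of the
    # normalised cross-correlation between the two channel signals.
    xcorr = correlate(a, b, mode="full")
    return float(np.max(np.abs(xcorr)) /
                 np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

fs = 48000
x = np.random.default_rng(1).standard_normal(fs)
y = highpass_decorrelate(x, fs)
print("broadband ICC:", round(icc(x, y), 3))  # well below 1.0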
Publications:
- Gribben, C., & Lee, H. (2019). Increasing the Vertical Image Spread of Natural Sound Sources using Band-Limited Interchannel Decorrelation. In AES International Conference on Immersive and Interactive Audio, York.
- Gribben, C., & Lee, H. (2018). The Frequency and Loudspeaker-Azimuth Dependencies of Vertical Interchannel Decorrelation on the Vertical Spread of an Auditory Image. AES: Journal of the Audio Engineering Society, 66(7-8), 537-555. https://doi.org/10.17743/jaes.2018.0040
- Gribben, C., & Lee, H. (2017). A Comparison between Horizontal and Vertical Interchannel Decorrelation. Applied Sciences, 7(11), [1202]. https://doi.org/10.3390/app7111202
The analysis of frequency dependent localisation thresholds and the perceptual effects of vertical interchannel crosstalk
Researcher: Dr Rory Wallis
Supervisor: Dr Hyunkook Lee
Project summary: In the context of microphone techniques for recording three-dimensional (3D) sound in an acoustic space, vertical interchannel crosstalk occurs when the height layer of microphones captures excessive direct sound. This can cause sound images to be formed as vertically oriented phantom images, at positions intermediate between the main and height layers of loudspeakers, rather than at the desired position of the main layer. Additional spatial and timbral effects will also be perceived, although these have not been examined in the literature.
Previous research has examined the minimum amount of attenuation of direct sound in the height layer necessary to prevent vertical interchannel crosstalk from affecting the perceived location of the main channel signal, which has become known as the ‘localisation threshold’. However, existing methods of applying this have not considered the frequency dependency of median plane localisation. The present thesis therefore examined if localisation thresholds could be applied through the frequency dependent manipulation of the direct sound in the height layer (band reduction), as well as the most salient perceptual effects of vertical interchannel crosstalk. The operation of the precedence effect in the median plane was also considered.
A review of human localisation mechanisms was first conducted, with a particular focus on how such characteristics might be able to be exploited for the development of a band reduction method. Additionally, consideration was also given to how secondary vertical sources might affect direct sounds, in order to gain further understanding of what the most salient effects of vertical interchannel crosstalk might be.
The frequency dependency of localisation thresholds was considered in anechoic conditions, with subsequent localisation experiments being conducted to assist in explaining the results. Following this, localisation thresholds using blanket reduction (attenuation of the direct sound in the height layer evenly across the spectrum) were analysed. The frequency dependency of localisation threshold was subsequently examined in a natural listening environment, with a series of band reduction methods being developed based on the results. The band and blanket reduction thresholds were then verified in localisation tests. The final experiment considered the most salient effects of vertical interchannel crosstalk, how these were affected when the different localisation threshold methods were applied and which was the most preferred method by subjects.
The results showed that localisation thresholds are frequency dependent in both anechoic and natural listening environments. In particular, more level reduction was necessary for the mid-high frequencies compared to low frequencies. Additionally, a series of different band reduction methods were found to be effective. Elicitation experiments showed that the most salient effects of vertical interchannel crosstalk were increases in vertical image spread, source elevation, loudness and fullness, with the perception of these when the localisation threshold was applied being dependent on the method being used. Moreover, although subjective preference could not discriminate between the methods tested, the presence of direct sound in the height layer was consistently preferred compared to situations where it was absent. Furthermore, no evidence was found to support the existence of either the precedence effect or localisation dominance in the median plane.
Publications:
- Wallis, R., & Lee, H. (2017). The Reduction of Vertical Interchannel Crosstalk: The Analysis of Localisation Thresholds for Natural Sound Sources. Applied Sciences, 7(3), [278]. https://doi.org/10.3390/app7030278
- Wallis, R., & Lee, H. (2016). Vertical stereophonic localization in the presence of interchannel crosstalk: The analysis of frequency-dependent localization thresholds. AES: Journal of the Audio Engineering Society, 64(10), 762-770. https://doi.org/10.17743/jaes.2016.0039
- Wallis, R., & Lee, H. (2015). The effect of interchannel time difference on localization in vertical stereophony. AES: Journal of the Audio Engineering Society, 63(10), 767-776. https://doi.org/10.17743/jaes.2015.0069
An Investigation into Non-Linear Sonic Signatures with a Focus on Dynamic Range Compression and the 1176 Fet Compressor
Researcher: Dr Austin Moore
Supervisors: Prof Rupert Till, Dr Jonathan Wakefield
Abstract:
Dynamic range compression (DRC) is a common process in music production. Traditionally used to control the dynamic range of signals and reduce the risk of overloading recording devices, over time it has developed into a creative colouration effect rather than a preventative measure. This thesis investigates sonic signatures, distortion, non-linearity and how audio material is coloured during the music production process. It explores how methodologies used to measure distortion and timbre can be used to define the sonic signature of hardware compressors and other pieces of music production equipment.
A grounded theory and content analysis study was carried out to explore how producers use DRC in their work, how they describe its sound quality, which compressors they frequently use and which audio sources they process with particular types of compressor. The results from this qualitative study reveal that producers use compressors to manipulate the timbre of program material and select specific compressors with particular settings for colouration effects.
Tests were then carried out on a number of popular vintage hardware compressors to assess their sonic signatures. Firstly, a comparative study was conducted on the Teletronix LA2A, Fairchild 670, Urei 1176 and dbx 165A. Secondly, a comprehensive in-depth analysis of the 1176 was undertaken to fully catalogue its sonic signature over a range of settings and to compare results from a vintage Urei Blackface 1176 and a modern Universal Audio reissue. Objective analysis was conducted on the compressors using Total Harmonic Distortion (THD), Intermodulation Distortion (IMD) and tone burst measurements, and complex program material was analysed using spectrum analysis, critical listening and audio feature extraction. It was found that the compressors all have subtle nuances in their sonic signatures as a result of elements in their design colouring the audio with non-linear artefacts. The 1176 was shown to impart significant amounts of distortion when used in its all-buttons mode and with fast attack and release configurations; this style of processing was favoured by producers in the qualitative study.
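To make the objective analysis concrete, here is a minimal sketch of a THD measurement of the kind mentioned above: drive the device with a pure sine and compare the harmonic energy to the fundamental in the output spectrum. The 1 kHz test tone and the soft-clipping tanh stage standing in for the hardware under test are assumptions for demonstration; this is not the thesis' exact test rig.

```python
# Illustrative THD calculation. The tanh stage is a placeholder non-linearity,
# not a model of any of the compressors studied.
import numpy as np

FS = 48000  # sample rate (assumption)

def thd_percent(y, f0, fs=FS, n_harmonics=10):
    """THD = sqrt(sum of squared harmonic amplitudes) / fundamental amplitude."""
    spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    bin_of = lambda f: int(round(f * len(y) / fs))
    fundamental = spectrum[bin_of(f0)]
    harmonics = [spectrum[bin_of(f0 * k)] for k in range(2, n_harmonics + 1)]
    return 100 * np.sqrt(np.sum(np.square(harmonics))) / fundamental

f0 = 1000                       # 1 kHz test tone
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * f0 * t)
y = np.tanh(3 * x)              # stand-in for the device under test
print(f"THD: {thd_percent(y, f0):.2f}%")
```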
Publications:
- Moore, A. (Accepted/In press). Tracking with Processing and Coloring as You Go. In R. Hepworth-Sawyer, J. Hodgson, & M. Marrington (Eds.), Producing Music (Perspectives on Music Production). Routledge Taylor & Francis Group.
- Moore, A., & Wakefield, J. (2017). An Investigation into the Relationship between the Subjective Descriptor Aggressive and the Universal Audio of the 1176 FET Compressor. In 142nd Audio Engineering Society International Convention 2017, AES 2017 [9749] Audio Engineering Society.
- Moore, A., Till, R., & Wakefield, J. P. (2016). An investigation into the sonic signature of three classic dynamic range compressors. In 140th Audio Engineering Society International Convention 2016, AES 2016 Audio Engineering Society.
The effects of a vertical reflection on the relationship between listener preference and timbral and spatial attributes
Researcher: Tom Robotham
Supervisors: Dr Matthew Stephenson, Dr Hyunkook Lee
Project summary: Early reflections play a large role in our perception of sound and, as such, have been subject to various treatments over the years in response to changing tastes and room requirements. While early reflections arriving both vertically and horizontally in small rooms have been researched in the context of critical listening, little work has examined the beneficial or detrimental impact of early vertical reflections on listener preference in the context of listening for entertainment. Two subjective experiments were conducted in a semi-anechoic chamber and a listening room to assess subjects’ preference for playback of a direct sound alone against playback with the addition of the first geometric vertical reflection. Program material remained constant across both experiments, comprising five musical stimuli and one speech stimulus. The first experiment used a paired comparison method to assess subjects’ preference and the perceived magnitude of timbral and spatial difference introduced by a frequency independent ceiling reflection; each comparison was followed by a free verbalisation task in which subjects described the perceived change(s). The second experiment investigated this further, focusing specifically on subjects’ preference with a frequency dependent reflection; a more controlled verbalisation task provided a list of descriptive terms from which subjects identified the attribute(s) that influenced their preference. The results show that preference for playback including a vertical reflection varied considerably across both subjects and samples. However, both experiments suggest that the main perceptual attribute on which subjects based their preference was timbre; common spatial attributes (image shift/spread) could not be used to predict preference. Experiment two also suggests that altering the frequency content of a vertical reflection may provide a more consistent level of preference for certain stimuli. It is also shown that while certain attributes occurred frequently (brilliance/fullness) in descriptions of preference, others used less frequently (nasal/boxy) may influence preference to a greater extent.
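For illustration, below is a minimal sketch of how a direct sound plus a single vertical (ceiling) reflection might be synthesised for such a comparison. The 3 ms delay, -6 dB gain and 4 kHz low-pass cutoff are assumptions for demonstration only, not the study's values; the frequency dependent condition is modelled here simply as a low-pass filtered reflection.

```python
# Sketch of a direct-plus-reflection stimulus: frequency independent
# (a pure delayed, attenuated copy) vs frequency dependent (low-pass filtered).
# All parameter values are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48000  # sample rate (assumption)

def add_vertical_reflection(direct, delay_ms=3.0, gain_db=-6.0, lp_cutoff_hz=None):
    delay = int(FS * delay_ms / 1000)
    refl = np.concatenate([np.zeros(delay), direct]) * 10 ** (gain_db / 20)
    if lp_cutoff_hz is not None:  # frequency dependent case
        sos = butter(4, lp_cutoff_hz, btype='low', fs=FS, output='sos')
        refl = sosfilt(sos, refl)
    return np.concatenate([direct, np.zeros(delay)]) + refl

rng = np.random.default_rng(1)
dry = rng.standard_normal(FS)  # placeholder for the program material
freq_independent = add_vertical_reflection(dry)
freq_dependent = add_vertical_reflection(dry, lp_cutoff_hz=4000)
```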
Contemporary Metal Music Production
Researcher: Dr Mark Mynett
Supervisor: Prof Rupert Till
Abstract:
Distinct challenges are posed when conveying Contemporary Metal Music's (CMM) sounds and performance perspectives within a recorded and mixed form. CMM often features down-tuned, heavily distorted timbres alongside high tempi, fast and frequently complex subdivisions, and highly synchronised instrumentation. The combination of these elements results in a significant concentration of dense musical sound usually referred to as ‘heaviness’. The publications for this thesis present approaches, processes and techniques for capturing, presenting and accentuating heaviness, as well as the intelligibility and performance precision that facilitate the listener’s clear comprehension of the frequent overarching complexity in the music’s construction. Intelligibility and performance precision are the principal requirements for a high commercial standard of CMM, and can additionally enhance a production’s sense of heaviness.
This synoptic commentary defines heaviness from an ecological perspective, by highlighting invariant properties that shape the embodied experience of being human. Heaviness is primarily substantiated through displays of distortion and, regardless of the listening levels involved, the fundamentals of this identity are ecologically linked to volume, power, energy, intensity, emotionality and aggression. In addition to distortion, a vital component of heaviness is sonic weight, which refers to CMM’s low frequencies being associated with large, intense and powerful entities.
CMM’s heaviness is also considered in terms of the perceived proximity of activity, apparent size of performance environment, and level and type of energy being expended. In particular, CMM provides the listener with the sense of utmost proximity to the band, usually without any significant perspective of depth.
Production strategies for achieving a high commercial standard in CMM are then presented. This is followed by a reflective commentary on the portfolio of productions, which includes discussion of the author’s transition from emulative to professional level of production and considers originality within this body of work.
By presenting the subject as an important, valid and authentic scholarly discipline, this work bridges the gap between the worlds of academia and music production practice for this style.
Publications:
- Mynett, M. (2017). Metal Music Manual: Producing, Engineering, Mixing and Mastering Contemporary Heavy Music. Routledge Taylor & Francis Group.
- Mynett, M. (2016). The distortion paradox: Analyzing contemporary metal production. In Global Metal Music and Culture: Current Directions in Metal Studies (pp. 68-88). Taylor and Francis Inc.
- Mynett, M. (2012). Achieving Intelligibility whilst Maintaining Heaviness when Producing Contemporary Metal Music. Journal on the Art of Record Production, (6).
- Mynett, M. (2011). Sound at Source: The Creative Practice of Re-Heading, Dampening and Drum Tuning for the Contemporary Metal Genre. Journal on the Art of Record Production, (5).
- Mynett, M., Wakefield, J., & Till, R. (2010). Intelligent Equalisation Principles and Techniques for Minimising Masking when Mixing the Extreme Modern Metal Genre. In K. Spracklen, & R. Hill (Eds.), Heavy Fundamentalisms: Music, Metal and Politics (pp. 141-146). Inter-Disciplinary Press.
The effect of sound source and reflection angle on the perception of echo thresholds
Researcher: Lee Davis
Supervisor: Dr Hyunkook Lee
Project summary: This project compares the time differences recorded for echo thresholds under differing stimuli, reflection angles and listener instructions. Previous research has focused on echo thresholds primarily with regard to level difference, or to a limited combination of these variables. Contrasting listener instructions across studies, such as “echo barely audible” versus “echo clearly audible”, have been shown to produce different thresholds; the former instruction resulted in summing localisation being taken into account and in lower thresholds.
Listeners manipulated sliders on a GUI to reduce the time difference between two randomly selected loudspeakers. Two tests were undertaken: one grading the beginning of separation with fusion still evident, and one grading complete separation. Orchestral, pink noise burst and speech stimuli were used as continuous, transient and familiar sources respectively. Seventeen loudspeaker angles were available in total; however, a single angle per side of the median plane was chosen randomly by the GUI, producing ten angles per test. The listener sat in the centre of the room with loudspeakers positioned around them at 0°, ±30°, ±60°, ±90°, ±120°, ±150° and 180° azimuth at 0° elevation, and at 0°, ±30° and ±110° azimuth at 30° elevation, replicating common multichannel surround setups. The lead sound was presented from the loudspeaker directly in front of the listener at 0° azimuth and 0° elevation.
A Paired-Samples Sign Test was used for significance testing of median differences between graded echo thresholds. There were clear median differences between the two tests where the grading criteria differed. The orchestral stimulus was overall significantly different from the pink noise and speech stimuli in the fusion test. In the separation test, significant differences were found for half of the angles (those within the median plane or relatively behind the listener) in the orchestral versus pink noise comparison, and for the majority of angles in the orchestral versus speech comparison. For both tests, the pink noise versus speech comparison showed no significant differences. Limited significant differences were noted between angles, although median plane angles for the lag sound showed increased echo thresholds.
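As a sketch of the stimulus underlying such a test, the code below generates a lead-lag pair of channels with an adjustable time difference, the variable the listeners manipulated via the GUI sliders. The burst length and example lag values are assumptions for illustration, and routing to an actual multichannel loudspeaker setup is omitted.

```python
# Illustrative lead-lag stimulus generation for an echo-threshold test.
# In the real experiment the two channels would be routed to the front (lead)
# loudspeaker and a randomly chosen lag loudspeaker.
import numpy as np

FS = 48000  # sample rate (assumption)

def lead_lag_pair(signal, lag_ms):
    """Return (lead, lag) channels with the lag copy delayed by lag_ms."""
    delay = int(FS * lag_ms / 1000)
    lead = np.concatenate([signal, np.zeros(delay)])
    lag = np.concatenate([np.zeros(delay), signal])
    return lead, lag

burst = np.random.default_rng(2).standard_normal(int(0.05 * FS))  # 50 ms noise burst
for lag_ms in (5, 20, 50):  # in the test, a GUI slider sweeps this value
    lead, lag = lead_lag_pair(burst, lag_ms)
```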
An investigation into the changes of loudness perception in relation to changes in crest factor for octave bands
Researcher: Mark Wendl
Supervisor: Dr Hyunkook Lee
Project summary: Even though modern technology makes it possible for audio to sound better than it ever has, there is still an ongoing debate about the levels of compression applied to music. Heavy compression has been found in previous studies to be detrimental to quality, but before compression's effect on audio can be fully understood, fundamental knowledge of its effects across different frequencies is needed. A series of tests using pink noise, examining how different amounts of compression change perceived loudness across frequency bands and across different amplitude characteristics, gave an insight into how compression affects complex audio. It was found that the lower octave bands behaved contrary to expectation: they were perceived as louder despite receiving no treatment different from the other frequency bands. It was also found, however, that all frequency bands increased in loudness as a result of compression.
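As a concrete illustration of the measurement at the heart of this work, here is a minimal sketch of a per-octave-band crest factor (peak-to-RMS ratio) calculation before and after compression. Standard octave centre frequencies are assumed, and white noise plus a crude static tanh stage stand in for the pink noise stimuli and the dynamic range compressor actually used; none of the values reflect the study's settings.

```python
# Illustrative per-octave-band crest factor measurement.
# The tanh stage is a placeholder for a real dynamic range compressor.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 48000  # sample rate (assumption)

def crest_factor_db(x):
    """Crest factor = peak / RMS, expressed in dB."""
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

def octave_band(x, fc):
    """Band-pass one octave centred on fc (edges at fc/sqrt(2) and fc*sqrt(2))."""
    sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)], btype='band', fs=FS, output='sos')
    return sosfilt(sos, x)

rng = np.random.default_rng(3)
noise = rng.standard_normal(4 * FS)      # white noise stand-in for the stimuli
compressed = np.tanh(2 * noise) / 2      # crude static compression stand-in

for fc in (63, 125, 250, 500, 1000, 2000, 4000, 8000):
    before = crest_factor_db(octave_band(noise, fc))
    after = crest_factor_db(octave_band(compressed, fc))
    print(f"{fc:>5} Hz: {before:5.1f} dB -> {after:5.1f} dB")
```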
Publications:
- Wendl, M., & Lee, H. (2015). The effect of dynamic range compression on perceived loudness for octave bands of pink noise in relation to crest factor. In 138th Audio Engineering Society Convention, AES 2015 (Vol. 1, pp. 502-510). Audio Engineering Society.
- Wendl, M., & Lee, H. (2014). The effect of dynamic range compression on loudness and quality perception in relation to crest factor. In 136th Audio Engineering Society Convention 2014 (pp. 86-93). Audio Engineering Society.