Below is a list of research projects currently ongoing or completed in the CAPE. Click on each project for more details.
Capturing and rendering audio for 360° virtual reality
Capturing and rendering audio for 3D immersive reproduction
CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes
Phantom image elevation effect & Virtual Hemispherical Amplitude Panning (VHAP)
Towards a framework for the objective measurement of perceptual audio attributes
Development of a perceptual model for the trade-off between interaural time and level differences for the prediction of auditory image position
Quantifying Factors of Immersion in Virtual Reality
Perception of room acoustics in virtual/augmented reality in the context of 6 degrees of freedom
Investigation into the sound source dependency of elevation localisation in multichannel audio systems
Investigations into the recording and reproduction methods for soundscape evaluation in virtual reality
New user interface design for music production
3D Audio Toolbox (3DAT)
Huddersfield Universal Listening Test Interface Generator (HULTI-GEN)
Huddersfield Acoustical Analysis Research Toolbox (HAART)
Perceptual Optimisation of Virtual Acoustics (POVA)
Perceptual Band Allocation (PBA) for rendering vertical image spread
The perceptual contribution of pinna related transfer function attributes in the median plane
Audio Dynamics - Towards a Perceptual Model of Punch
Investigations into the perception of vertical interchannel decorrelation in 3D surround sound reproduction
The analysis of frequency dependent localisation thresholds and the perceptual effects of vertical interchannel crosstalk
An Investigation into Non-Linear Sonic Signatures with a Focus on Dynamic Range Compression and the 1176 FET Compressor
The effects of a vertical reflection on the relationship between listener preference and timbral and spatial attributes
Contemporary Metal Music Production
The effect of sound source and reflection angle on the perception of echo thresholds
An investigation into the changes of loudness perception in relation to changes in crest factor for octave bands
Researchers: Dr Hyunkook Lee, Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: This project investigates recording and reproduction methods for 360° audio in virtual reality applications. Currently, the most popular method for capturing 360° audio for VR is arguably first-order Ambisonics (FOA). FOA microphone systems are typically compact, and thus convenient for location recording, and they offer stable localisation and flexible sound field rotation. However, FOA has limitations in terms of perceived spaciousness and the size of the sweet spot in loudspeaker reproduction due to the high level of interchannel correlation. On the other hand, a near-coincident microphone array, which incorporates directional microphones that are spaced and angled outwards, can provide a better balance between spaciousness and localisability than a pure coincident array. The current project investigates the localisation accuracy and spatial attributes of the Equal Segment Microphone Array (ESMA) for music and urban soundscape VR applications in both visual and non-visual conditions. Also, different recording and reproduction techniques are perceptually evaluated in terms of their low-level spatial attributes. The optimal use scenarios for different techniques are determined depending on sound source, acoustic conditions and environmental context. Below is a summary of some of the key findings so far.
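As an aside on the sound field rotation that FOA offers, the following is a minimal sketch rather than code from the project. It assumes the traditional B-format convention (W scaled by 1/√2, azimuth measured anticlockwise from the front); the function names are illustrative only.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: W carries a 1/sqrt(2) gain and azimuth is
    measured anticlockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return (w, x, y, z)

def rotate_yaw(bformat, angle_deg):
    """Rotate a B-format frame about the vertical axis.

    Only X and Y change; W (omnidirectional) and Z (vertical) are unaffected.
    """
    a = math.radians(angle_deg)
    w, x, y, z = bformat
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)
```

Under these assumptions, rotating a source encoded at 30° by a further 60° gives the same frame as encoding it directly at 90°, which is why head-tracked rendering of FOA only needs a small matrix operation per frame.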
Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis, Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: The recently proposed multichannel audio formats such as Dolby Atmos, Auro-3D and NHK 22.2 employ height channels to provide the auditory sensation of a “three-dimensional (3D)” space. This project, funded by EPSRC (EP/L019906/1), aims to provide fundamental psychoacoustic principles for the perception, recording and reproduction of height dimension in 3D reproduction. Below is the summary of some of the main findings and outcomes from this project so far.
Researchers: Prof Agnieszka Roginska (NYU), Dr Hyunkook Lee, Ana Elisa Mendez Mendez (NYU), Scott Murakami (NYU), Andrea Genovese (NYU)
Supervisors: Prof Agnieszka Roginska, Dr Hyunkook Lee
Project summary: The CityTones project is a collaborative open-source repository initiated and administered by New York University (NYU) Steinhardt and the University of Huddersfield. CityTones invites sound recordists around the world to contribute to the repository using 360-degree audio and visual capture methods. The database includes descriptors containing information about the technical details of the recording, physical information, subjective quality attributes, and sound content information. The recordings are verified after submission and made available in the public database. The database will be publicly available for users to download. Applications include the simulation of environments, sound design, research areas such as audio engineering, human computer interaction and machine listening. The data and recordings can be used to study immersive recording techniques. The data with crowdsourced annotations can be used in machine listening research to train models for sound source identification.
The microphone system used for audio recording must be compatible with 360° audio rendering in the 1st order B-format. If a Higher Order Ambisonics (HOA) system is used, the recording must be converted into the 1st order B-format for submission. A multichannel spaced microphone array designed for 360° audio capture can also be used (e.g., ESMA-3D, Schoeps ORTF-3D, etc.). Signals captured by such an array must be encoded in the 1st order B-format. The audio should be recorded digitally in the PCM wave format at a sampling rate of 48 kHz, with a bit depth of 24 bits. A spherical visual recording, or at least a panoramic picture, must accompany the audio recording in order to provide all-around visual information about the recording location. Each recording must be at least 3 minutes long, with no specified maximum; a length of about 5 minutes is recommended.
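A contributor could pre-check a file against the audio requirements above with something like the sketch below. It is not part of the CityTones submission process; the function names are hypothetical. It uses Python's standard `wave` module, which reads plain PCM files (files with extensible WAV headers may not open in older Python versions).

```python
import wave

REQUIRED_CHANNELS = 4       # W, X, Y, Z of 1st order B-format
REQUIRED_RATE_HZ = 48000    # 48 kHz sampling rate
REQUIRED_SAMPLE_BYTES = 3   # 24-bit samples
MIN_DURATION_S = 180        # 3 minutes

def check_params(channels, rate_hz, sample_bytes, n_frames):
    """Return a list of human-readable problems (empty if all checks pass)."""
    problems = []
    if channels != REQUIRED_CHANNELS:
        problems.append(f"expected {REQUIRED_CHANNELS} channels, found {channels}")
    if rate_hz != REQUIRED_RATE_HZ:
        problems.append(f"expected {REQUIRED_RATE_HZ} Hz, found {rate_hz} Hz")
    if sample_bytes != REQUIRED_SAMPLE_BYTES:
        problems.append(f"expected 24-bit samples, found {8 * sample_bytes}-bit")
    if rate_hz > 0 and n_frames / rate_hz < MIN_DURATION_S:
        problems.append(f"too short: {n_frames / rate_hz:.1f} s, minimum is {MIN_DURATION_S} s")
    return problems

def check_wav(path):
    """Read the header of a PCM WAV file and check it against the requirements."""
    with wave.open(path, "rb") as wav:
        return check_params(wav.getnchannels(), wav.getframerate(),
                            wav.getsampwidth(), wav.getnframes())
```

The check covers only the format parameters stated above; whether the four channels really contain a valid B-format signal cannot be determined from the header.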
The audio/video recordings for CityTones will be submitted through a portal on the NYU Immersive Audio Group website https://wp.nyu.edu/immersiveaudiogroup/citytones/. The submission process involves a Google survey and submission of the recording. Through the survey, submitters provide descriptive information including physical and technical details, and subjective quality attributes.
Researchers: Dr Hyunkook Lee, Dr Dale Johnson, Maksims Mironovs
Supervisor: Dr Hyunkook Lee
Project summary: Early studies reported that, when two identical signals are simultaneously reproduced from a pair of loudspeakers that are placed at ear level and arranged symmetrically from the listener position, the resulting phantom centre image would be perceived to be elevated in the median plane. It was also confirmed in the studies that the degree of perceived elevation would increase as the loudspeaker base angle increased from 0° to 180°; the image would be perceived almost right above the listener’s head when the base angle is 180°.
This project investigates this psychoacoustic effect further, providing more systematic subjective data and theoretical explanations, and also develops a new virtual 3D panning method called VHAP (virtual hemispherical amplitude panning) based on the effect. The main findings and outcomes so far are as follows:
Researcher: Andrew Parker
Supervisors: Dr Steve Fenton, Dr Hyunkook Lee
Project summary: A real-time system for the objective measurement of perceived ‘punch’ in a music signal has been developed based on previous work. The system’s output shows ‘strong’ correlation with perceptual scores obtained through a subjective listening test, with Pearson and Spearman coefficients of r=0.840 (p<0.001) and rho=0.937 (p<0.001) respectively. Further validation of the system is planned with subjective data gained from a large-scale listening test. The current research focus is ‘clarity’ and defining a perceptually motivated model of it, so that it can be measured objectively.
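For readers unfamiliar with the two coefficients quoted above, the sketch below shows how Pearson's r and Spearman's rho are typically computed from paired model outputs and listening-test scores. It is a generic illustration, not the project's analysis code, and it does not compute p-values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranked data."""
    return pearson_r(ranks(xs), ranks(ys))
```

Pearson's r measures linear agreement, whereas Spearman's rho only asks whether the two sets of scores are ordered in the same way, so a monotonic but non-linear relationship yields a higher rho than r.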
Researcher: Nikita Goddard
Supervisor: Dr Hyunkook Lee
Project summary: For the prediction of a phantom auditory source in stereophonic audio production, it is typical to use a perceptual model for the trade-off between “interchannel” time and level differences. Such a model has also been used widely in software tools for designing stereo and surround microphone arrays, including the MARRS app (Microphone Array Recording and Reproduction Simulator) developed by the Applied Psychoacoustics Lab (APL) of the UoH. However, an interchannel-based model is limited to two-channel stereo and is not able to accurately predict perceived auditory image position for multichannel arrays, as recently confirmed by Goddard, the student named for the proposed project, in her final year project. The ultimate way of predicting image position would be to model the “interaural” time and level difference relationship instead of the interchannel one. This is because the interaural model, in contrast with the interchannel model, would not be tied to any specific loudspeaker channel configuration. Therefore, the proposed project will conduct a series of listening tests to model the trade-off relationship between interaural time and level differences on perceived auditory position, and apply the result to improve the MARRS tool so that it can be used for designing multichannel microphone configurations.
Researcher: Callum Eaton
Supervisors: Dr Hyunkook Lee, Braham Hughes
Project Summary: The current research project is looking to quantify how significant the impact of height reproduction is to the perception of auditory immersion in virtual reality, and aims to compare a number of common speaker arrangements to determine which is perceived to be the most immersive.
Researcher: Bogdan Bacila
Supervisors: Dr Hyunkook Lee, Dr Steve Fenton
Project summary: This project aims to advance the understanding of how different auditory spatial attributes are perceived in a 6 degrees-of-freedom situation where a person can freely move in a room. Understanding these attributes will help in developing new psychoacoustic models of them, which in turn would help us develop more accurate VR/AR immersive sound experiences.
Researcher: Maksims Mironovs
Supervisor: Dr Hyunkook Lee
Project summary: Over the last decade, spatial audio systems have received increased attention in cinema, home and car audio. The audio quality of such systems must be of a high standard and as close to the real environment as possible, with sound localisation being one of the main criteria for realistic spatial audio. The goal of this research is to provide perceptually based data that can be used to improve the localisation accuracy of current panning methods in spatial audio systems. Additionally, the perceptual mechanism of vertical panning needs to be theoretically explained. To achieve this goal, subjective and objective investigations into the perceptual mechanism of 3D sound panning will be conducted. These investigations will incorporate practical loudspeaker positions and stimuli, as previous research is limited to laboratory conditions.
Researcher: Connor Millns
Supervisor: Dr Hyunkook Lee
Project summary: This project aims to investigate optimal recording and reproduction methods for the evaluation of soundscape quality in virtual reality. An extensive set of soundscape recordings made using various microphone techniques is currently being established. Descriptors for perceptual differences between different techniques will be established through a focus group elicitation and discussion experiment, and the magnitude of difference on each will be rated. The influence of a 360-degree visual scene on the perception of different recording and reproduction techniques will also be investigated. Using the optimal techniques found from the study, differences between in-situ and VR lab experiments in soundscape quality evaluation will be examined.
Researcher: Christopher Dewey
Supervisor: Dr Jonathan Wakefield
Project Summary: 3D Audio Toolbox (3DAT) is an open source software package that is primarily designed for real-time rendering, simulating, analysing and developing spatial audio methods. It is able to perform both real-time and offline processing, and provides common objective analysis parameters such as Interaural Cross Correlation Coefficient (IACC), Interaural time difference (ITD) and level difference (ILD), for example. Such parameters are integrated into perceptual models for the prediction of quality attributes. The software package has been programmed using Cycling 74’s Max and, due to its open source and “sandbox” nature, allows for researchers to write and analyse their own custom algorithms.
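The objective parameters named above can be illustrated with a short sketch. This is not 3DAT code (3DAT is written in Max); it is a plain-Python illustration with hypothetical function names. It assumes the common definitions: ILD as the energy ratio of the two ear signals in dB, IACC as the maximum absolute value of the normalised interaural cross-correlation within ±1 ms, and ITD as the lag at which that maximum occurs.

```python
import math

def ild_db(left, right):
    """Interaural level difference in dB (positive when the left ear is louder)."""
    energy_left = sum(s * s for s in left)
    energy_right = sum(s * s for s in right)
    return 10.0 * math.log10(energy_left / energy_right)

def normalised_cross_correlation(left, right, lag):
    """Correlate left[n] with right[n + lag], normalised by both signal energies.

    Both signals are assumed to have the same length.
    """
    n = len(left)
    numerator = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
    energy = math.sqrt(sum(s * s for s in left) * sum(s * s for s in right))
    return numerator / energy

def iacc_and_itd(left, right, sample_rate, max_lag_ms=1.0):
    """Return (IACC, ITD in seconds) by searching lags within +/- max_lag_ms.

    A positive ITD means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(sample_rate * max_lag_ms / 1000.0))
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda lag: abs(normalised_cross_correlation(left, right, lag)))
    iacc = abs(normalised_cross_correlation(left, right, best_lag))
    return iacc, best_lag / sample_rate
```

The brute-force lag search is adequate for short signals; a real-time tool would normally use an FFT-based correlation instead.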
Researcher: Dr Dale Johnson
Supervisor: Dr Hyunkook Lee
Project Summary: This engineering brief describes HULTI-GEN (Huddersfield Universal Listening Test Interface Generator), a Cycling ‘74 Max-based tool. HULTI-GEN is a user-customisable environment, which takes user-defined parameters (e.g. the number of trials, stimuli and scale settings) and automatically constructs an interface for comparing auditory stimuli, whilst also randomising the stimuli and trial order. To assist the user, templates based on ITU-R recommended methods have been included. As the recommended methods are often adjusted for different test requirements, HULTI-GEN also supports flexible editing of these presets. Furthermore, some existing techniques have been summarised within this brief, including their restrictions and how they might be altered through using HULTI-GEN. A finalised version of HULTI-GEN is to be made freely available online at: https://research.hud.ac.uk/institutes-centres/apl/resources/
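The randomisation of stimuli and trial order described above can be sketched as follows. This is an illustration in Python with hypothetical names, not HULTI-GEN's Max implementation; the optional seed is an assumption added so that a given presentation order can be reproduced.

```python
import random

def build_test_plan(trials, seed=None):
    """Randomise stimulus order within each trial, then randomise trial order.

    trials: dict mapping a trial name to its list of stimulus names.
    Returns a list of (trial_name, shuffled_stimuli) tuples.
    """
    rng = random.Random(seed)
    plan = []
    for name, stimuli in trials.items():
        shuffled = list(stimuli)   # copy so the caller's list is not modified
        rng.shuffle(shuffled)
        plan.append((name, shuffled))
    rng.shuffle(plan)
    return plan
```

Randomising at both levels reduces order effects: listeners cannot learn which slider or button corresponds to which condition, and no trial is systematically heard first or last.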
Project Summary: HAART (Huddersfield Acoustical Analysis Research Toolbox) is an open source program designed to simplify the measurement and analysis of multi-channel impulse responses (IRs). The code library is comprised of a set of objects that form a prototype program in Max. This program is able to perform the acquisition, manipulation and analysis of IRs using subjective and objective measures described in acoustics literature. HAART is also able to convolve IRs with audio material and, most importantly, able to binaurally synthesize virtual, multichannel speaker arrays over headphones, negating the need for multichannel setups when out in the field. This project was completed in 2015, and the code library is freely available from: https://research.hud.ac.uk/institutes-centres/apl/resources/
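The binaural synthesis of a virtual loudspeaker array mentioned above amounts to convolving each loudspeaker feed with the impulse responses measured from that loudspeaker to each ear, then summing the results per ear. The sketch below illustrates the idea in plain Python with hypothetical names; it is not HAART's Max code, and a practical tool would use fast (FFT-based) convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binauralise(speaker_feeds, brirs):
    """Render loudspeaker feeds to two headphone channels.

    speaker_feeds: dict mapping a loudspeaker name to its sample list.
    brirs: dict mapping the same names to (left_ear_ir, right_ear_ir).
    Returns (left, right) headphone signals.
    """
    length = max(len(feed) + max(len(brirs[name][0]), len(brirs[name][1])) - 1
                 for name, feed in speaker_feeds.items())
    left = [0.0] * length
    right = [0.0] * length
    for name, feed in speaker_feeds.items():
        ir_left, ir_right = brirs[name]
        for k, value in enumerate(convolve(feed, ir_left)):
            left[k] += value
        for k, value in enumerate(convolve(feed, ir_right)):
            right[k] += value
    return left, right
```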
Researcher: Dr Dale Johnson
Supervisor: Dr Hyunkook Lee
In virtual reality, it is important that the user is immersed, and that both the visual and listening experiences are pleasant and plausible. Whilst it is now possible to accurately model room acoustics using available scene geometry, the perceptual attributes may not always be optimal. Previous research has examined high-level control methods over attributes, yet these have only been applied to algorithmic reverberators and not geometric types, which can model the acoustics of a virtual scene more accurately. The present thesis investigates methods of perceptual control over apparent source width and tonal colouration in virtual room acoustics, and is an important step towards an intelligent optimisation method for dynamically improving the listening experience.
A review of the psychoacoustic mechanisms of spatial impression and tonal colouration was performed. Consideration was given to the effects of early reflections on these two attributes so that they can be exploited. Existing artificial reverb methods, mainly algorithmic, wave-based and geometric types, were reviewed. It was found that a geometric type was the most suitable, and so a virtual acoustics program that gave access to each reflection and its meta-data was developed. The program would allow for perceptual control methods to exploit the reflection meta-data.
Experiments were performed to find novel directional regions for sorting and grouping reflections by how they contribute to an attribute. The first found a region in the horizontal plane within which any arriving reflection produces maximum perceived apparent source width (ASW). Another found two regions, in front of and behind the listener, within which any arriving reflection produces unacceptable colouration. Level adjustment of reflections within either region should manipulate the corresponding attribute, forming the basis of the control methods.
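The idea of adjusting the level of reflections that arrive within a directional region can be sketched as below. This is an illustration only, not the thesis program: the data layout and function names are hypothetical, and the region limits are left as parameters because the angles found in the experiments are not stated here.

```python
def in_region(azimuth_deg, centre_deg, half_width_deg):
    """True if an azimuth lies within half_width_deg of the region centre.

    The difference is wrapped to [-180, 180) so that, for example,
    270 degrees and -90 degrees are treated as the same direction.
    """
    diff = (azimuth_deg - centre_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= half_width_deg

def adjust_reflections(reflections, regions, gain_db):
    """Apply a level change to every reflection arriving within any region.

    reflections: list of dicts with 'azimuth_deg' and linear 'gain' keys.
    regions: list of (centre_deg, half_width_deg) pairs (placeholder values).
    Returns new dicts; the input list is left unchanged.
    """
    factor = 10.0 ** (gain_db / 20.0)
    adjusted = []
    for reflection in reflections:
        scaled = dict(reflection)
        if any(in_region(reflection["azimuth_deg"], centre, half_width)
               for centre, half_width in regions):
            scaled["gain"] = reflection["gain"] * factor
        adjusted.append(scaled)
    return adjusted
```

This kind of control is only possible with a geometric reverberation model, because each reflection's direction of arrival must be available as meta-data.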
An investigation was performed in which the methods were applied to binaural room impulse responses generated by the custom program in two different virtual rooms at three source-receiver distances. An elicitation test was performed to find out what perceptual differences the control methods caused using speech, guitar and orchestral sources. It was found that the largest differences were in ASW, loudness, distance and phasiness. Further investigation into the effectiveness of the control methods found that level adjustment of lateral reflections was fairly effective for controlling the degree of ASW without affecting tonal colouration. It was also found that level adjustment of front-back reflections can affect ASW, yet had little effect on colouration. The final experiment compared both methods, and also investigated their effect on source loudness and distance. Again it was found that level adjustment in both regions had a significant effect on ASW yet little effect on phasiness. It was also found that both methods significantly affected loudness and distance. Analysis found that the changes in ASW may be linked to changes in loudness and distance.
Researchers: Dr Hyunkook Lee, Dr Christopher Gribben, Dr Rory Wallis
Supervisor: Dr Hyunkook Lee
Project summary: This project was funded by EPSRC (EP/L019906/1). Conventional surround sound systems such as 5.1 or 7.1 are limited in that they are only able to produce a two-dimensional (2D) impression of auditory width and depth. Next generation surround sound systems that have been introduced over recent years tend to employ height channel loudspeakers in order to provide the listener with the impression of a three-dimensional (3D) soundfield. Although new methods to position (pan) the sound image in the vertical plane have been investigated, there is currently a lack of research into methods to render the perceived vertical width of the image. The vertical width rendering is particularly important for creating the impression of a fully immersive 3D ambient sound in such applications as the production of original 3D music/broadcasting content and the 3D upmixing of 2D content. This project aims to provide fundamental understandings of the perception and control of vertically oriented image width for 3D multichannel audio. Three objectives have been formulated to achieve this aim: (i) to determine the frequency-dependent perceptual resolution of interchannel decorrelation for vertical image widening; (ii) to determine the effectiveness of 'Perceptual Band Allocation (PBA)', a novel method proposed for vertical image widening; (iii) to evaluate the above two methods in real-world 2D to 3D upmixing scenarios. These objectives will be achieved through relevant signal processing techniques and subjective listening tests focussing on perceived spatial and tonal qualities. Data obtained from the listening tests will be analysed using robust statistical methods in order to model the relationship between perceptual patterns and relevant parameters. 
The results of this project will provide researchers and engineers with academic references for the development of new 3D audio rendering algorithms, and will ultimately enable the general public to experience a fully immersive surround sound in the home-cinema, car and mobile environments.
The key findings from this project are as follows.
Researcher: Jade Raine Clarke
Supervisor: Dr Hyunkook Lee
Project summary: This project was carried out to investigate the perceptual effects of pinna notches in median plane sound localisation. Literature regarding sound localisation and the effects of the pinnae is outlined before a thorough description is given of the measurement procedure used to obtain individualised head-related transfer functions (HRTFs). HRTFs of three subjects were recorded at seven different positions in the median plane (0°, 30°, 60°, 90°, 120°, 150° and 180°). Two experiments were carried out using the measurements. The first consisted of reducing the magnitude of, and removing, pinna-related notches in the HRTFs to identify the perceptual effects of notch manipulation in both virtual reverberant and pseudo-anechoic conditions. Results for the first experiment show a great deal of variation between subjects, although it can be said that pinna notch filling is most detrimental to median plane localisation in the BRIR condition and often results in hemispheric reversals and localisation inconsistencies. The second experiment compared localisation abilities of binaurally presented sound sources in reverberant conditions to that of binaural pseudo-anechoic conditions, using real room loudspeaker localisation as a reference. Results from the latter experiment show that virtual localisation in the median plane is better in the presence of reverberation, and that subjective experience in a listening room may influence this result.
Researcher: Dr Steve Fenton
Supervisors: Dr Jonathan Wakefield, Dr Hyunkook Lee
This thesis discusses research conducted towards the development of an objective model that predicts punch in musical signals. Punch is a term often used by engineers and producers when describing a particular perceptual sensation found in produced music. Music is often characterised by listeners as being punchier yet the term is subjective, in terms of its meaning and the subsequent auditory effect on the listener. An objective model of punch would therefore prove useful for both music classification purposes and as a possible further metric that could be employed in music production and mastering metering tools. The literature reviewed within this body of work encompasses both subjective and objective audio evaluation methods in addition to low-level signal extraction and measurement techniques. The review concludes that whilst there has been a great deal of work in the area of semantic description and audio quality measurement, low-level analysis with respect to the perception of punch remains largely unexplored. The project was completed in a number of phases each designed to investigate the perceptual effects resulting from manipulation of test stimuli. The rationale behind this testing was to establish the key low-level descriptors relating to the punch attribute with the aim of producing a final objective and perceptually based model. The listening tests in each phase were conducted according to the ITU-R BS 1534-1 recommendation. In producing an objective model for the prediction of punch, listener perception to the attribute shows a strong correlation to the signal onset times, octave frequency band, signal duration and dynamic range. The punch measure obtained using the model is named PM95, where 95 indicates the upper percentile used in the measurement. Secondary measures were also obtained as a result of the iterative approach adopted. 
These are Inter-Band-Ratio (IBR), Transient to Steady-state Ratio (TSR) and Transient to Steady-state Ratio+Residual (TSR+R). These measures are useful in quantifying overall audio quality with respect to its dynamic range across frequency bands, in addition to being a more reliable metric for defining the overall compression being applied to a piece of music. In addition, the latter two measures may be useful in highlighting perceptual masking artefacts. The completed perceptual punch model was validated using the scores obtained from a large-scale and independently conducted forced pairwise comparison test using expert listeners and a wide range of musical stimuli. From the results obtained, the PM95 measure showed a ‘very strong’ positive correlation with listener punch perception, with both r and rho coefficients (0.849 and 0.833) being significant at the 0.01 level (2-tailed). The PM95M measure, which is the PM95 measure divided by the mean value of punch frames, also correlated very well with the perceptual punch scale, with both r and rho coefficients (0.707 and -0.750) being significant at the 0.05 level (2-tailed). A real-time implementation of the punch model (and other measures proposed in this thesis) could be utilised as an extension to the metrics currently being used in Music Information Retrieval.
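The relationship between the two summary measures can be illustrated with the sketch below. It assumes that a list of per-frame punch values is already available (how the model derives them is not covered here), that PM95 is the 95th percentile of those values, and that the percentile is found by linear interpolation; these are assumptions made for illustration, and the thesis may define the details differently.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between the closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def pm95(punch_frames):
    """Upper (95th) percentile of the per-frame punch values."""
    return percentile(punch_frames, 95)

def pm95m(punch_frames):
    """PM95 divided by the mean value of the punch frames."""
    mean_punch = sum(punch_frames) / len(punch_frames)
    return pm95(punch_frames) / mean_punch
```

Taking an upper percentile rather than the maximum makes the measure less sensitive to a single outlying frame, while dividing by the mean relates the strongest frames to the typical frame of the same piece.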
Researcher: Dr Christopher Gribben
Supervisor: Dr Hyunkook Lee
Project summary: The use of three-dimensional (3D) surround sound systems has seen a rapid increase over recent years. In two-dimensional (2D) loudspeaker formats (i.e. two-channel stereophony (stereo) and 5.1 Surround), horizontal interchannel decorrelation is a well-established technique for controlling the horizontal spread of a phantom image. Use of interchannel decorrelation can also be found within established two-to-five channel upmixing methods (stereo to 5.1). More recently, proprietary algorithms have been developed that perform 2D-to-3D upmixing, which presumably make use of interchannel decorrelation as well; however, it is not currently known how interchannel decorrelation is perceived in the vertical domain. From this, it is considered that formal investigations into the perception of vertical interchannel decorrelation are necessary. Findings from such experiments may contribute to the improved control of a sound source within 3D surround systems (i.e. the vertical spread), in addition to aiding the optimisation of 2D-to-3D upmixing algorithms.
The current thesis presents a series of experiments that systematically assess vertical interchannel decorrelation under various conditions. Firstly, a comparison is made between horizontal and vertical interchannel decorrelation, where it is found that vertical decorrelation is weaker than horizontal decorrelation. However, it is also seen that vertical decorrelation can generate a significant increase of vertical image spread (VIS) for some conditions. Following this, vertical decorrelation is assessed for octave-band pink noise stimuli at various azimuth angles to the listener. The results demonstrate that vertical decorrelation is dependent on both frequency and presentation angle – a general relationship between the interchannel cross-correlation (ICC) and VIS is observed for the 500 Hz octave-band and above, and strongest for the 8 kHz octave-band. Objective analysis of these stimuli signals determined that spectral changes at higher frequencies appear to be associated with VIS perception – at 0° azimuth, the 8 and 16 kHz octave-bands demonstrate potential spectral cues, at ±30°, similar cues are seen in the 4, 8 and 16 kHz bands, and from ±110°, cues are featured in the 2, 4, 8 and 16 kHz bands. In the case of the 8 kHz octave-band, it seems that vertical decorrelation causes a ‘filling in’ of vertical localisation notch cues, potentially resulting in ambiguous perception of vertical extent. In contrast, the objective analysis suggests that VIS perception of the 500 Hz and 1 kHz bands may have been related to early reflections in the listening room.
From the experiments above, it is demonstrated that the perception of VIS from vertical interchannel decorrelation is frequency-dependent, with high frequencies playing a particularly important role. A following experiment explores the vertical decorrelation of high frequencies only, where it is seen that decorrelation of the 500 Hz octave-band and above produces a similar perception of VIS to broadband decorrelation, whilst improving tonal quality. The results also indicate that decorrelation of the 8 kHz octave-band and above alone can significantly increase VIS, provided the source signal has sufficient high frequency energy. The final experimental chapter of the present thesis aims to provide a controlled assessment of 2D-to-3D upmixing, taking into account the findings of the previous experiments. In general, 2D-to-3D upmixing by vertical interchannel decorrelation had little impact on listener envelopment (LEV), when compared against a level-matched 2D 5.1 reference. Furthermore, amplitude-based decorrelation appeared to be marginally more effective, and ‘high-pass decorrelation’ resulted in slightly better tonal quality for sources that featured greater low frequency energy.
Researcher: Dr Rory Wallis
Supervisor: Dr Hyunkook Lee
Project summary: In the context of microphone techniques for recording three-dimensional (3D) sound in an acoustic space, vertical interchannel crosstalk occurs when the height layer of microphones captures excessive direct sound. This effect can cause sound images to be formed as vertically oriented phantom images, at positions intermediate between the main and height layers of loudspeakers, as opposed to at the desired position of the main layer. Additional spatial and timbral effects will also be perceived, although these have not been examined in the literature.
Previous research has examined the minimum amount of attenuation of direct sound in the height layer necessary to prevent vertical interchannel crosstalk from affecting the perceived location of the main channel signal, which has become known as the ‘localisation threshold’. However, existing methods of applying this have not considered the frequency dependency of median plane localisation. The present thesis therefore examined if localisation thresholds could be applied through the frequency dependent manipulation of the direct sound in the height layer (band reduction), as well as the most salient perceptual effects of vertical interchannel crosstalk. The operation of the precedence effect in the median plane was also considered.
A review of human localisation mechanisms was first conducted, with a particular focus on how such characteristics might be able to be exploited for the development of a band reduction method. Additionally, consideration was also given to how secondary vertical sources might affect direct sounds, in order to gain further understanding of what the most salient effects of vertical interchannel crosstalk might be.
The frequency dependency of localisation thresholds was considered in anechoic conditions, with subsequent localisation experiments being conducted to assist in explaining the results. Following this, localisation thresholds using blanket reduction (attenuation of the direct sound in the height layer evenly across the spectrum) were analysed. The frequency dependency of localisation threshold was subsequently examined in a natural listening environment, with a series of band reduction methods being developed based on the results. The band and blanket reduction thresholds were then verified in localisation tests. The final experiment considered the most salient effects of vertical interchannel crosstalk, how these were affected when the different localisation threshold methods were applied and which was the most preferred method by subjects.
The results showed that localisation thresholds are frequency dependent in both anechoic and natural listening environments. In particular, more level reduction was necessary for the mid-high frequencies compared to low frequencies. Additionally, a series of different band reduction methods were found to be effective. Elicitation experiments showed that the most salient effects of vertical interchannel crosstalk were increases in vertical image spread, source elevation, loudness and fullness, with the perception of these when the localisation threshold was applied being dependent on the method being used. Moreover, although subjective preference could not discriminate between the methods tested, the presence of direct sound in the height layer was consistently preferred compared to situations where it was absent. Furthermore, no evidence was found to support the existence of either the precedence effect or localisation dominance in the median plane.
Researcher: Dr Austin Moore
Supervisors: Prof Rupert Till, Dr Jonathan Wakefield
Dynamic range compression (DRC) is a common process in music production. Traditionally used to control the dynamic range of signals and reduce the risk of overloading recording devices, over time it has developed into a creative colouration effect rather than a preventative measure. This thesis investigates sonic signatures, distortion, non-linearity and how audio material is coloured during the music production process. It explores how methodologies used to measure distortion and timbre can be used to define the sonic signature of hardware compressors and other pieces of music production equipment. A grounded theory and content analysis study was carried out to explore how producers use DRC in their work, how they describe its sound quality, which compressors they frequently use and which audio sources they process with particular types of compressor. The results from this qualitative study reveal that producers use compressors to manipulate the timbre of program material and select specific compressors with particular settings for colouration effects. Tests were carried out on a number of popular vintage hardware compressors to assess their sonic signature. Firstly, a comparative study was conducted on the Teletronix LA2A, Fairchild 670, Urei 1176 and dbx 165A. Secondly, a comprehensive in-depth analysis was undertaken of the 1176 to fully catalogue its sonic signature over a range of settings and to compare results from a vintage Urei Blackface 1176 and a modern Universal Audio reissue. Objective analysis was conducted on the compressors using Total Harmonic Distortion (THD), Intermodulation Distortion (IMD) and tone burst measurements. Complex program material was analysed using spectrum analysis, critical listening and audio feature extraction. It was found that the compressors all have subtle nuances to their sonic signature as a result of elements in their design colouring the audio with non-linear artefacts. The 1176 was shown to impart significant amounts of distortion when used in its all-buttons mode and with fast attack and release configurations. This style of processing was favoured by producers in the qualitative study.
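For readers unfamiliar with THD, the measure is commonly defined as the ratio of the combined RMS amplitude of the harmonics to the amplitude of the fundamental. The Python sketch below estimates it for a synthetic test tone; it is a minimal illustration under stated assumptions, not the measurement procedure used in the thesis. The sample rate, the 1 kHz fundamental, the number of harmonics and the synthetic signal (a sine with a third harmonic at one tenth of its amplitude) are all assumptions made for the example.

```python
import math


def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of the component at freq_hz, from a single-bin DFT.

    Accurate when the signal holds a whole number of cycles of freq_hz.
    """
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def thd(signal, fundamental_hz, sample_rate, num_harmonics=5):
    """THD as RMS sum of harmonics 2..(num_harmonics + 1) over the fundamental."""
    fundamental = tone_amplitude(signal, fundamental_hz, sample_rate)
    harmonics = [tone_amplitude(signal, k * fundamental_hz, sample_rate)
                 for k in range(2, num_harmonics + 2)]
    return math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Assumed test signal: 0.1 s of a 1 kHz sine plus a third harmonic at 10 % amplitude.
SAMPLE_RATE = 48000
FUNDAMENTAL_HZ = 1000
test_tone = [math.sin(2 * math.pi * FUNDAMENTAL_HZ * i / SAMPLE_RATE)
             + 0.1 * math.sin(2 * math.pi * 3 * FUNDAMENTAL_HZ * i / SAMPLE_RATE)
             for i in range(4800)]

thd_ratio = thd(test_tone, FUNDAMENTAL_HZ, SAMPLE_RATE)
print(f"THD: {100 * thd_ratio:.2f} %")
```

In practice the test tone would be recorded from the output of the device under test rather than synthesised, and a windowed FFT would normally replace the single-bin sums when the recording does not contain a whole number of cycles.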
Researcher: Tom Robotham
Supervisors: Dr Matthew Stephenson, Dr Hyunkook Lee
Project summary: Early reflections play a large role in our perception of sound and, as such, have been subject to various treatments over the years due to changing tastes and room requirements. Whilst there is research into early reflections arriving both vertically and horizontally in small rooms for critical listening, little research has been conducted on the beneficial or detrimental impact of early vertical reflections on listener preference in the context of listening for entertainment. Two experiments were conducted through subjective testing in a semi-anechoic chamber and a listening room in order to assess subjects’ preference for playback of a direct sound against playback with the addition of the first geometrical vertical reflection. Program material remained constant in both experiments, employing five musical stimuli and one speech stimulus. The first experiment used a paired comparison method assessing subjects’ preference and the perceived magnitude of timbral and spatial difference provided by a frequency independent ceiling reflection. Each comparison was followed by a free verbalisation task for subjects to describe the perceived change(s). The second experiment investigated this further by focusing specifically on subjects’ preference with a frequency dependent reflection. A more controlled verbalisation task provided a list of descriptive terms which the subjects used to describe which attribute(s) influenced their preference. The results show that preference for playback with the inclusion of a vertical reflection was highly varied across both subjects and samples. However, both experiments suggest that the main perceptual attribute on which subjects based their preference was timbre, and that common spatial attributes (image shift/spread) cannot be used to predict preference.
Experiment two suggests that altering the frequency content of a vertical reflection may also provide a more consistent level of preference for certain stimuli. It is also shown that while certain attributes (brilliance/fullness) occur frequently when describing preference, others that are used less frequently (nasal/boxy) may influence preference to a greater extent.
Researcher: Dr Mark Mynett
Supervisor: Prof Rupert Till
Distinct challenges are posed when conveying Contemporary Metal Music’s (CMM) sounds and performance perspectives within a recorded and mixed form. CMM often features down-tuned, heavily distorted timbres, alongside high tempi, fast and frequently complex subdivisions, and highly synchronised instrumentation. The combination of these elements results in a significant concentration of dense musical sound usually referred to as ‘heaviness’. The publications for this thesis present approaches, processes and techniques for capturing, presenting and accentuating heaviness, as well as intelligibility and performance precision, which facilitate the listener’s clear comprehension of the frequent overarching complexity in the music’s construction. Intelligibility and performance precision are the principal requirements for a high commercial standard of CMM, and additionally can enhance a production’s sense of heaviness.
This synoptic commentary defines heaviness from an ecological perspective, by highlighting invariant properties that shape the embodied experience of being human. Heaviness is primarily substantiated through displays of distortion and, regardless of the listening levels involved, the fundamentals of this identity are ecologically linked to volume, power, energy, intensity, emotionality and aggression. In addition to distortion, a vital component of heaviness is sonic weight, which refers to CMM’s low frequencies being associated with large, intense and powerful entities.
CMM’s heaviness is also considered in terms of the perceived proximity of activity, apparent size of performance environment, and level and type of energy being expended. In particular, CMM provides the listener with the sense of utmost proximity to the band, usually without any significant perspective of depth.
Production strategies for achieving a high commercial standard in CMM are then presented. This is followed by a reflective commentary on the portfolio of productions, which includes discussion of the author’s transition from emulative to professional level of production and considers originality within this body of work.
By presenting the subject as an important, valid and authentic scholarly discipline, this work bridges the gap between the worlds of academia and music production practice for this style.
Researcher: Lee Davis
Supervisor: Dr Hyunkook Lee
Project summary: This paper compares the time differences recorded for echo thresholds under differing stimuli, angles and listener instructions. Previous research has focused on echo thresholds primarily with regard to level difference or a limited combination of these variables. Contrasting listener instructions across studies, such as “echo barely audible” and “echo clearly audible”, have been shown to produce different thresholds. The former instruction resulted in summing localisation being considered and lower thresholds.
Listeners manipulated sliders on a GUI to reduce the time difference between two randomly selected loudspeakers. Two tests were undertaken to grade the beginning of separation with fusion still evident, and complete separation. Orchestral, pink noise burst and speech stimuli were used as continuous, transient and familiar sources respectively. 17 loudspeaker angles were available in total; however, a single angle per side of the median plane was chosen randomly by the GUI, which produced 10 angles per test. The listener sat in the centre of the room with speakers arranged around them at azimuths of 0°, ±30°, ±60°, ±90°, ±120°, ±150° and 180° at 0° elevation, and 0°, ±30° and ±110° at 30° elevation, replicating common multichannel surround setups. The lead sound was presented from the speaker directly in front of the listener at 0° azimuth and 0° elevation.
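The loudspeaker counts given above can be checked with a short Python sketch. The layout tuples follow the angles listed in the summary. The selection rule is an assumption made for illustration: it keeps every median-plane position (azimuth 0° or 180°) and picks one side at random for each mirrored left/right pair. This rule happens to reproduce the stated count of 10 angles, but it is an inference from the summary and not a description of the actual GUI.

```python
import random

# (azimuth, elevation) in degrees, as listed in the summary.
MAIN_LAYER = [(az, 0) for az in
              (0, 30, -30, 60, -60, 90, -90, 120, -120, 150, -150, 180)]
HEIGHT_LAYER = [(az, 30) for az in (0, 30, -30, 110, -110)]
LAYOUT = MAIN_LAYER + HEIGHT_LAYER


def pick_test_angles(layout, rng):
    """Assumed rule: keep median-plane positions, pick one side per mirrored pair."""
    chosen = []
    for az, el in layout:
        if az in (0, 180):
            chosen.append((az, el))                     # on the median plane
        elif az > 0:
            chosen.append((rng.choice((az, -az)), el))  # random side of the pair
    return chosen


angles = pick_test_angles(LAYOUT, random.Random(0))
print(len(LAYOUT), "loudspeaker positions in total")
print(len(angles), "positions selected for one test:", angles)
```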
A Paired-Samples Sign Test was used for significance testing of median differences between graded echo thresholds. There were clear median differences between tests when the marking criteria were different. The orchestral stimulus was overall significantly different to the pink noise and speech stimuli in the fusion test. There were significant differences for half of the angles (those within the median plane or relatively behind the listener position) for the orchestral and pink noise comparison in the separation test. Significant differences were apparent for the majority of angles in the separation test between the orchestral and speech stimuli. For both tests, the pink noise and speech comparison showed no significant differences. Few significant differences were noted between angles. Median plane angles for the lag sound showed increased echo thresholds.
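For reference, a paired-samples sign test only considers the direction of each paired difference and compares the number of positive and negative differences against a binomial distribution with probability 0.5. The Python sketch below implements the exact two-sided version. The threshold values in the example are invented for illustration and are not data from this study, and the function name is hypothetical.

```python
from math import comb


def paired_sign_test(first, second):
    """Exact two-sided paired-samples sign test.

    Returns (n_positive, n_negative, p_value). Tied pairs are dropped.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = n - n_pos
    if n == 0:
        return 0, 0, 1.0
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Invented echo thresholds (ms) for eight listeners under two marking criteria.
fusion_ms = [10, 12, 9, 15, 11, 13, 14, 10]
separation_ms = [20, 25, 18, 30, 22, 21, 28, 19]

n_pos, n_neg, p_value = paired_sign_test(fusion_ms, separation_ms)
print(f"positive: {n_pos}, negative: {n_neg}, p = {p_value:.4f}")
```

Because the test ignores the size of each difference, it makes no assumption about the shape of the distribution, which suits graded thresholds that may not be normally distributed.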
Researcher: Mark Wendl
Supervisor: Dr Hyunkook Lee
Project summary: Even though modern technology has the capability to make audio sound better than it ever has, there is still an ongoing argument about the levels of compression used in music. Heavy compression has been found in previous studies to be detrimental to quality, but before a full understanding of how compression affects audio can be reached, a fundamental knowledge of its effects across different frequencies is needed. Pink noise was used in a series of tests to determine how the level of compression changes perceived loudness across different frequency bands and amplitude characteristics, giving an insight into how compression affects complex audio. It was found that the lower octaves behaved differently from the expected outcome: they were perceived as louder despite receiving no treatment beyond that applied to the other frequency bands. However, it was also found that all frequency bands increased in loudness as a result of compression.
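To make the term "level of compression" concrete, the Python sketch below shows the static input/output behaviour of a simple hard-knee compressor: above a threshold, each extra decibel of input yields only 1/ratio decibels of output. The threshold, ratio, make-up gain and input levels are hypothetical values; the summary does not state the compressor settings or type used in the tests.

```python
def compressor_gain_db(level_db, threshold_db, ratio):
    """Gain change (dB) applied by a static hard-knee compressor."""
    if level_db <= threshold_db:
        return 0.0  # below threshold the signal is left unchanged
    compressed_level = threshold_db + (level_db - threshold_db) / ratio
    return compressed_level - level_db


def compress_level_db(level_db, threshold_db, ratio, makeup_db=0.0):
    """Output level (dB) after compression and optional make-up gain."""
    return level_db + compressor_gain_db(level_db, threshold_db, ratio) + makeup_db


# Hypothetical settings: -20 dB threshold, 4:1 ratio, 6 dB of make-up gain.
for input_db in (-30.0, -20.0, -10.0, 0.0):
    output_db = compress_level_db(input_db, -20.0, 4.0, makeup_db=6.0)
    print(f"in: {input_db:6.1f} dB  out: {output_db:6.1f} dB")
```

With make-up gain applied, quieter passages end up louder than before while peaks are held back, which is one common reason compressed material can be perceived as louder overall.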