Projects

Below is a list of research projects currently ongoing or completed in the CAPE.

Current Projects

360° virtual reality audio

Capturing and rendering audio for 360° virtual reality

Audio for 3D immersive reproduction

Capturing and rendering audio for 3D immersive reproduction

CityTones

CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes

Phantom image elevation effect and VHAP

Phantom image elevation effect & Virtual Hemispherical Amplitude Panning (VHAP)

Objective measurement of perceptual audio

Towards a framework for the objective measurement of perceptual audio attributes

Prediction of auditory image position model

Development of a perceptual model for the trade-off between interaural time and level differences for the prediction of auditory image position

Quantifying Factors of Immersion in VR

Quantifying Factors of Immersion in Virtual Reality

Room acoustics in VR/Augmented reality

Perception of room acoustics in virtual/augmented reality in the context of 6 degrees of freedom

Sound source dependency of elevation localisation

Investigation into the sound source dependency of elevation localisation in multichannel audio systems

Soundscape evaluation in virtual reality

Investigations into the recording and reproduction methods for soundscape evaluation in virtual reality

New user interface design for music production

3D Audio Toolbox (3DAT)

HULTI-GEN

Huddersfield Universal Listening Test Interface Generator (HULTI-GEN)

HAART

Huddersfield Acoustical Analysis Research Toolbox (HAART)

Completed projects

Perceptual Optimisation of Virtual Acoustics

Perceptual Optimisation of Virtual Acoustics (POVA)

Perceptual Band Allocation (PBA)

Perceptual Band Allocation (PBA) for rendering vertical image spread

Pinna related transfer function attributes

The perceptual contribution of pinna related transfer function attributes in the median plane

Audio Dynamics

Audio Dynamics - Towards a Perceptual Model of Punch

Vertical interchannel decorrelation in 3D sound

Investigations into the perception of vertical interchannel decorrelation in 3D surround sound reproduction

Frequency thresholds and interchannel crosstalk

The analysis of frequency dependent localisation thresholds and the perceptual effects of vertical interchannel crosstalk

Non-Linear Sonic Signatures

An Investigation into Non-Linear Sonic Signatures with a Focus on Dynamic Range Compression and the 1176 FET Compressor

Effects of a vertical reflection

The effects of a vertical reflection on the relationship between listener preference and timbral and spatial attributes

Contemporary Metal Music Production

Perception of echo thresholds

The effect of sound source and reflection angle on the perception of echo thresholds

Loudness perception

An investigation into the changes of loudness perception in relation to changes in crest factor for octave bands

Capturing and rendering of audio for 360° virtual reality

Researchers: Dr Hyunkook Lee, Connor Millns

Supervisor: Dr Hyunkook Lee

Project summary: This project investigates recording and reproduction methods for 360° audio in virtual reality (VR) applications. Currently, the most popular method for capturing 360° audio for VR is arguably first-order Ambisonics (FOA). FOA microphone systems are typically compact, and thus convenient for location recording, and they offer stable localisation and flexible sound field rotation. However, FOA has limitations in perceived spaciousness and in the size of the sweet spot in loudspeaker reproduction, due to the high level of interchannel correlation. By contrast, a near-coincident microphone array, which uses directional microphones that are spaced apart and angled outwards, can provide a better balance between spaciousness and localisability than a purely coincident array. The project investigates the localisation accuracy and spatial attributes of the Equal Segment Microphone Array (ESMA) for music and urban soundscape VR applications, in both visual and non-visual conditions. Different recording and reproduction techniques are also perceptually evaluated in terms of their low-level spatial attributes, and the optimal use scenarios for each technique are determined according to sound source, acoustic conditions and environmental context. Below is a summary of some of the key findings so far.

The correct spacing between microphones for a quadraphonic ESMA using cardioid microphones is 50 cm. This was proposed based on a novel interchannel level and time trade-off model called MARRS (Lee et al. 2017) and verified through subjective listening tests (Lee 2019). According to the model, the required spacing decreases as microphone directionality increases (e.g. 37 cm for supercardioid and 25 cm for hypercardioid microphones).
ESMA produces better results than FOA in terms of environmental width, listener envelopment and overall spatial quality, whereas FOA tends to provide slightly greater environmental depth and source distance (Millns and Lee 2018). However, this difference depends on the positional arrangement of the sound sources rather than on the type of sound source.
ESMA-3D, a vertical extension of ESMA, outperforms FOA in sports event recording and reproduction in both 9.1 and 5.1 formats in terms of presence, robustness, envelopment and overall quality of experience (Moulson and Lee 2019). This holds regardless of whether a visual scene is presented.
Timbral and spatial degradation of ESMA-3D recordings in Ambisonic binaural rendering falls within the “Excellent” to “Good” categories for complex musical ensemble recordings when the “magnitude least squares” decoding method (IEM binaural renderer) is used (Lee et al. 2019). The conventional virtual loudspeaker cube decoding produces quality in the “Good” to “Fair” range.
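
The geometry behind these spacing figures can be illustrated with a short calculation. The sketch below is not the MARRS model; it only computes the raw interchannel time difference (ICTD) and level difference (ICLD) that one adjacent pair of a quadraphonic ESMA would produce for a far-field source, which is the kind of physical input a trade-off model weighs. It assumes microphones facing 0° and 90° that sit on a circle in their own facing directions so that adjacent capsules are `spacing_m` apart, an ideal first-order polar pattern (constant `a` = 0.5 for cardioid) and a speed of sound of 343 m/s; all names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def first_order_gain(off_axis_deg, a=0.5):
    """Ideal first-order polar pattern a + (1 - a) * cos(theta), 0 < a <= 1; a = 0.5 is cardioid."""
    return a + (1.0 - a) * math.cos(math.radians(off_axis_deg))

def pair_differences(source_az_deg, spacing_m=0.5, a=0.5):
    """ICTD (s) and ICLD (dB) between the ESMA microphones facing 0° and 90°.

    Assumes each capsule sits on a circle in its facing direction, so adjacent
    capsules are spacing_m apart (radius = spacing_m / sqrt(2)), and a far-field
    plane wave arriving from source_az_deg within this pair's 0-90° segment.
    Positive values mean the 0° microphone leads / is louder.
    """
    if not 0.0 <= source_az_deg <= 90.0:
        raise ValueError("source must lie within the 0-90 degree segment of this pair")
    r = spacing_m / math.sqrt(2.0)
    mics = [(0.0, (r, 0.0)), (90.0, (0.0, r))]  # (facing angle, (x, y) position)
    u = (math.cos(math.radians(source_az_deg)), math.sin(math.radians(source_az_deg)))
    # A capsule further along the arrival direction u receives the wavefront earlier.
    arrival = [-(p[0] * u[0] + p[1] * u[1]) / SPEED_OF_SOUND for _, p in mics]
    gains = [first_order_gain(source_az_deg - facing, a) for facing, _ in mics]
    ictd = arrival[1] - arrival[0]
    icld = 20.0 * math.log10(gains[0] / gains[1])
    return ictd, icld
```

For example, `pair_differences(0.0)` (a source on the axis of the 0° microphone, default 50 cm cardioid pair) gives an ICTD of about 1.03 ms and an ICLD of about 6 dB, while a source at 45° gives zero for both. Setting `a` to about 0.37 or 0.25 approximates supercardioid or hypercardioid patterns; turning these physical differences into a predicted image position is what a psychoacoustic model such as MARRS adds.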

Publications:

Lee, H., Frank, M., & Zotter, F. (2019). Spatial and Timbral Fidelities of Binaural Ambisonics Decoders for Main Microphone Array Recordings. AES International Conference on Immersive and Interactive Audio, York.
Lee, H. (2019). Capturing 360° Audio using an Equal Segment Microphone Array (ESMA). AES: Journal of the Audio Engineering Society, 67(1/2), 13-26. https://doi.org/10.17743/jaes.2018.0068
Millns, C., & Lee, H. (2018). An Investigation into Spatial Attributes of 360° Microphone Techniques for Virtual Reality. In Proceedings of 144th AES International Convention [Convention Paper 10005]. Audio Engineering Society.
Lee, H., Johnson, D., & Mironovs, M. (2017). An Interactive and Intelligent Tool for Microphone Array Design. In Proceedings of 143rd AES International Convention. Audio Engineering Society.

CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes

Researchers: Prof Agnieszka Roginska (NYU), Dr Hyunkook Lee, Ana Elisa Mendez Mendez (NYU), Scott Murakami (NYU), Andrea Genovese (NYU)

Supervisors: Prof Agnieszka Roginska, Dr Hyunkook Lee

Project summary: CityTones is a collaborative open-source repository initiated and administered by New York University (NYU) Steinhardt and the University of Huddersfield. It invites sound recordists around the world to contribute recordings made with 360° audio and visual capture methods. Each recording is accompanied by descriptors covering the technical details of the recording, physical information, subjective quality attributes and sound content. Recordings are verified after submission and then made available for download in the public database. Applications include environment simulation, sound design and research in areas such as audio engineering, human-computer interaction and machine listening. The recordings and data can be used to study immersive recording techniques, and the crowdsourced annotations can be used in machine listening research to train models for sound source identification.

The microphone system used for audio recording must be compatible with 360° audio rendering in first-order B-format. If a Higher Order Ambisonics (HOA) system is used, the recording must be converted to first-order B-format for submission. A multichannel spaced microphone array designed for 360° audio capture can also be used (e.g. ESMA-3D, Schoeps ORTF-3D), but the signals it captures must be encoded in first-order B-format. Audio should be recorded digitally as PCM WAV at a sampling rate of 48 kHz and a bit depth of 24 bits. A spherical visual recording, or at least a panoramic picture, must accompany the audio recording to provide all-around visual information about the recording location. Recordings must be at least 3 minutes long, with no maximum length; a length of about 5 minutes is recommended.
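
The format requirements above are concrete enough to check before submitting. The sketch below is hypothetical and is not part of the CityTones portal: it reads a WAV header with Python's standard-library `wave` module and checks for four channels (first-order B-format carries W, X, Y and Z), 48 kHz, 24-bit samples and a minimum duration of 3 minutes. It cannot tell whether the four channels really are B-format, or which channel ordering and normalisation convention they use.

```python
import wave

# Hypothetical checks derived from the stated requirements; not the portal's own validator.
REQUIRED_RATE_HZ = 48_000
REQUIRED_SAMPLE_WIDTH_BYTES = 3   # 24-bit PCM
FOA_CHANNELS = 4                  # first-order B-format: W, X, Y, Z
MIN_DURATION_S = 180.0            # at least 3 minutes

def check_params(channels, rate_hz, sample_width_bytes, n_frames):
    """Return a list of problems; an empty list means the header meets the spec."""
    problems = []
    if channels != FOA_CHANNELS:
        problems.append(f"expected {FOA_CHANNELS} channels (FOA B-format), got {channels}")
    if rate_hz != REQUIRED_RATE_HZ:
        problems.append(f"expected {REQUIRED_RATE_HZ} Hz, got {rate_hz} Hz")
    if sample_width_bytes != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 24-bit samples, got {8 * sample_width_bytes}-bit")
    duration = n_frames / rate_hz if rate_hz else 0.0
    if duration < MIN_DURATION_S:
        problems.append(f"duration {duration:.1f} s is shorter than {MIN_DURATION_S:.0f} s")
    return problems

def check_wav(path_or_file):
    """Read only the WAV header and check it against the requirements."""
    with wave.open(path_or_file, "rb") as w:
        return check_params(w.getnchannels(), w.getframerate(),
                            w.getsampwidth(), w.getnframes())
```

`check_wav` accepts a path or an open binary file and returns a list of human-readable problems. The standard-library `wave` reader only handles some WAV header variants (older Python versions reject the WAVE_FORMAT_EXTENSIBLE header often used for multichannel 24-bit files), so a dedicated audio library may be more robust in practice.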

The audio/video recordings for CityTones will be submitted through a portal on the NYU Immersive Audio Group website https://wp.nyu.edu/immersiveaudiogroup/citytones/. The submission process involves a Google survey and submission of the recording. Through the survey, submitters provide descriptive information including physical and technical details, and subjective quality attributes.

Publications:

Roginska, A., Lee, H., Mendez, A. E., Murakami, S. (2019). CityTones: a Repository of Crowdsourced Annotated Soundfield Soundscapes, Audio Engineering Society 146th International Convention, Dublin.

Phantom image elevation effect & Virtual Hemispherical Amplitude Panning (VHAP)

Researchers: Dr Hyunkook Lee, Dr Dale Johnson and Maksims Mironovs

Supervisor: Dr Hyunkook Lee

Project summary: Early studies reported that when two identical signals are reproduced simultaneously from a pair of loudspeakers placed at ear level and arranged symmetrically about the listener, the resulting phantom centre image is perceived to be elevated in the median plane. These studies also found that the perceived elevation increases as the loudspeaker base angle increases from 0° to 180°; at a base angle of 180°, the image is perceived almost directly above the listener’s head.

This project investigates this psychoacoustic effect further, providing more systematic subjective data and theoretical explanations, and develops a new virtual 3D panning method based on the effect, called VHAP (Virtual Hemispherical Amplitude Panning). The main findings and outcomes so far are as follows:

The strength of this effect depends significantly on the type of sound source; sound sources with a flatter frequency spectrum and a more transient nature are perceived to be more elevated (Lee 2018).
A new theory has been proposed and verified: in addition to the conventional explanation based on spectral energy balance at high frequencies, the phantom image elevation effect is associated with a spectral notch below 1 kHz, caused by the combination of ipsilateral (direct) and contralateral (interaural crosstalk) signals from the loudspeakers. For example, for the 180° base angle, the spectral notch for a phantom centre image is around 640 Hz, which matches the spectral notch frequency for a real source elevated at 90° in the median plane (Lee 2017).
VHAP creates an elevated phantom source on a virtual upper hemisphere using only four ear-height loudspeakers, based on the phantom image elevation effect. A set of constant-power gain coefficients is applied to the loudspeakers at ±90° and 0° for panning to a target azimuth and elevation in the front region, and to those at ±90° and 180° for panning in the back region (Lee et al. 2018).
Listening tests in loudspeaker reproduction show that VHAP can locate a phantom image at various spherical coordinates in the upper hemisphere, with some limitations in accuracy and resolution (Lee et al. 2018).
Listening tests in binaural headphone reproduction indicate that the binaural rendering of VHAP is able to externalise elevated phantom images at various perceived distances (Lee et al. 2019).
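
Two of the findings above reduce to simple arithmetic, sketched below under stated assumptions. For the notch, the sketch models one ear as receiving the direct (ipsilateral) signal plus a contralateral copy delayed by τ with gain g; the two paths first cancel where the delay equals half a period, f = 1/(2τ), so an assumed interaural delay of about 0.78 ms would place the notch near 640 Hz. This two-path model ignores head diffraction and the pinna, and the 0.78 ms value is chosen only for illustration. For VHAP, the text names the loudspeakers used (±90° plus 0° for the front, ±90° plus 180° for the back) and says the gains are constant-power, but not how they depend on the target azimuth and elevation, so `constant_power` takes hypothetical raw weights rather than implementing the published panning law. All function names are hypothetical.

```python
import cmath
import math

def comb_magnitude(freq_hz, delay_s, crosstalk_gain=1.0):
    """|1 + g * exp(-j*2*pi*f*tau)|: direct signal plus a delayed, scaled crosstalk copy."""
    return abs(1.0 + crosstalk_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))

def first_notch_hz(delay_s):
    """The two paths are first in antiphase where the delay is half a period."""
    return 1.0 / (2.0 * delay_s)

def vhap_triplet(azimuth_deg):
    """Loudspeaker triplet named in the text: ±90° plus 0° (front) or plus 180° (back)."""
    az = (azimuth_deg + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    centre = 0.0 if abs(az) <= 90.0 else 180.0   # boundary at exactly ±90° is arbitrary here
    return (-90.0, centre, 90.0)

def constant_power(weights):
    """Scale raw (hypothetical) gains so that the sum of squared gains is 1."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("at least one weight must be non-zero")
    return [w / norm for w in weights]
```

With a crosstalk gain below 1 the notch does not reach zero, but its frequency stays at 1/(2τ). `constant_power([1.0, 2.0, 1.0])` returns gains whose squares sum to 1, which keeps overall loudness roughly constant as the weights change.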

Publications:

Lee, H., Mironovs, M., & Johnson, D. (2019). Binaural Rendering of Virtual Elevation using the VHAP Plugin. In Proceedings of 146th AES International Convention.
Lee, H., Johnson, D., & Mironovs, M. (2018). Virtual Hemispherical Amplitude Panning (VHAP): A Method for 3D Panning without Elevated Loudspeakers. In Proceedings of 144th AES International Convention [Convention Paper 9965]
Lee, H. (2017). Sound Source and Loudspeaker Base Angle Dependency of Phantom Image Elevation Effect. AES: Journal of the Audio Engineering Society, 65(9), 733-748. https://doi.org/10.17743/jaes.2017.0028
Lee, H. (2016). Phantom image elevation explained. In 141st Audio Engineering Society International Convention 2016, AES 2016 [9664] Audio Engineering Society.

Perceptual Optimisation of Virtual Acoustics (POVA)

Researcher: Dr Dale Johnson

Supervisor: Dr Hyunkook Lee

Abstract:

In virtual reality, it is important that the user is immersed and that both the visual and listening experiences are pleasant and plausible. Whilst it is now possible to model room acoustics accurately using the available scene geometry, the resulting perceptual attributes may not always be optimal. Previous research has examined high-level methods for controlling these attributes, but they have only been applied to algorithmic reverberators and not to geometric types, which can model the acoustics of a virtual scene more accurately. This thesis investigates methods of perceptual control over apparent source width and tonal colouration in virtual room acoustics, and is an important step towards an intelligent optimisation method for dynamically improving the listening experience.

A review of the psychoacoustic mechanisms of spatial impression and tonal colouration was performed, with consideration of the effects of early reflections on these two attributes so that they could be exploited. Existing artificial reverberation methods, mainly algorithmic, wave-based and geometric types, were reviewed. A geometric type was found to be the most suitable, so a virtual acoustics program was developed that gives access to each reflection and its metadata, allowing perceptual control methods to exploit the reflection metadata.

Experiments were performed to find novel directional regions by which reflections could be sorted and grouped according to how they contribute to an attribute. The first identified a region in the horizontal plane within which any arriving reflection produces maximum perceived apparent source width (ASW). Another identified two regions, in front of and behind the listener, within which any arriving reflection produces unacceptable colouration. Adjusting the level of reflections within either region should therefore manipulate the corresponding attribute, forming the basis of the control methods.
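
Grouping reflections by arrival direction and then applying a per-region level change can be sketched as follows. The region limits here are placeholders, since their angular extents are not stated above; the azimuth convention (0° straight ahead), the reflection record and all names are hypothetical, and this is not the program developed in the thesis.

```python
from dataclasses import dataclass, replace

# Placeholder region limits (degrees); the thesis's actual extents are not given here.
LATERAL_AZIMUTH_RANGE = (60.0, 120.0)   # hypothetical |azimuth| band of the ASW region
HORIZONTAL_TOLERANCE_DEG = 10.0         # hypothetical reading of "in the horizontal plane"
FRONT_BACK_HALF_WIDTH = 30.0            # hypothetical half-width around 0° and 180°

@dataclass(frozen=True)
class Reflection:
    azimuth_deg: float     # 0° = straight ahead (assumed convention)
    elevation_deg: float
    delay_s: float
    gain: float            # linear amplitude

def classify(refl):
    """Return 'lateral', 'front_back' or 'other' from the arrival direction."""
    az = abs((refl.azimuth_deg + 180.0) % 360.0 - 180.0)  # fold azimuth to 0..180 degrees
    lo, hi = LATERAL_AZIMUTH_RANGE
    if abs(refl.elevation_deg) <= HORIZONTAL_TOLERANCE_DEG and lo <= az <= hi:
        return "lateral"
    if az <= FRONT_BACK_HALF_WIDTH or az >= 180.0 - FRONT_BACK_HALF_WIDTH:
        return "front_back"
    return "other"

def adjust(reflections, change_db):
    """Scale every reflection by the level change (dB) assigned to its region."""
    return [
        replace(r, gain=r.gain * 10.0 ** (change_db.get(classify(r), 0.0) / 20.0))
        for r in reflections
    ]
```

For example, `adjust(reflections, {"lateral": 3.0})` raises only the lateral group by 3 dB and leaves all other reflections unchanged, which is the kind of control evaluated for ASW.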

In a further investigation, the methods were applied to binaural room impulse responses generated by the custom program for two different virtual rooms at three source-receiver distances. An elicitation test using speech, guitar and orchestral sources was performed to determine what perceptual differences the control methods caused; the largest differences were in ASW, loudness, distance and phasiness. Further investigation found that level adjustment of lateral reflections was fairly effective for controlling the degree of ASW without affecting tonal colouration, and that level adjustment of front-back reflections could affect ASW yet had little effect on colouration. The final experiment compared both methods and also investigated their effect on source loudness and distance. Again, level adjustment in both regions had a significant effect on ASW but little effect on phasiness, and it also significantly affected loudness and distance. Analysis suggested that the changes in ASW may be linked to the changes in loudness and distance.

Publications:

Audio Dynamics - Towards a Perceptual Model of Punch

Researcher: Dr Steve Fenton

Supervisors: Dr Jonathan Wakefield, Dr Hyunkook Lee

Abstract:

This thesis discusses research towards an objective model that predicts punch in musical signals. Punch is a term often used by engineers and producers to describe a particular perceptual sensation found in produced music. Listeners often describe music as punchy, yet the term is subjective, both in its meaning and in its auditory effect on the listener. An objective model of punch would therefore be useful both for music classification and as an additional metric in music production and mastering metering tools. The literature reviewed covers subjective and objective audio evaluation methods as well as low-level signal extraction and measurement techniques. The review concludes that whilst there has been a great deal of work on semantic description and audio quality measurement, low-level analysis of the perception of punch remains largely unexplored. The project was completed in a number of phases, each designed to investigate the perceptual effects of manipulating test stimuli, with the aim of establishing the key low-level descriptors of the punch attribute and producing a final objective, perceptually based model. The listening tests in each phase were conducted according to Recommendation ITU-R BS.1534-1. Listener perception of punch was found to correlate strongly with signal onset times, octave frequency band, signal duration and dynamic range. The punch measure obtained from the model is named PM95, where 95 indicates the upper percentile used in the measurement. Secondary measures were also obtained as a result of the iterative approach adopted. These are the Inter-Band-Ratio (IBR), Transient to Steady-state Ratio (TSR) and Transient to Steady-state Ratio+Residual (TSR+R). These measures are useful for quantifying overall audio quality with respect to dynamic range across frequency bands, and are a more reliable metric for defining the overall compression applied to a piece of music. In addition, the latter two measures may be useful for highlighting perceptual masking artefacts. The completed perceptual punch model was validated against scores from a large-scale, independently conducted forced pairwise comparison test using expert listeners and a wide range of musical stimuli. The PM95 measure showed a ‘very strong’ positive correlation with listener punch perception, with both r and rho coefficients (0.849 and 0.833) significant at the 0.01 level (two-tailed). The PM95M measure, which is the PM95 measure divided by the mean value of the punch frames, also correlated very well with the perceptual punch scale, with both r and rho coefficients (0.707 and -0.750) significant at the 0.05 level (two-tailed). A real-time implementation of the punch model (and of the other measures proposed in the thesis) could extend the metrics currently used in Music Information Retrieval.
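
The names PM95 and PM95M suggest a simple final aggregation step, sketched below under the assumption that the model produces one punch value per analysis frame and that PM95 is the 95th percentile of those values. The per-frame computation (onset detection, octave-band analysis and so on) is not reproduced; frame values are taken as given, and "mean value of punch frames" is interpreted as the mean over all frames supplied. Function names are hypothetical.

```python
import statistics

def pm95(frame_punch):
    """95th percentile of per-frame punch values (linear interpolation, 'inclusive' method)."""
    if len(frame_punch) < 2:
        raise ValueError("need at least two frames")
    # quantiles(n=100) returns the 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(frame_punch, n=100, method="inclusive")[94]

def pm95m(frame_punch):
    """PM95 divided by the mean punch value over the frames."""
    return pm95(frame_punch) / statistics.fmean(frame_punch)
```

Using an upper percentile rather than the maximum makes the measure less sensitive to a single outlying frame; for frame values 1 to 100, `pm95` returns 95.05.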

Publications:

Investigations into the perception of vertical interchannel decorrelation in 3D surround sound reproduction

Researcher: Dr Christopher Gribben

Supervisor: Dr Hyunkook Lee

Project summary: The use of three-dimensional (3D) surround sound systems has increased rapidly in recent years. In two-dimensional (2D) loudspeaker formats (i.e. two-channel stereophony (stereo) and 5.1 surround), horizontal interchannel decorrelation is a well-established technique for controlling the horizontal spread of a phantom image. Interchannel decorrelation is also used in established two-to-five channel upmixing methods (stereo to 5.1). More recently, proprietary algorithms have been developed for 2D-to-3D upmixing, which presumably also make use of interchannel decorrelation; however, it is not currently known how interchannel decorrelation is perceived in the vertical domain. Formal investigations into the perception of vertical interchannel decorrelation are therefore needed. Findings from such experiments may contribute to improved control of a sound source’s vertical spread within 3D surround systems, as well as to the optimisation of 2D-to-3D upmixing algorithms.

This thesis presents a series of experiments that systematically assess vertical interchannel decorrelation under various conditions. First, horizontal and vertical interchannel decorrelation are compared, and vertical decorrelation is found to be weaker than horizontal decorrelation. However, vertical decorrelation can still generate a significant increase in vertical image spread (VIS) under some conditions. Next, vertical decorrelation is assessed for octave-band pink noise stimuli at various azimuth angles to the listener. The results demonstrate that the effect of vertical decorrelation depends on both frequency and presentation angle: a general relationship between interchannel cross-correlation (ICC) and VIS is observed for the 500 Hz octave band and above, and is strongest for the 8 kHz octave band. Objective analysis of these stimuli indicated that spectral changes at higher frequencies appear to be associated with VIS perception: at 0° azimuth, the 8 and 16 kHz octave bands show potential spectral cues; at ±30°, similar cues are seen in the 4, 8 and 16 kHz bands; and at ±110°, cues appear in the 2, 4, 8 and 16 kHz bands. In the case of the 8 kHz octave band, vertical decorrelation seems to cause a ‘filling in’ of vertical localisation notch cues, potentially resulting in ambiguous perception of vertical extent. In contrast, the objective analysis suggests that VIS perception in the 500 Hz and 1 kHz bands may have been related to early reflections in the listening room.
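
Interchannel cross-correlation (ICC) is commonly computed as the maximum of the normalised cross-correlation between two channels over a limited lag range, often ±1 ms. The pure-Python sketch below follows that definition and takes the maximum absolute value; whether the thesis used the absolute or signed maximum, or the same lag range, is not stated here, and the function name is hypothetical.

```python
import math

def icc(x, y, max_lag):
    """Maximum absolute normalised cross-correlation of x and y over lags -max_lag..max_lag (samples)."""
    if len(x) != len(y):
        raise ValueError("signals must have equal length")
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if energy == 0.0:
        raise ValueError("signals must not be silent")
    n = len(x)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # y delayed relative to x by `lag` samples
            s = sum(x[i] * y[i + lag] for i in range(n - lag))
        else:
            # x delayed relative to y by `-lag` samples
            s = sum(x[i - lag] * y[i] for i in range(n + lag))
        best = max(best, abs(s) / energy)
    return best
```

At 48 kHz, ±1 ms corresponds to `max_lag = 48`. Identical or polarity-inverted channels give an ICC of 1, a copy delayed by less than 1 ms gives a value close to 1, and independent noise gives a value close to 0.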

These experiments demonstrate that the perception of VIS from vertical interchannel decorrelation is frequency-dependent, with high frequencies playing a particularly important role. A subsequent experiment explores vertical decorrelation of high frequencies only, and finds that decorrelating the 500 Hz octave band and above produces a perception of VIS similar to broadband decorrelation, whilst improving tonal quality. The results also indicate that decorrelation of the 8 kHz octave band and above alone can significantly increase VIS, provided the source signal has sufficient high-frequency energy. The final experimental chapter provides a controlled assessment of 2D-to-3D upmixing, taking into account the findings of the previous experiments. In general, 2D-to-3D upmixing by vertical interchannel decorrelation had little impact on listener envelopment (LEV) compared with a level-matched 2D 5.1 reference. Furthermore, amplitude-based decorrelation appeared to be marginally more effective, and ‘high-pass decorrelation’ resulted in slightly better tonal quality for sources with greater low-frequency energy.

Publications:

Gribben, C., & Lee, H. (2018). Increasing the Vertical Image Spread of Natural Sound Sources using Band-Limited Interchannel Decorrelation, AES International Conference on Immersive and Interactive Audio, York.
Gribben, C., & Lee, H. (2018). The Frequency and Loudspeaker-Azimuth Dependencies of Vertical Interchannel Decorrelation on the Vertical Spread of an Auditory Image. AES: Journal of the Audio Engineering Society, 66(7-8), 537-555. https://doi.org/10.17743/jaes.2018.0040
Gribben, C., & Lee, H. (2017). A Comparison between Horizontal and Vertical Interchannel Decorrelation. Applied Sciences, 7(11), [1202]. https://doi.org/10.3390/app7111202

The analysis of frequency dependent localisation thresholds and the perceptual effects of vertical interchannel crosstalk

Researcher: Dr Rory Wallis

Supervisor: Dr Hyunkook Lee

Project summary: In the context of microphone techniques for recording three-dimensional (3D) sound in an acoustic space, vertical interchannel crosstalk occurs when the height layer of microphones captures excessive direct sound. This can cause sound images to form as vertically oriented phantom images at positions between the main and height layers of loudspeakers, rather than at the desired position in the main layer. Additional spatial and timbral effects will also be perceived, although these have not been examined in the literature.

Previous research has examined the minimum amount of attenuation of direct sound in the height layer necessary to prevent vertical interchannel crosstalk from affecting the perceived location of the main channel signal, which has become known as the ‘localisation threshold’. However, existing methods of applying this threshold have not considered the frequency dependency of median plane localisation. This thesis therefore examined whether localisation thresholds could be applied through frequency-dependent manipulation of the direct sound in the height layer (band reduction), as well as the most salient perceptual effects of vertical interchannel crosstalk. The operation of the precedence effect in the median plane was also considered.

A review of human localisation mechanisms was first conducted, with a particular focus on how their characteristics might be exploited in developing a band reduction method. Consideration was also given to how secondary vertical sources might affect direct sounds, in order to better understand what the most salient effects of vertical interchannel crosstalk might be.

The frequency dependency of localisation thresholds was considered in anechoic conditions, with subsequent localisation experiments conducted to help explain the results. Following this, localisation thresholds using blanket reduction (attenuation of the direct sound in the height layer evenly across the spectrum) were analysed. The frequency dependency of localisation thresholds was then examined in a natural listening environment, and a series of band reduction methods was developed based on the results. The band and blanket reduction thresholds were then verified in localisation tests. The final experiment considered the most salient effects of vertical interchannel crosstalk, how these were affected when the different localisation threshold methods were applied, and which method subjects preferred most.

The results showed that localisation thresholds are frequency dependent in both anechoic and natural listening environments. In particular, more level reduction was necessary for the mid-high frequencies compared to low frequencies. Additionally, a series of different band reduction methods were found to be effective. Elicitation experiments showed that the most salient effects of vertical interchannel crosstalk were increases in vertical image spread, source elevation, loudness and fullness, with the perception of these when the localisation threshold was applied being dependent on the method being used. Moreover, although subjective preference could not discriminate between the methods tested, the presence of direct sound in the height layer was consistently preferred compared to situations where it was absent. Furthermore, no evidence was found to support the existence of either the precedence effect or localisation dominance in the median plane.
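
The difference between blanket and band reduction can be made concrete with a small calculation. In the hypothetical sketch below, each octave band has a localisation threshold (the minimum level by which the height-layer direct sound must sit below the main layer), and the extra attenuation needed in a band is whatever is still missing after the existing level difference. Blanket reduction must apply the largest of these values to the whole spectrum, whereas band reduction applies each band's own value. The threshold values are inputs, not measured results from the thesis, and all names are hypothetical.

```python
# Hypothetical sketch: thresholds are supplied by the caller, not taken from the thesis.

def required_attenuation(height_rel_db, thresholds_db):
    """Extra attenuation (dB) needed in each band.

    height_rel_db: level of the height-layer direct sound relative to the main
        layer in each band (negative means the height layer is already quieter).
    thresholds_db: localisation threshold per band, i.e. how many dB below the
        main layer the height-layer direct sound must be.
    """
    return {band: max(0.0, thresholds_db[band] + height_rel_db[band])
            for band in thresholds_db}

def blanket_attenuation(per_band_db):
    """A single broadband reduction must satisfy the most demanding band."""
    return max(per_band_db.values(), default=0.0)

def blanket_excess(per_band_db):
    """How much more each band is attenuated under blanket reduction than it needs."""
    blanket = blanket_attenuation(per_band_db)
    return {band: blanket - need for band, need in per_band_db.items()}
```

With invented thresholds of 4, 8 and 9 dB at 125 Hz, 1 kHz and 8 kHz, and a height layer already 2 dB below the main layer, band reduction removes 2, 6 and 7 dB respectively, while blanket reduction removes 7 dB everywhere, i.e. 5 dB more than needed at 125 Hz. This mirrors, only qualitatively, the finding that mid-high frequencies need more reduction than low frequencies.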

Publications:

An Investigation into Non-Linear Sonic Signatures with a Focus on Dynamic Range Compression and the 1176 FET Compressor

Researcher: Dr Austin Moore

Supervisors: Prof Rupert Till, Dr Jonathan Wakefield

Abstract:

Dynamic range compression (DRC) is a common process in music production. Traditionally used to control the dynamic range of signals and reduce the risk of overloading recording devices, it has developed over time into a creative colouration effect rather than a preventative measure. This thesis investigates sonic signatures, distortion, non-linearity and how audio material is coloured during the music production process. It explores how methodologies used to measure distortion and timbre can be used to define the sonic signature of hardware compressors and other music production equipment. A grounded theory and content analysis study was carried out to explore how producers use DRC in their work, how they describe its sound quality, which compressors they use frequently and which audio sources they process with particular types of compressor. The results of this qualitative study reveal that producers use compressors to manipulate the timbre of program material and select specific compressors with particular settings for colouration effects. Tests were then carried out on a number of popular vintage hardware compressors to assess their sonic signatures. Firstly, a comparative study was conducted on the Teletronix LA2A, Fairchild 670, Urei 1176 and dbx165A. Secondly, a comprehensive in-depth analysis of the 1176 was undertaken to fully catalogue its sonic signature over a range of settings and to compare a vintage Urei Blackface 1176 with a modern Universal Audio reissue. Objective analysis was conducted on the compressors using Total Harmonic Distortion (THD), Intermodulation Distortion (IMD) and tone burst measurements. Complex program material was analysed using spectrum analysis, critical listening and audio feature extraction. The compressors were all found to have subtle nuances in their sonic signatures, as elements of their design colour the audio with non-linear artefacts. The 1176 was shown to impart significant amounts of distortion when used in its all-buttons mode and with fast attack and release settings. This style of processing was favoured by producers in the qualitative study.
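
Total harmonic distortion, one of the objective measures named above, compares the amplitudes of a test tone's harmonics with its fundamental. The sketch below computes THD with single-bin DFTs; it assumes the analysis window holds a whole number of periods (so there is no spectral leakage) and that all harmonics lie below Nyquist. The cubic waveshaper in the usage note is a generic stand-in for a non-linear device, not a model of the 1176 or any other compressor, and the sketch does not follow a particular measurement standard. Function names are hypothetical.

```python
import cmath
import math

def harmonic_amplitudes(signal, fs, f0, n_harmonics):
    """Peak amplitudes of f0 and its harmonics, each from a single-bin DFT.

    Assumes len(signal) spans an integer number of periods of f0.
    """
    if n_harmonics * f0 >= fs / 2:
        raise ValueError("highest harmonic must lie below Nyquist")
    n = len(signal)
    amps = []
    for k in range(1, n_harmonics + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * f0 * i / fs) for i, s in enumerate(signal))
        amps.append(2.0 * abs(acc) / n)
    return amps

def thd(signal, fs, f0, n_harmonics=5):
    """THD as the root-sum-square of harmonics 2..n divided by the fundamental."""
    a = harmonic_amplitudes(signal, fs, f0, n_harmonics)
    return math.sqrt(sum(x * x for x in a[1:])) / a[0]
```

For a 1 kHz sine sampled at 48 kHz over 4800 samples (100 periods), the waveshaper y = x + 0.1x³ gives a fundamental of 1.075 and a third harmonic of 0.025 (since sin³x = (3 sin x − sin 3x)/4), so THD ≈ 2.3%. A real compressor's distortion also depends on level, frequency and attack/release settings, which is why a full characterisation sweeps those parameters.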

Publications:

The effects of a vertical reflection on the relationship between listener preference and timbral and spatial attributes

Researcher: Tom Robotham

Supervisors: Dr Matthew Stephenson, Dr Hyunkook Lee

Project summary: Early reflections play a large role in our perception of sound and as such, have been subject to various treatments over the years due to changing tastes and room requirements. Whilst there is research into these early reflections, arriving both vertically and horizontally in small rooms regarding critical listening, little research has been conducted regarding the beneficial or detrimental impact of early vertical reflections on listener preference, in the context of listening for entertainment. Two experiments were conducted through subjective testing in a semi-anechoic chamber and listening room in order to assess subjects’ preference of playback of a direct sound against playback with the addition of the first geometrical vertical reflection. Program material remained constant in both experiments, employing five musical and one speech stimuli. The first experiment used a paired comparison method assessing a subjects’ preference, and perceived magnitude of timbral and spatial difference provided by a frequency independent ceiling reflection. Each comparison was followed by a free verbalisation task for subjects to describe the perceived change(s). The second experiment investigated this further by focusing specifically on subjects’ preference with a frequency dependent reflection. A more controlled verbalisation task provided a list of descriptive terms which the subject’s used to describe which attribute(s) influenced their preference. The results show that preference for playback with the inclusion of a vertical reflection was highly varied across both subjects and samples. However both experiments suggest that the main perceptual attribute with which subject’s based their preference was timbre, common spatial attributes (image shift/spread) cannot be used to predict preference. 
Experiment two suggests that altering the frequency content of a vertical reflection may also provide a more consistent level of preference for certain stimuli. It is also shown that while certain attributes (brilliance/fullness) occur frequently when describing preference, others that are used less frequently (nasal/boxy) may influence preference to a greater extent.
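The first geometrical vertical reflection can be estimated with the image-source method: mirror the loudspeaker in the ceiling plane and compare the mirrored path with the direct path. The summary does not give the room dimensions or state how the reflection was generated, so the sketch below is an illustration only. The function name, the 343 m/s speed of sound, the example geometry and the perfectly reflecting, frequency-independent ceiling (matching the first experiment's condition) are all assumptions.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


@dataclass
class CeilingReflection:
    delay_ms: float       # arrival time relative to the direct sound
    elevation_deg: float  # elevation of the reflection as seen by the listener
    level_db: float       # level relative to the direct sound (1/r spreading only)


def first_ceiling_reflection(distance_m, source_h_m, listener_h_m, ceiling_h_m):
    """Hypothetical image-source estimate of the first-order ceiling reflection.

    The source is mirrored in the ceiling plane; the ceiling is assumed to be
    perfectly reflecting and frequency independent, so only spherical
    spreading lowers the reflection's level.
    """
    if not (0 < source_h_m < ceiling_h_m and 0 < listener_h_m < ceiling_h_m):
        raise ValueError("source and listener must lie below the ceiling")
    direct = math.hypot(distance_m, source_h_m - listener_h_m)
    image_h = 2.0 * ceiling_h_m - source_h_m  # image source above the ceiling
    rise = image_h - listener_h_m             # image height above the listener
    reflected = math.hypot(distance_m, rise)
    return CeilingReflection(
        delay_ms=1000.0 * (reflected - direct) / SPEED_OF_SOUND,
        elevation_deg=math.degrees(math.atan2(rise, distance_m)),
        level_db=20.0 * math.log10(direct / reflected),
    )
```

With a loudspeaker and listener both at a hypothetical height of 1.2 m, 2 m apart under a 2.4 m ceiling, `first_ceiling_reflection(2.0, 1.2, 1.2, 2.4)` gives a delay of about 3.28 ms, an elevation of about 50.2° and a level of about −3.9 dB relative to the direct sound. A frequency-dependent reflection, as in the second experiment, would additionally need a filter applied to the reflected path.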

Publications:

Robotham, T., Stephenson, M., & Lee, H. (2016). The effect of a vertical reflection on the relationship between preference and perceived change in timbral and spatial attributes. In 140th Audio Engineering Society International Convention 2016, AES 2016 [9547] Audio Engineering Society.

Contemporary Metal Music Production

Researcher: Dr Mark Mynett

Supervisor: Prof Rupert Till

Abstract:

Distinct challenges are posed when conveying Contemporary Metal Music’s (CMM) sounds and performance perspectives within a recorded and mixed form. CMM often features down-tuned, heavily distorted timbres, alongside high tempi, fast and frequently complex subdivisions, and highly synchronised instrumentation. The combination of these elements results in a significant concentration of dense musical sound usually referred to as ‘heaviness’. The publications for this thesis present approaches, processes and techniques for capturing, presenting and accentuating heaviness, as well as intelligibility and performance precision which facilitate the listener’s clear comprehension of the frequent overarching complexity in the music’s construction. Intelligibility and performance precision are the principal requirements for a high commercial standard of CMM, and additionally can enhance a production’s sense of heaviness.

This synoptic commentary defines heaviness from an ecological perspective by highlighting invariant properties that shape the embodied experience of being human. Heaviness is primarily substantiated through displays of distortion and, regardless of the listening levels involved, the fundamentals of this identity are ecologically linked to volume, power, energy, intensity, emotionality and aggression. In addition to distortion, a vital component of heaviness is sonic weight, which refers to CMM’s low frequencies being associated with large, intense and powerful entities.

CMM’s heaviness is also considered in terms of the perceived proximity of activity, apparent size of performance environment, and level and type of energy being expended. In particular, CMM provides the listener with the sense of utmost proximity to the band, usually without any significant perspective of depth.

Production strategies for achieving a high commercial standard in CMM are then presented. This is followed by a reflective commentary on the portfolio of productions, which includes discussion of the author’s transition from an emulative to a professional level of production and considers originality within this body of work.

By presenting the subject as an important, valid and authentic scholarly discipline, this work bridges the gap between the worlds of academia and music production practice for this style.

Publications:

The effect of sound source and reflection angle on the perception of echo thresholds

Researcher: Lee Davis

Supervisor: Dr Hyunkook Lee

Project summary: This paper compares the time differences recorded for echo thresholds under differing stimuli, angles and listener instructions. Previous research has focused on echo thresholds primarily with regard to level difference, or on a limited combination of these variables. Differing listener instructions across studies, such as “echo barely audible” and “echo clearly audible”, have been shown to produce different thresholds. The former instruction resulted in summing localisation being taken into account and in lower thresholds.

Listeners manipulated sliders on a GUI to reduce the time difference between two randomly selected loudspeakers. Two tests were undertaken: one grading the beginning of separation while fusion was still evident, and one grading complete separation. Orchestral, pink noise burst and speech stimuli were used as continuous, transient and familiar sources respectively. Seventeen loudspeaker angles were available in total; however, the GUI randomly chose a single angle per side of the median plane, which produced 10 angles per test. The listener sat in the centre of the room with loudspeakers arranged around them at 0°, ±30°, ±60°, ±90°, ±120°, ±150° and 180° azimuth at 0° elevation, and at 0°, ±30° and ±110° azimuth at 30° elevation, replicating common multichannel surround setups. The lead sound was presented from the loudspeaker directly in front of the listener at 0° azimuth and 0° elevation.
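As a quick check on the layout arithmetic, the sketch below enumerates the loudspeaker positions listed above and confirms that they total 17, with the lead loudspeaker at 0° azimuth and 0° elevation. Treating positive azimuths as one side and negative as the other is an assumed convention, and the function names are hypothetical. The random per-side selection that yielded 10 angles per test is not modelled, because the summary does not describe it in enough detail.

```python
# Loudspeaker positions as (azimuth_deg, elevation_deg). Azimuth 0 is straight
# ahead; the sign convention for left/right is an assumption for illustration.
EAR_LEVEL_AZIMUTHS = [0, 30, -30, 60, -60, 90, -90, 120, -120, 150, -150, 180]
HEIGHT_AZIMUTHS = [0, 30, -30, 110, -110]  # layer at 30 degrees elevation
LEAD = (0, 0)  # lead sound: directly in front of the listener at ear level


def loudspeaker_layout():
    """Return every loudspeaker position in the described 3D layout."""
    return ([(az, 0) for az in EAR_LEVEL_AZIMUTHS]
            + [(az, 30) for az in HEIGHT_AZIMUTHS])


def lag_positions():
    """All positions that could carry the lag sound (everything but the lead)."""
    return [pos for pos in loudspeaker_layout() if pos != LEAD]


def median_plane_lags():
    """Lag positions lying in the median plane (azimuth 0 or 180 degrees)."""
    return [(az, el) for az, el in lag_positions() if az in (0, 180)]


if __name__ == "__main__":
    print(len(loudspeaker_layout()))  # 17
    print(median_plane_lags())        # [(180, 0), (0, 30)]
```

The 12 ear-level and 5 height positions account for the 17 angles; only two lag positions fall in the median plane.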

A Paired-Samples Sign Test was used for significance testing of median differences between graded echo thresholds. There were clear median differences between the two tests, which used different grading criteria. In the fusion test, the orchestral stimulus was overall significantly different from the pink noise and speech stimuli. In the separation test, the orchestral and pink noise comparison showed significant differences for half of the angles (those within the median plane or towards the rear of the listener position), and the orchestral and speech comparison showed significant differences for the majority of angles. For both tests, the pink noise and speech comparison showed no significant differences. Few significant differences were found between angles. Median plane angles for the lag sound showed increased echo thresholds.
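The Paired-Samples Sign Test counts, for each listener, whether one condition's threshold was higher than the other's, discards ties, and compares the split of positive and negative signs against a fair binomial distribution. A minimal exact two-sided version is sketched below. The summary does not say which software or variant (exact or normal approximation) was used, and the threshold values in the example are invented for illustration, not data from the study.

```python
from math import comb


def sign_test(x, y):
    """Exact two-sided paired-samples sign test.

    x and y are paired measurements (e.g. one echo threshold per listener
    under two stimuli). Tied pairs are discarded. Returns
    (n_positive, n_negative, p_value).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop ties
    n = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    n_neg = n - n_pos
    if n == 0:
        return n_pos, n_neg, 1.0
    # Probability of a split at least this uneven in one tail under p = 0.5.
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical echo thresholds in ms for ten listeners (not study data).
    orchestral = [40, 35, 50, 45, 38, 42, 55, 47, 36, 44]
    pink_noise = [20, 25, 30, 22, 40, 18, 28, 26, 30, 21]
    print(sign_test(orchestral, pink_noise))  # (9, 1, 0.021484375)
```

In this made-up example, 9 of 10 listeners show a higher orchestral threshold, giving p = 22/1024 ≈ 0.021. Because the test uses only the direction of each difference, it makes no assumption about the shape of the threshold distribution, which suits graded slider data.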

Publications:

Davis, L., & Lee, H. (2016). Echo thresholds for a 3D loudspeaker configuration. In Audio Engineering Society 140th Convention.