|08:30 - 09:20||REGISTRATION / COFFEE|
|09:20 - 09:30||WELCOME|
|09:30 - 10:40||INVITED TALKS|
|Alessandro Palladini, Music Tribe||Intelligent Audio Machines|
|Ben Supper, ROLI||Ever more connections: AI in service of the learning musician|
|10:40 - 11:00||COFFEE BREAK|
|11:00 - 12:10||INVITED TALKS|
|Lauren Ward, University of Salford||Integrating Expert Knowledge into Intelligent and Interactive Systems|
|Thomas Lund, Genelec||On Human Perceptual Bandwidth and Slow Listening|
|12:10 - 13:10||LUNCH|
|12:50 - 15:00||POSTERS AND DEMOS AT OA7/29|
|Brecht De Man, Nick Jillings and Ryan Stables, Birmingham City University||Comparing stage metaphor interfaces as a controller for stereo position and level|
|Will Gale and Jonathan Wakefield, University of Huddersfield||Investigating the use of Virtual Reality to Solve the Underlying Problems with the 3D Stage Paradigm|
|David Moffat, Florian Thalmann, Mark B. Sandler, Queen Mary University of London||Towards a Semantic Web Representation and Application of Audio Mixing Rules|
|Hugh O’Dwyer, Enda Bates and Francis M. Boland, Trinity College Dublin||A Machine Learning Approach to Sound Source Elevation Detection in Adverse Environments|
|Dale Johnson and Hyunkook Lee, University of Huddersfield||Perceptually Optimised Virtual Acoustics|
|Sean McGrath, Manchester Metropolitan University||User Experience Design for Interactive Music Production Tools|
|Dominic Ward, Russell D. Mason, Ryan Chungeun Kim, Fabian-Robert Stöter, Antoine Liutkus and Mark D. Plumbley, University of Surrey, Inria and LIRMM, University of Montpellier||SiSEC 2018: State of the Art in Musical Audio Source Separation - Subjective Selection of the Best Algorithm|
|Andrew Parker and Steve Fenton, University of Huddersfield||Real-Time System for the Measurement of Perceived Punch|
|Nikita Goddard and Hyunkook Lee, University of Huddersfield||MARRS for the Web: A Microphone Array Recording and Reproduction Simulator developed using the Web Audio API|
|Ana Monte, DELTA Soundworks||The Stanford Virtual Heart|
|Justin Paterson, University of West London||VariPlay: The Interactive Album App|
|Jonathan Wakefield, Christopher Dewey and Matthew Tindall, University of Huddersfield|
|Jonathan Wakefield, Christopher Dewey and Will Gale, University of Huddersfield||LAMI: Leap Motion Based Audio Mixing Interface|
13:50 - 14:20
14:30 - 15:00
DEMO SESSIONS AT SEPARATE LOCATIONS
(You will receive an email to sign up after registration)
|Richard J. Hughes, James Woodcock, Jon Francombe and Kristian Hentschel||The Vostok-K Incident – an immersive audio drama for ad hoc arrays of media devices|
|Holomorph (Interactive 3D audio demo)|
|Bubbles: an object-oriented approach to object-based sound for spatial composition and beyond (Multichannel 3D audio demo)|
|Reference monitoring for stereo and immersive|
|15:00 - 15:20||COFFEE BREAK|
|15:20 - 16:30||INVITED TALKS|
|Duncan Williams, University of York||Biophysiological signals as audio meters and control signals|
|Amy Beeston, University of Sheffield||Unmaking acoustics: Bio-inspired sound information retrieval for an audio-driven artwork|
|16:30 - 17:30||PANEL DISCUSSION|
Topic: User-Centric Design of Intelligent Music Technology
- Florian Camerer (ORF)
Alessandro Palladini, Music Tribe
Artificial intelligence and intelligent systems are no longer a utopian vision of the future: they are our new reality. In a world that seems more dominated by algorithmic intelligence every day, we often wonder what role humans should play in an AI-driven world. So, what is the role of sound engineers in a world dominated by intelligent audio machines? In this presentation, we will explore Music Tribe's vision of the future, discussing the company's design principles and its perspective on making intelligent tools for live music production.
Ben Supper, ROLI
The factors that are powering the recent explosion of interest in AI are mostly economic. First, computing power has become cheap enough, and parallel processing sufficiently trivial, to apply neural networks to new classes of problems. Larger numbers of input neurons and more hidden layers allow larger and less scrutable problems to be tackled. Second, it is now possible to train this technology using vast data sets: tens of thousands of songs and loops; millions of photographs; billions of pages of text; all free to use and readily labelled.
That nobody knows what will and will not succeed is an advantage to small teams. For a startup business like ROLI, there is a constant pressure to grow. We started as a maker of new musical instruments with a definite but limited appeal: a large spend multiplied by a small population. We now need to find a way of multiplying a smaller spend by a bigger population. The application of AI to audio in ways that interest and benefit us involves helping to turn a large, relatively untrained and undeveloped audience into musicians.
ROLI's big experiment in recent years has been to explore the idea that consumers of music would take the chance to create it, if only the tools were available to support them to learn and play, to find material to build on, to be inspired, and to join a community of people doing similar things.
As ROLI's former Head of Research (and now working independently), Ben will talk about the audacious goal of reaching a mass market in music technology, the psychology that we need to understand, and the ways in which AI can help in a mission to bring the joy of music making to the masses.
Lauren Ward, University of Salford
Object-based audio provides a platform for flexible and highly personalised broadcast content by maintaining the separation of different audio elements throughout the broadcast chain. Rendering of objects is carried out at the set-top box based on embedded metadata and user input. Personalisation for accessibility, specifically for those with hearing loss, has great potential for positive impact on the quality of user experience. This talk describes ongoing research to develop and evaluate end-user and production tools for personalisation of object-based broadcast audio. The end-user interface developed facilitates powerful personalisation using a single control by integrating expert production knowledge and producers’ creative intent for individual pieces of content. The production tools accomplish this by allowing input of metadata describing the importance of each audio object to narrative comprehension. Through the integration of this expert knowledge, a balance between improved accessibility of broadcast content and maintenance of the producer’s creative integrity is achieved.
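The single-control idea described above can be sketched as a mapping from one end-user setting, plus per-object importance metadata, to per-object gains. This is an illustrative sketch only; the function, its parameters and the linear fade rule are assumptions, not the tools developed in this research.

```python
def object_gains(importances, accessibility, floor_db=-60.0):
    """Map a single end-user control in [0, 1] to per-object gains (dB).

    At 0 every object plays at its mixed level; as the control rises,
    objects whose narrative importance falls below it are faded down in
    proportion to the shortfall (an assumed rule, for illustration).

    importances: dict of object name -> importance in [0, 1]
    """
    gains = {}
    for name, importance in importances.items():
        shortfall = max(0.0, accessibility - importance)
        gains[name] = floor_db * shortfall
    return gains
```

For example, `object_gains({"dialogue": 1.0, "music": 0.3}, 0.8)` leaves the dialogue untouched while attenuating the music, which is the intended accessibility behaviour: one dial, many object-level decisions driven by producer metadata.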
Thomas Lund, Genelec
Locked away inside its shell, the brain has only ever learned about the world through our five primary senses. With them, we receive a fraction of the information actually available, while we perceive far less still. A fraction of a fraction: the perceptual bandwidth. Conscious perception is furthermore influenced by long-term experience and learning, to an extent that perception might be more accurately understood and studied primarily as a reach-out phenomenon. Based on a review of recent physiological and psychological articles, factors in listener fatigue and cybersickness are discussed, and a case is made for applying a “slow listening” methodology when conducting certain types of subjective tests.
Duncan Williams, University of York
Huge leaps have been made in the portability and power of biophysiological metering, for example the large-scale take-up of wearable devices for fitness tracking. This talk explores our physical reactions to sound (both positive and negative) and considers how biophysiological devices might be productively harnessed as innovative sound meters and sources of novel control signals in audio production tasks. Real-world examples including biosynchronous audio generation, assisted and accessible audio mixing technology, noise evaluation, and virtual-reality-based gaming will be presented to illustrate some of the creative possibilities this technology now affords researchers working with sound and music computing.
Amy Beeston, University of Sheffield
This talk describes the development and realisation of a sound-driven participatory artwork 'Unmaking acoustics' which resulted from a series of initiatives aimed at encouraging women to engage in music technology. Formed during a new researcher-practitioner collaboration, the work is a pragmatic blend of our academic interests and artistic practices, and 'listens' to the loudness, pitch and noise content of sounds contributed by visitors in the gallery space. Typically appreciated by ear as a single multivariate aural experience, these sonic aspects are revealed separately to the eye using three colour-coded visual outputs, becoming quasi-independent factors that visitors can use to manipulate aspects of the soundscape produced through loudspeakers. A final display juxtaposes scrolling sonograms of the loudspeaker-generated and microphone-recorded sound, further encouraging sonic exploration and understanding. Using bio-inspired room adaptation mechanisms within our core sound information retrieval methods, we demonstrate that insights from auditory science can help improve the reliability and portability of sound art installations using live audio analysis.
Brecht De Man, Nick Jillings and Ryan Stables, Birmingham City University
Of all music production interfaces, the channel strip with a gain fader and pan pots is likely the most persistent, being found in nearly all digital audio workstations and hardware as the main way to adjust level and stereo position. Whilst other audio processors have been visualised in many alternative ways since digital audio became pervasive, faders and pan pots are still the norm. One popular alternative to the channel strip is the stage view, or stage metaphor, in which the level and stereo position (and possibly other parameters) are modified using the position of a moveable icon on a 2D or 3D image of a stage. Previous studies have demonstrated that this interface has several benefits over the traditional view, as it more accurately allows users to visualise the stereo image of a mix. When designing a stage view, there are several configurations to choose from, and it is not yet established which model is most appropriate for effective music production. For instance, the shape of the virtual stage, the type of objects that represent sources, the areas in which the sources can be placed, and the relationship between the source positions all have an impact on the effectiveness of the user interface. In this study, we present an experiment to identify the effectiveness of each stage configuration when used for a music production task. To do this, we present the stage to users in two separate configurations: (a) a Cartesian space where the x- and y-coordinates linearly determine the panning and gain respectively, and (b) a semi-circular, polar space where distance from a mic position determines gain and the angle determines the pan position. In this experiment, configuration (a) has the benefit of being very straightforward to implement and use but is less perceptually relevant, whereas (b) has a more logical relationship with perception but a non-linear mapping between the stage view and the traditional channel strip.
Contrasting these two paradigms, a formal user study was conducted in the form of an online mix exercise using both interfaces, followed by a questionnaire. In addition to analysis of survey responses, the movements over time were also logged to compare the approaches quantitatively.
Read the full paper: Comparing stage metaphor interfaces [PDF 391 KB]
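The two configurations compared in the study can be sketched as position-to-parameter mappings over a unit-square stage. This is an illustrative reconstruction; the function names, the linear attenuation law and the angle convention are assumptions, not the authors' implementation.

```python
import math

def cartesian_stage(x, y):
    """Configuration (a): x and y in [0, 1] map linearly to pan and gain."""
    pan = 2.0 * x - 1.0   # -1 = hard left, +1 = hard right
    gain = y              # front of stage (y = 1) is loudest
    return pan, gain

def polar_stage(x, y, mic=(0.5, 0.0), max_dist=1.0):
    """Configuration (b): distance from a virtual mic position sets gain,
    and the angle to the mic sets the pan position."""
    dx, dy = x - mic[0], y - mic[1]
    dist = math.hypot(dx, dy)
    gain = max(0.0, 1.0 - dist / max_dist)  # assumed linear attenuation
    pan = math.sin(math.atan2(dx, dy))      # -1 .. +1 across the semicircle
    return pan, gain
```

The sketch makes the paper's trade-off visible: in (b), equal-gain positions lie on an arc around the mic rather than on a horizontal line, which is exactly the non-linear relationship to a traditional channel strip that the abstract describes.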
Will Gale and Jonathan Wakefield, University of Huddersfield
3D Stage Paradigm (SP) interfaces have been shown to outperform traditional DAWs in speed, mix overview and satisfaction. However, SP interfaces raise problems of their own, including clutter, object occlusion, depth perception, object interaction, exit error and gorilla arm. Building on previous research, this project implemented a 3D SP interface in Virtual Reality (VR) to try to solve these problems. A formal usability evaluation focused on efficiency, effectiveness and satisfaction was conducted. Three VR and desktop interfaces were created for two micro-task tests and one macro-task test. Results showed VR was as efficient as desktop but slightly less effective. Furthermore, there was a significant preference for VR. Results indicated that clutter, object occlusion and exit error are not solved; however, gorilla arm and depth perception appear to improve in VR.
Read the full paper: The use of VR to solve problems with the 3D stage paradigm [PDF 228 KB]
David Moffat, Florian Thalmann, Mark B. Sandler, Centre for Digital Music, Queen Mary University of London
Existing literature has discussed the use of rule based systems for intelligent mixing. These rules can either be explicitly defined by experts, learned from existing datasets, or a mixture of both. For such mixing rules to be transferable between different systems and shared online, we propose a representation using the Rule Interchange Format (RIF) commonly used on the Semantic Web. Systems with differing capabilities can use OWL reasoning on those mixing rule sets to determine subsets which they can handle appropriately. We demonstrate this by means of an example web-based tool which uses a logical constraint solver to apply the rules in real time to sets of audio tracks annotated with features.
Read the full paper: Towards a Semantic Web Representation and Application of Audio Mixing Rules [PDF 144 KB]
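The idea of applying declarative mixing rules to feature-annotated tracks can be sketched as a tiny rule engine. The rule content, track features and function names below are illustrative assumptions; the paper's actual rules are expressed in the Rule Interchange Format and applied via a logical constraint solver, not this Python stand-in.

```python
# Hypothetical mixing rules: (condition on track features, action on
# mix parameters). These example rules are not from the paper.
RULES = [
    (lambda t: t["instrument"] == "vocal", {"gain_db": 3.0}),
    (lambda t: t["spectral_centroid"] < 200, {"pan": 0.0}),  # keep bass centred
]

def apply_rules(tracks, rules=RULES):
    """Return per-track mix parameters by forward-chaining each rule
    over each track's feature annotations."""
    mix = {}
    for track in tracks:
        params = {"gain_db": 0.0, "pan": 0.0}
        for condition, action in rules:
            if condition(track):
                params.update(action)
        mix[track["name"]] = params
    return mix
```

A shareable rule format, as the paper proposes, would let different systems load the same rule set and apply only the subset they can handle.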
Hugh O’Dwyer, Enda Bates and Francis M. Boland, Trinity College Dublin
Recent studies have shown that Deep Neural Networks (DNNs) are capable of detecting sound source azimuth direction in adverse environments to a high level of accuracy. This paper expands on these findings by presenting research which explores the use of DNNs in determining sound source elevation. A simple machine-learning system is presented which is capable of predicting source elevation to a relatively high degree of accuracy in both anechoic and reverberant environments. Speech signals spatialized across the front hemifield of the head are used to train a feedforward neural network. The effectiveness of Gammatone Filter Energies (GFEs) and the Cross-Correlation Function (CCF) in estimating elevation is investigated. Binaural cues such as Interaural Time Difference (ITD) and Interaural Level Difference (ILD) are also examined. Using a combination of these cues, it was found that source elevation to within 10° could be estimated to an accuracy of up to 80% in both anechoic and reverberant environments.
Read the full paper: Machine learning for sound source elevation detection [PDF 309 KB]
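Two of the binaural cues the paper examines, ITD and ILD, can be extracted from a pair of ear signals along the following lines. This is a toy time-domain sketch for illustration; it is not the paper's feature pipeline, which also uses Gammatone Filter Energies and feeds the cues to a trained feedforward network.

```python
import math

def interaural_cues(left, right, fs):
    """Estimate ITD (via the peak of a brute-force cross-correlation)
    and ILD (energy ratio in dB) from equal-length ear-signal frames.
    Sign convention here: ITD is negative when the right ear signal
    arrives later than the left."""
    n = len(left)
    max_lag = min(n - 1, int(0.001 * fs))  # head-size ITDs are under ~1 ms
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        c = sum(left[i] * right[i - lag]
                for i in range(max(0, lag), min(n, n + lag)))
        if c > best_corr:
            best_corr, best_lag = c, lag
    itd = best_lag / fs
    def energy(x):
        return sum(s * s for s in x) or 1e-12
    ild = 10.0 * math.log10(energy(left) / energy(right))
    return itd, ild
```

A learning system would compute such cues per frequency band and frame, then regress or classify elevation from the stacked feature vector.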
Dale Johnson and Hyunkook Lee, Applied Psychoacoustics Lab, University of Huddersfield
This paper presents the development of a method of perceptually optimising the acoustics and reverb of a virtual space. A spatial filtering technique was developed to group artificially rendered reflections by the spatial attribute they contribute to, e.g. apparent source width, distance, loudness or colouration. The current system alters the level of different reflection groups depending on the desired type of optimisation. It is hoped that in the future this system could be coupled with machine learning techniques, such that it is able to determine the initial perceptual qualities of the artificial reverb, then optimise the acoustics depending on the user’s needs. Such a system could ultimately be used to universally identify which spatial qualities are good and bad, then generically optimise the acoustics automatically.
Read the full paper: Perceptually optimised virtual room acoustics [PDF 369 KB]
Sean McGrath, Manchester Metropolitan University
Our work explores the implications for the design, development and deployment of interactive music production tools through the lens of user experience design. We offer a toolkit for those interested in building human-centred software within the audio production and performance space. The work is enabled through identification of key concerns and challenges for designing software that is both usable and useful, through the exploration of ‘in the wild’ engagements. We explore the rich context of music making in-situ, highlighting the roles, features and complexities of making music in a modular, disparate and often non-linear way. The work identifies three key roles within the space and discusses the interplay between them. It relates these roles to key agendas within the music production process, discussing how agendas, and the tools that support them, must change over time, supporting not only stereotypical production practice but also fringe cases on the periphery of what we consider to be ‘traditional practice’. The culmination of the work proposes an updated set of heuristics, loosely based on those proposed by Jakob Nielsen and Donald Norman. The proposed design implications relate specifically to music-making activities and offer a framework for producing more usable, accessible and aesthetically pleasing digital technologies to support production and performance.
Read the full paper: Designing and developing user-centred systems [PDF 369 KB]
Dominic Ward, Russell D. Mason, Ryan Chungeun Kim, Fabian-Robert Stöter and Mark D. Plumbley, University of Surrey; Antoine Liutkus, Inria and LIRMM, University of Montpellier, France
The Signal Separation Evaluation Campaign (SiSEC) is a large-scale regular event aimed at evaluating current progress in source separation through a systematic and reproducible comparison of the participants’ algorithms, providing the source separation community with an invaluable glimpse of recent achievements and open challenges. This paper focuses on the music separation task from SiSEC 2018, which compares algorithms aimed at recovering instrument stems from a stereo mix. In this context, we conducted a subjective evaluation whereby 34 listeners picked which of six competing algorithms, with high objective performance scores, best separated the singing-voice stem from 13 professionally mixed songs. The subjective results reveal strong differences between the algorithms, and highlight the presence of song-dependent performance for state-of-the-art systems. Correlations between the subjective results and the scores of two popular performance metrics are also presented.
Read the full paper: SiSEC 2018 - State of the art in musical audio source separation [PDF 304 KB]
Andrew Parker and Steve Fenton, Applied Psychoacoustics Lab, University of Huddersfield
In this paper, a real-time implementation of a punch metering plugin is described. ‘Punch’ is a perceptual attribute and can be defined by both temporal and frequency characteristics of an audio signal. The metering tool consists of signal separation, onset detection, and perceptual weighting stages. Scores are displayed on both a time graph and a histogram; statistical metrics are derived from the histogram. The output is compared to subjective punch scores obtained from a controlled listening test. Additionally, critical evaluation of the tool is performed by experienced mixing and mastering engineers. The meter is intended to allow for optimisation and objective control of punch during mixing, mastering, and broadcast.
Read the full paper: Real-Time System for the Measurement of Perceived Punch [PDF 375 KB]
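In crude form, the onset-detection stage of such a meter could be a frame-energy rise measure. This is a hedged stand-in for illustration only; the paper's meter additionally performs signal separation and perceptual weighting before scoring punch.

```python
def onset_strengths(samples, frame=256):
    """Frame-by-frame energy rise: a crude proxy for the transient
    contribution of each onset. A real punch meter, as described in
    the paper, would also split the signal into bands and apply
    perceptual weighting before deriving scores."""
    energies = [sum(s * s for s in samples[i:i + frame])
                for i in range(0, len(samples) - frame + 1, frame)]
    # Positive energy jumps between consecutive frames mark onsets;
    # decays contribute nothing.
    return [max(0.0, e2 - e1) for e1, e2 in zip(energies, energies[1:])]
```

Per-onset strengths like these could then be accumulated into the time graph and histogram displays the abstract mentions.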
Nikita Goddard and Hyunkook Lee, Applied Psychoacoustics Lab, University of Huddersfield
MARRS is an interactive tool that aids recording engineers by establishing the optimal microphone configuration for a desired auditory scene. It makes use of novel psychoacoustic algorithms based on binaural and inter-channel time-level trade-off relationships for both two-channel and three-channel microphone array and loudspeaker setups. Previously available as a mobile app on the Android and Apple app stores, MARRS has been recreated on the web for easier accessibility and further functionality, including the addition of three-channel microphone array and loudspeaker setups, and the use of the Web Audio API to demonstrate the phantom image positions of a microphone array across two or three virtual loudspeakers via binaural rendering.
Read the full paper: MARRS for the Web: A Microphone Array Recording and Reproduction Simulator [PDF 469 KB]
Justin Paterson, University of West London, and Rob Toulson, University of Westminster
The distribution of commercial music embedded in a mobile app is commonplace. There are also numerous apps that offer interactivity or algorithmic playback. Interactivity most often follows the mixing-console paradigm of offering faders with which to manipulate stems. Algorithmic playback commonly utilises device orientation, geolocation or bio-sensing for parametric control, or perhaps real-time sequencing of internal sound sets. In 2014, the UK Arts and Humanities Research Council (AHRC) funded work to develop an interactive-playback app that was based upon audio stems, but offered more abstract GUIs to users that exerted macro control over the audio engine whilst maintaining the integrity put into the original performances and production. Further, the app could exercise autonomous control over the listener experience, relating its playback to the structure of a given song. The artist had full control over the creative content, and could elect to generate new material to represent the song in a different genre, or deploy alternative takes to offer fans deeper insight into their vision of the song – all of which could change in real time. In 2017–18, the AHRC awarded further funding in order to commercialise the system, principally in collaboration with Warner Music Group. This work has led to the development of a number of sister apps for different artists and subsequent commercial release, and the technology was branded ‘variPlay’. Contextualised with some prior and contemporary art, this session demonstrates variPlay and discusses some of the design concepts and their implementation for different artists.
Jonathan Wakefield, Christopher Dewey and Matthew Tindall, University of Huddersfield
Over the last decade, the Two-Dimensional Stage Paradigm (2DSP) has been proposed as an alternative to the commercially prevalent channel strip Audio Mixing Interface (AMI) paradigm. This alternative design is based on psychoacoustic principles, with audio channels represented as graphical widgets on a metaphorical stage. Whilst the 2DSP has received favourable evaluation, it does not scale well to high track counts because channels with similar pan positions and levels visually overlap and occlude one another. This novel AMI considers a modified 2DSP for creating a ‘flat mix’ which provides coarse control of channel level and pan position using a grid-based, rather than continuous, stage, and extends the concept to EQ visualisation. Its motivation was to convert the ‘overlap’ deficiency of the 2DSP into an advantage.
Christopher Dewey, Jonathan Wakefield and Matthew Tindall, University of Huddersfield
A prototype system entitled KBDJ has been developed which explores the use of the ubiquitous MIDI keyboard to control a DJ performance system. KBDJ uses a two-octave keyboard, with each octave controlling one audio track. Each audio track has four two-bar loops, which play in synchronisation and are switched using the first four black keys of the track's octave. The top key of the keyboard toggles between frequency filter mode and time slicer mode. In frequency filter mode, the white keys provide seven bands of latched frequency filtering. In time slicer mode, the white keys plus the black B-flat key provide latched on/off control of eight time slices of the loop.
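The key layout described above can be sketched as a classifier from MIDI note number to control role. The base note, note-number arithmetic and role names are assumptions for illustration, not the actual KBDJ implementation.

```python
OCTAVE = 12
BLACK = {1, 3, 6, 8, 10}  # semitone offsets of black keys within an octave

def classify_key(note, base=48):
    """Return (track, role) for a MIDI note under a KBDJ-style layout:
    the first four black keys of each octave select one of four loops,
    white keys drive filter bands or time slices (seven per octave,
    plus the fifth black key, B-flat, as the eighth slice control),
    and the very top key toggles between the two modes."""
    if note == base + 2 * OCTAVE:            # top C of a two-octave keyboard
        return None, "mode_toggle"
    track, offset = divmod(note - base, OCTAVE)
    if offset in BLACK:
        index = sorted(BLACK).index(offset)
        if index < 4:
            return track, ("loop", index)
        return track, ("extra_black", index)  # B-flat: eighth time slice
    whites = [o for o in range(OCTAVE) if o not in BLACK]
    return track, ("white", whites.index(offset))
```

Mapping every performance function onto two octaves is the point of the design: one cheap, ubiquitous controller covers loop switching, filtering and slicing for two tracks.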
Jonathan Wakefield, Christopher Dewey and Will Gale, University of Huddersfield
Ana Monte, DELTA Soundworks
Pediatric cardiologists at Lucile Packard Children's Hospital Stanford are using immersive virtual reality technology to explain complex congenital heart defects, which are some of the most difficult medical conditions to teach and understand. The Stanford Virtual Heart experience helps families understand their child’s heart conditions by employing a new kind of interactive visualization that goes far beyond diagrams, plastic models and hand-drawn sketches. For medical trainees, it provides an immersive and engaging new way to learn about the two dozen most common and complex congenital heart anomalies. Through her poster, Ana Monte will give insight into the challenges of the sound design and how it was integrated in Unity. Conference participants will have the opportunity to try the immersive experience first-hand.
Richard J. Hughes, James Woodcock and Kristian Hentschel, University of Salford; Jon Francombe, BBC R&D
Media device orchestration (MDO) – the concept of using an ad hoc array of auxiliary devices such as mobile phones, tablets, and Bluetooth speakers to augment a media experience – has been shown to improve the overall listening experience compared to some traditional audio reproduction methods. To further explore the potential of this concept, the S3A project have commissioned a new piece of audio drama content – The Vostok-K Incident – specifically tailored for reproduction over orchestrated ad hoc devices. Throughout the commissioning process, the writer and producers were asked to consider how MDO could be used to enhance the content. As such, the resulting audio drama consists of elements which can be flexibly reproduced from either a stereo bed or auxiliary devices, including some elements which can only ever be rendered on auxiliary devices. The semantic metadata used to describe the content, along with a flexible rendering framework, means that the content can be reproduced over any configuration of available loudspeakers. Along with a demonstration of The Vostok-K Incident, this session includes a short talk introducing the concept of MDO, the challenges of producing content when the end reproduction system is ambiguous, and potential over-IP delivery methods for MDO content.
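The metadata-driven rendering described above can be sketched as an allocation of objects to whatever devices happen to be available. The placement categories, field names and round-robin assignment below are illustrative assumptions, not the S3A project's actual framework.

```python
def allocate_objects(objects, aux_devices):
    """Assign each audio object to the stereo bed or an auxiliary device
    using simple semantic metadata.

    objects: list of dicts with 'name' and 'placement' in
             {'bed', 'aux_only', 'either'}
    aux_devices: list of available auxiliary device names
    """
    plan = {"bed": [], "dropped": []}
    plan.update({d: [] for d in aux_devices})
    aux_cycle = 0
    for obj in objects:
        placement = obj["placement"]
        if placement == "bed" or (placement == "either" and not aux_devices):
            plan["bed"].append(obj["name"])
        elif aux_devices:
            # Spread aux-capable objects across devices round-robin.
            plan[aux_devices[aux_cycle % len(aux_devices)]].append(obj["name"])
            aux_cycle += 1
        else:
            plan["dropped"].append(obj["name"])  # aux-only, but no aux device
    return plan
```

The 'either' category captures the abstract's key production constraint: content must still work when the end reproduction system is just a stereo bed, while aux-only elements simply disappear.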
Augustine Leudar, Magik Door
There will be a short performance and demo of an interactive 3D sound installation, an extension of the "Holomorph" project which can be seen here: https://vimeo.com/234689992. The installation in the Spiral room will present an advanced incarnation of the Holomorph shown in the video, and will include spatialisation, stretching and morphing of live input (e.g. voice, instruments) in real time in 3D space.
In recent years, much of the research into three-dimensional spatial sound has focused upon the distribution of audio around variable numbers of loudspeakers and particularly towards spatialisation tools that act on tracks within a conventional Digital Audio Workstation. It is my contention that the studio paradigm that this embodies places spatial aspects of composition in a subordinate role, as a post-production effect applied to sounds almost as an afterthought. Taking ideas from object-oriented programming and particle systems and applying them to object-based audio, this demo offers an alternative approach to both software design and compositional praxis, in which spatial aspects of sound can, from the outset, be placed on an equal footing with temporal and spectral parameters. In this approach, the sound space is represented as a hierarchy of spherical loci (bubbles), agnostic of both rendering methods and speaker configurations. The top-level bubble, containing all the sound events heard within a piece, typically contains a number of smaller loci, each of which may contain lower-level bubbles in its turn. Associated with the lowest bubbles in the hierarchy are tasks that generate sound events, each with a specific location relative to the parent. Each child bubble can undergo spatial transformation with respect to its parent and, when a parent is so transformed, each of its children inherits location data so that absolute position is modified accordingly. Thus, bubbles can move, expand, contract or rotate individually or as “groups” within the compositional design. Although initially developed for spatial information, bubbles can also hold properties regarding time, frequency shift or other parameters that may be inherited by their children. Thus, tasks can inherit tempo, event density, pitch shift and so on from their parent bubbles without altering events on other branches of the hierarchy. 
The demo will present a prototype bubbles system using static sound events, built in SuperCollider and illustrated by a simple visualisation.
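The parent-relative hierarchy described above can be sketched as a small tree structure in which children store only offsets from their parent, so transforming a parent implicitly transforms every descendant. Class and method names are illustrative assumptions; the actual prototype is built in SuperCollider.

```python
from dataclasses import dataclass, field

@dataclass
class Bubble:
    """A spherical locus whose position is stored relative to its
    parent bubble; the top-level bubble contains the whole piece."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
    radius: float = 1.0
    children: list = field(default_factory=list)
    parent: "Bubble" = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def absolute_position(self):
        """Walk up the hierarchy, summing parent-relative offsets."""
        if self.parent is None:
            return (self.x, self.y, self.z)
        px, py, pz = self.parent.absolute_position()
        return (px + self.x, py + self.y, pz + self.z)

    def translate(self, dx, dy, dz):
        # Moving a bubble moves every descendant for free, since
        # children hold only parent-relative coordinates.
        self.x += dx
        self.y += dy
        self.z += dz
```

The same inheritance pattern extends beyond position: tempo, event density or pitch shift stored on a parent can propagate to its children without touching other branches.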
Thomas Lund, Genelec
Production monitoring has a mirror influence on the content. Without proper anchoring of level and frequency response, drift in such self-referenced systems is inevitable over time, putting legacy recordings at risk of sounding dated for no good reason, or causing irreversible distortion to be added to classic pieces of art. The talk defines requirements for judging level and spectral balance in professional monitoring and in content, including details on listener fatigue, sound exposure and in-room frequency response.