Prof. Wenwu Wang
University of Surrey, UK
Title: Generative AI for Text-to-Audio Generation
Abstract:
Text-to-audio generation aims to produce an audio clip from a text prompt, i.e. a natural-language description of the audio content to be generated. It can serve as a sound-synthesis tool for filmmaking, game design, virtual reality/the metaverse, and digital media, and as a digital assistant for text understanding by the visually impaired. To achieve cross-modal text-to-audio generation, it is essential to comprehend the audio events and scenes within an audio clip, as well as to interpret the textual information presented in natural language. In addition, learning the mapping and alignment between these two streams of information is crucial. Exciting developments have recently emerged in the field of automated audio-text cross-modal generation. In this talk, we will give an introduction to this field, covering the problem description, potential applications, datasets, open challenges, recent technical progress, and possible future research directions. We will start with the conditional audio generation method that we published at MLSP 2021 and that was used as the baseline system in DCASE 2023. We will then move on to several algorithms we have developed recently, including AudioLDM, AudioLDM2, Re-AudioLDM, and AudioSep, which are becoming increasingly popular in the signal processing, machine learning, and audio engineering communities.
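The cross-modal alignment idea mentioned above — placing text and audio representations in a shared space so that matching pairs score higher than unrelated ones — can be illustrated with a minimal toy sketch. The embeddings below are synthetic stand-ins, not the actual encoders used in AudioLDM or AudioSep:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-made embeddings standing in for learned text/audio encoders.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=8)                 # "a dog barking in the rain"
matched_audio_emb = text_emb + 0.1 * rng.normal(size=8)  # well-aligned clip
random_audio_emb = rng.normal(size=8)                    # unrelated clip

aligned = cosine_similarity(text_emb, matched_audio_emb)
unrelated = cosine_similarity(text_emb, random_audio_emb)
print(aligned > unrelated)  # the matched pair scores higher
```

In practice this similarity is learned contrastively over large text-audio datasets; a generative model can then be conditioned on the text embedding to synthesize a clip whose audio embedding lands nearby.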
Biography:
Wenwu Wang is a Professor in Signal Processing and Machine Learning and a Co-Director of the Machine Audition Lab within the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People-Centred Artificial Intelligence. His current research interests include signal processing, machine learning and perception, artificial intelligence, machine audition (listening), and statistical anomaly detection. He has (co-)authored over 300 papers in these areas. He has been involved as Principal or Co-Investigator in more than 30 research projects funded by UK and EU research councils and by industry (e.g. BBC, NPL, Samsung, Tencent, Huawei, Saab, Atlas, and Kaon), with a grant portfolio of over £30M.
He is a (co-)recipient of over 15 awards including the 2022 IEEE Signal Processing Society Young Author Best Paper Award, ICAUS 2021 Best Paper Award, DCASE 2020 and 2023 Judge’s Award, DCASE 2019 and 2020 Reproducible System Award, LVA/ICA 2018 Best Student Paper Award, FSDM 2016 Best Oral Presentation, and Dstl Challenge 2012 Best Solution Award. 
He is the elected Chair of the IEEE Signal Processing Society (SPS) Machine Learning for Signal Processing Technical Committee, the Vice Chair of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, a Board Member of the IEEE SPS Technical Directions Board, an elected Member of the IEEE SPS Signal Processing Theory and Methods Technical Committee, and an elected Member of the International Steering Committee of Latent Variable Analysis and Signal Separation. He is an Associate Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, an Associate Editor for (Nature) Scientific Reports, and a Specialty Chief Editor of Frontiers in Signal Processing. He was a Senior Area Editor (2019-2023) and an Associate Editor (2014-2018) for IEEE Transactions on Signal Processing. He has been a Satellite Workshop Co-Chair for IEEE ICASSP 2024, Special Session Co-Chair for IEEE MLSP 2024, a Satellite Workshop Co-Chair for INTERSPEECH 2022, a Publication Co-Chair for IEEE ICASSP 2019, Local Arrangement Co-Chair of MLSP 2013, and Publicity Co-Chair of IEEE SSP 2009. He has been an invited keynote or plenary speaker at more than 20 international conferences and workshops, and a member of the technical program committees of more than 100 international conferences or workshops.