- Title: Personalized AI for Speech Enhancement and Music Applications
- Speaker: Prof. Minje Kim
- Date and time: Friday, May 26, 2023, 4:20 PM
- Venue: Room 909 (R909), Ricci Hall, Sogang University
- Abstract: This talk highlights recent advancements in the emerging field of personalized speech enhancement. By focusing on an individual user's speech characteristics or acoustic environment, personalized models offer more efficient machine learning inference and better performance than general-purpose models. Personalization can also improve fairness for users who are underrepresented in large training datasets. However, personalized speech enhancement poses challenges, such as exploiting personal information about users who are unknown until test time and addressing privacy concerns related to personal data. In this talk, we will explore machine learning solutions to these issues, including zero- and few-shot learning, data augmentation and purification, self-supervised learning, and knowledge distillation. These methods can improve data and resource efficiency while achieving the desired speech enhancement performance.
The second part of this talk will introduce several interactive music applications that use feedback from users to improve usability. These systems represent audio concepts as latent representations, allowing the system's decoder to flexibly synthesize the desired version of the music. Examples include a source separation system that extracts a spatially defined target instrument from a stereophonic mixture, music remixing and upmixing systems that control the volume of individual instruments and create a 5-channel spatial image in the latent space, and a deep learning-based autotuning system that corrects the pitch of off-tune singing voices.
- Bio: Minje Kim is an associate professor in the Dept. of Intelligent Systems Engineering at Indiana University, where he leads his research group, Signals and AI Group in Engineering (SAIGE), and is affiliated with the Luddy AI Center, Data Science, Cognitive Science, Statistics, the Center for Machine Learning, and the Crisis Technologies Innovation Lab. He is also an Amazon Visiting Academic, working at Amazon Lab126. He earned his Ph.D. in the Dept. of Computer Science at the University of Illinois at Urbana-Champaign. Before joining UIUC, he worked as a researcher at ETRI, a national lab in Korea, from 2006 to 2011. Before then, he received his Master's degree from the Department of Computer Science and Engineering at POSTECH (Summa Cum Laude) in 2006 and his Bachelor's degree from the Division of Information and Computer Engineering at Ajou University (with honors) in 2004. Throughout his research career, he has focused on developing machine learning models for audio signal processing applications. He has received various awards, including the NSF CAREER Award (2021), the IU Trustees Teaching Award (2021), the IEEE SPS Best Paper Award (2020), Google's and Starkey's grants for outstanding student papers at ICASSP 2013 and 2014, respectively, and the Richard T. Cheng Endowed Fellowship from UIUC in 2011. He is an IEEE Senior Member and a member of the IEEE Audio and Acoustic Signal Processing Technical Committee (2018-2023). He serves as Senior Area Editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, Associate Editor for the EURASIP Journal on Audio, Speech, and Music Processing, and Consulting Associate Editor for the IEEE Open Journal of Signal Processing. He is General Chair of IEEE WASPAA 2023 and a reviewer, program committee member, or area chair for major machine learning and signal processing venues, such as NeurIPS, ICML, AAAI, IJCAI, ICLR, ICASSP, Interspeech, ISMIR, IEEE T-ASLP, and IEEE SPL.
He is listed as an inventor on more than 50 patents.
- Host: Prof. Hyung-Min Park's lab