Technical Program

Sunday 16:30-18:00(GMT+8), October 25

Tutorial A-1-1  

Efficient and Flexible Implementation of Machine Learning for ASR and MT

Albert Zeyer (RWTH; AppTek), Nick Rossenbach (RWTH; AppTek), Parnia Bahar (RWTH; AppTek), André Merboldt (RWTH), Ralf Schlüter (RWTH; AppTek)

Tutorial A-2-1  

Spoken Dialogue for Social Robots

Tatsuya Kawahara (Kyoto University, Japan), Kristiina Jokinen (AI Research Center AIST Tokyo Waterfront, Japan)

Tutorial A-3-1  

Meta Learning and Its Applications to Human Language Processing

Hung-yi Lee (Department of Electrical Engineering, National Taiwan University), Ngoc Thang Vu (Institute for Natural Language Processing, Stuttgart), Shang-Wen Li (Amazon AWS AI)

Tutorial A-4-1  

Intelligibility Evaluation and Speech Enhancement based on Deep Learning

Yu Tsao (The Research Center for Information Technology Innovation (CITI), Academia Sinica), Fei Chen (Department of Electrical and Electronic Engineering, Southern University of Science and Technology)

Sunday 18:15-19:45(GMT+8), October 25

Tutorial A-1-2  

Efficient and Flexible Implementation of Machine Learning for ASR and MT

Albert Zeyer (RWTH; AppTek), Nick Rossenbach (RWTH; AppTek), Parnia Bahar (RWTH; AppTek), André Merboldt (RWTH), Ralf Schlüter (RWTH; AppTek)

Tutorial A-2-2  

Spoken Dialogue for Social Robots

Tatsuya Kawahara (Kyoto University, Japan), Kristiina Jokinen (AI Research Center AIST Tokyo Waterfront, Japan)

Tutorial A-3-2  

Meta Learning and Its Applications to Human Language Processing

Hung-yi Lee (Department of Electrical Engineering, National Taiwan University), Ngoc Thang Vu (Institute for Natural Language Processing, Stuttgart), Shang-Wen Li (Amazon AWS AI)

Tutorial A-4-2  

Intelligibility Evaluation and Speech Enhancement based on Deep Learning

Yu Tsao (The Research Center for Information Technology Innovation (CITI), Academia Sinica), Fei Chen (Department of Electrical and Electronic Engineering, Southern University of Science and Technology)

Sunday 20:00-21:30(GMT+8), October 25

Tutorial B-1-1  

Tutorial B-2-1  

Neural Approaches to Conversational Information Retrieval

Jianfeng Gao (Microsoft Research, Redmond), Chenyan Xiong (Microsoft Research, Redmond), Paul Bennett (Microsoft Research, Redmond)

Tutorial B-3-1  

Neural Models for Speaker Diarization in the Context of Speech Recognition

Kyu J. Han (ASAPP Inc.), Tae Jin Park (University of Southern California), Dimitrios Dimitriadis (Microsoft, WA)

Tutorial B-4-1  

Spoken Language Processing for Language Learning and Assessment

Vikram Ramanarayanan (Educational Testing Service R&D), Klaus Zechner (Educational Testing Service R&D), Keelan Evanini (Educational Testing Service R&D)

Sunday 21:45-23:15(GMT+8), October 25

Tutorial B-1-2  

Tutorial B-2-2  

Neural Approaches to Conversational Information Retrieval

Jianfeng Gao (Microsoft Research, Redmond), Chenyan Xiong (Microsoft Research, Redmond), Paul Bennett (Microsoft Research, Redmond)

Tutorial B-3-2  

Neural Models for Speaker Diarization in the Context of Speech Recognition

Kyu J. Han (ASAPP Inc.), Tae Jin Park (University of Southern California), Dimitrios Dimitriadis (Microsoft, WA)

Tutorial B-4-2  

Spoken Language Processing for Language Learning and Assessment

Vikram Ramanarayanan (Educational Testing Service R&D), Klaus Zechner (Educational Testing Service R&D), Keelan Evanini (Educational Testing Service R&D)

Monday 17:00-18:00(GMT+8), October 26

Opening Session  

Monday 18:00-19:00(GMT+8), October 26

Keynote 1  

The cognitive status of simple and complex models

Janet B. Pierrehumbert (University of Oxford)

Monday 19:15-20:15(GMT+8), October 26

ASR Neural Network Architectures I

Mon-1-1-1 On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

Jinyu Li(Microsoft), Yu Wu(Microsoft Research Asia), Yashesh Gaur(Microsoft), Chengyi Wang(Microsoft Research Asia), Rui Zhao(Microsoft) and Shujie Liu(Microsoft Research Asia)

Mon-1-1-2 SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

Zhifu Gao(Alibaba Group), Shiliang Zhang(Alibaba Group), Ming Lei(Alibaba Group) and Ian McLoughlin(ICT Cluster, Singapore Institute of Technology)

Mon-1-1-3 Contextual RNN-T for Open Domain ASR

Mahaveer Jain(Facebook), Yatharth Saraf(Facebook), Gil Keren(Facebook), Jay Mahadeokar(Facebook), Geoffrey Zweig(Facebook) and Florian Metze(Facebook)

Mon-1-1-4 ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

Jing Pan(ASAPP), Joshua Shapiro(ASAPP), Jeremy Wohlwend(ASAPP), Kyu Han(ASAPP), Tao Lei(ASAPP) and Tao Ma(ASAPP)

Mon-1-1-5 Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity

Deepak Kadetotad(Arizona State University / Starkey Hearing Technologies), Jian Meng(Arizona State University), Visar Berisha(Arizona State University), Chaitali Chakrabarti(Arizona State University) and Jae-sun Seo(Arizona State University)

Mon-1-1-6 BLSTM-Driven Stream Fusion for Automatic Speech Recognition: Novel Methods and a Multi-Size Window Fusion Example

Timo Lohrenz(Technische Universität Braunschweig) and Tim Fingscheidt(Technische Universität Braunschweig)

Mon-1-1-7 Relative Positional Encoding for Speech Recognition and Direct Translation

Ngoc-Quan Pham(Karlsruhe Institute of Technology), Thanh-Le Ha(Karlsruhe Institute of Technology), Tuan Nam Nguyen(Karlsruhe Institute of Technology), Thai Son Nguyen(Karlsruhe Institute of Technology), Elizabeth Salesky(Johns Hopkins University), Sebastian Stüker(Karlsruhe Institute of Technology), Jan Niehues(Maastricht University) and Alexander Waibel(Carnegie Mellon)

Mon-1-1-8 Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers

Naoyuki Kanda(Microsoft), Yashesh Gaur(Microsoft), Xiaofei Wang(Microsoft), Zhong Meng(Microsoft), Zhuo Chen(Microsoft), Tianyan Zhou(Microsoft) and Takuya Yoshioka(Microsoft)

Mon-1-1-10 Effect of Adding Positional Information on Convolutional Neural Networks for End-to-End Speech Recognition

Jinhwan Park(Seoul National University) and Wonyong Sung(Seoul National University)

Multi-Channel Speech Enhancement

Mon-1-2-1 Deep Neural Network-Based Generalized Sidelobe Canceller for Robust Multi-channel Speech Recognition

Guanjun Li(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Shan Liang(NLPR, Institute of Automation, Chinese Academy of Sciences), Shuai Nie(NLPR, Institute of Automation, Chinese Academy of Sciences), Wenju Liu(NLPR, Institute of Automation, Chinese Academy of Sciences), Zhanlei Yang(Huawei Technologies) and Longshuai Xiao(NLPR, Institute of Automation, Chinese Academy of Sciences)

Mon-1-2-2 Neural Spatio-Temporal Beamformer for Target Speech Separation

Yong Xu(Tencent AI Lab), Meng Yu(Tencent AI Lab), Shi-Xiong Zhang(Tencent AI Lab), Lianwu Chen(Tencent AI Lab), Chao Weng(Tencent AI Lab), Jianming Liu(Tencent AI Lab) and Dong Yu(Tencent AI Lab)

Mon-1-2-3 Online directional speech enhancement using geometrically constrained independent vector analysis

Li Li(University of Tsukuba), Kazuhito Koishida(Microsoft Corporation) and Shoji Makino(University of Tsukuba)

Mon-1-2-4 End-to-End Multi-Look Keyword Spotting

Meng Yu(Tencent AI Lab), Xuan Ji(Tencent AI Lab), Bo Wu(Tencent AI Lab), Dan Su(Tencent AI Lab) and Dong Yu(Tencent AI Lab)

Mon-1-2-5 Differential Beamforming for Uniform Circular Array with Directional Microphones

Weilong Huang(Alibaba group) and Jinwei Feng(Alibaba group)

Mon-1-2-6 Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

Jun Qi(Georgia Institute of Technology), Hu Hu(Georgia Institute of Technology), Yannan Wang(Tencent Corporation), Chao-Han Huck Yang(Georgia Institute of Technology), Sabato Marco Siniscalchi(University of Enna) and Chin-Hui Lee(Georgia Institute of Technology)

Mon-1-2-7 An End-to-end Architecture of Online Multi-channel Speech Separation

Jian Wu(Northwestern Polytechnical University), Zhuo Chen(Microsoft, One Microsoft Way, Redmond, WA, USA), Jinyu Li(Microsoft, One Microsoft Way, Redmond, WA, USA), Takuya Yoshioka(Microsoft, One Microsoft Way, Redmond, WA, USA), Zhili Tan(Microsoft, STCA, Beijing), Ed Lin(Microsoft, STCA, Beijing), Yi Luo(Microsoft, One Microsoft Way, Redmond, WA, USA) and Lei Xie(School of Computer Science, Northwestern Polytechnical University, Xi’an)

Mon-1-2-8 Mentoring-Reverse Mentoring for Unsupervised Multi-channel Speech Source Separation

Yu Nakagome(Waseda Univ.), Masahito Togami(Line Corporation), Tetsuji Ogawa(Waseda University) and Tetsunori Kobayashi(Waseda University)

Mon-1-2-9 Computationally efficient and versatile framework for joint optimization of blind speech separation and dereverberation

Tomohiro Nakatani(NTT Corporation), Rintaro Ikeshita(NTT Corporation), Keisuke Kinoshita(NTT), Hiroshi Sawada(NTT Corporation) and Shoko Araki(NTT Communication Science Laboratories)

Mon-1-2-10 A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-channel Speech Recognition in the CHiME-6 Challenge

Yan-Hui Tu(University of Science and Technology of China), Jun Du(University of Science and Technology of China), Lei Sun(University of Science and Technology of China), Feng Ma(University of Science and Technology of China), Jia Pan(University of Science and Technology of China) and Chin-Hui Lee(Georgia Institute of Technology)

Speech Processing in the Brain

Mon-1-3-1 Identifying Causal Relationships Between Behavior and Local Brain Activity During Natural Conversation

Youssef Hmamouche(Aix Marseille University), Laurent Prévot(Aix Marseille Université & CNRS), Magalie Ochs(LIS) and Thierry Chaminade(INT, Aix Marseille Université)

Mon-1-3-2 Neural Entrainment to Natural Speech Envelope Based on Subject Aligned EEG Signals

Di Zhou(Japan Advanced Institute of Science and Technology), Gaoyan Zhang(Tianjin University), Jianwu Dang(JAIST), Shuang Wu(Tianjin University) and Zhuo Zhang(Tianjin University)

Mon-1-3-3 Does Lexical Retrieval Deteriorate in Patients with Mild Cognitive Impairment? Analysis of Brain Functional Network Will Tell

Chongyuan Lian(Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Tianqi Wang(Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Mingxiao Gu(Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Manwa Lawrence Ng(The University of Hong Kong), Feiqi Zhu(Shenzhen Luohu People’s Hospital), Lan Wang(Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences) and Nan Yan(Shenzhen Institutes of Advanced Technology)

Mon-1-3-5 Contribution of RMS-level-based speech segments to target speech decoding under noisy conditions

Lei Wang(Southern University of Science and Technology), Ed X. Wu(The University of Hong Kong) and Fei Chen(Southern University of Science and Technology)

Mon-1-3-6 Cortical Oscillatory Hierarchy for Natural Sentence Processing

Bin Zhao(Tianjin University), Jianwu Dang(JAIST), Gaoyan Zhang(Tianjin University) and Masashi Unoki(JAIST)

Mon-1-3-7 Comparing EEG analyses with different epoch alignments in an auditory lexical decision experiment

Louis ten Bosch(Radboud University Nijmegen), Kimberley Mulder(Center for Language Studies, Radboud University, Nijmegen) and Lou Boves(Centre for Language and Speech Technology, Radboud University Nijmegen)

Mon-1-3-8 Detection of Subclinical Mild Traumatic Brain Injury (mTBI) Through Speech and Gait

Tanya Talkar(Harvard University), Sophia Yuditskaya(MIT Lincoln Laboratory), James Williamson(MIT Lincoln Laboratory), Adam Lammert(Worcester Polytechnic Institute), Hrishikesh Rao(MIT Lincoln Laboratory), Daniel Hannon(MIT Lincoln Laboratory), Anne O'Brien(Spaulding Rehabilitation Hospital), Gloria Vergara-Diaz(Spaulding Rehabilitation Hospital), Richard DeLaura(MIT Lincoln Laboratory), Douglas Sturim(MIT), Gregory Ciccarelli(MIT Lincoln Laboratory), Ross Zafonte(Spaulding Rehabilitation Hospital), Jeffrey Palmer(MIT Lincoln Laboratory), Paolo Bonato(Spaulding Rehabilitation Hospital) and Thomas Quatieri(MIT Lincoln Laboratory)

Speech Signal Representation

Mon-1-4-1 Towards Learning a Universal Non-Semantic Representation of Speech

Joel Shor(Google), Aren Jansen(Google), Ronnie Maor(Google), Oran Lang(Google), Omry Tuval(Google), Félix de Chaumont Quitry(Google), Marco Tagliasacchi(Google), Ira Shavitt(Google), Dotan Emanuel(Google) and Yinnon Haviv(Google)

Mon-1-4-2 Poetic Meter Classification Using i-vector-MTF Fusion

Rajeev Rajan(College of Engineering, Trivandrum), Aiswarya Vinod(College of Engineering, Trivandrum) and Ben P. Babu(RIT Kottayam)

Mon-1-4-3 Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

Wang Dai(Beijing Language and Culture University), Jinsong Zhang(Beijing Language and Culture University), Yingming Gao(Institute of Acoustics and Speech Communication, Technische Universität Dresden), Wei Wei(Beijing Language and Culture University), Dengfeng Ke(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Binghuai Lin(MIG, Tencent Science and Technology Ltd., Beijing) and Yanlu Xie(Beijing Language and Culture University)

Mon-1-4-4 Automatic Analysis of Speech Prosody in Dutch

Na Hu(Utrecht University), Berit Janssen(Utrecht University), Judith Hanssen(Avans University of Applied Sciences), Carlos Gussenhoven(Radboud University) and Aoju Chen(Utrecht University)

Mon-1-4-5 Learning Voice Representation Using Knowledge Distillation For Automatic Voice Casting

Adrien Gresse(LIA - Avignon University), Mathias Quillot(LIA - Avignon University), Richard Dufour(LIA - Avignon University) and Jean-Francois Bonastre(Avignon University, LIA)

Mon-1-4-6 Enhancing formant information in spectrographic display of speech

Bayya Yegnanarayana(International Institute of Information Technology at Hyderabad), Anand Medabalimi(IIIT Hyderabad) and Vishala Pannala(International Institute of Information Technology Hyderabad)

Mon-1-4-7 Unsupervised Methods for Evaluating Speech Representations

Michael Gump(MIT), Wei-Ning Hsu(Massachusetts Institute of Technology) and James Glass(Massachusetts Institute of Technology)

Mon-1-4-8 Robust pitch regression with voiced/unvoiced classification in nonstationary noise environments

Dung Tran(Microsoft), Uros Batricevic(Microsoft) and Kazuhito Koishida(Microsoft)

Mon-1-4-9 Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Amrith Setlur(CMU), Barnabas Poczos(Carnegie Mellon University) and Alan W Black(Carnegie Mellon University)

Mon-1-4-10 Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals

Hirotoshi Takeuchi(University of Tokyo), Kunio Kashino(NTT Corporation), Yasunori Ohishi(NTT Corporation) and Hiroshi Saruwatari(The University of Tokyo)

Speech Synthesis: Neural Waveform Generation I

Mon-1-5-1 Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Yang Ai(University of Science and Technology of China) and Zhenhua Ling(University of Science and Technology of China)

Mon-1-5-2 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Qiao Tian(Tencent), Zewang Zhang(Tencent), Heng Lu(Tencent), Ling-Hui Chen(Tencent) and Shan Liu(Tencent)

Mon-1-5-3 VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network

Jinhyeok Yang(NCSOFT), Junmo Lee(NCSOFT), Young-Ik Kim(NCSOFT), Hoon-Young Cho(NCSOFT, AI Center, Speech Lab) and Injung Kim(Handong Global University)

Mon-1-5-4 Lightweight LPCNet-based Neural Vocoder with Tensor Decomposition

Hiroki Kanagawa(NTT Corporation) and Yusuke Ijima(NTT corporation)

Mon-1-5-5 WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Po-chun Hsu(College of Electrical Engineering and Computer Science, National Taiwan University) and Hung-yi Lee(National Taiwan University (NTU))

Mon-1-5-6 What the future brings: investigating the impact of lookahead for incremental neural TTS

Brooke Stephenson(Université Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble and LIG, UGA, G-INP, CNRS, INRIA, Grenoble, France), Laurent Besacier(LIG), Laurent Girin(GIPSA-lab / University of Grenoble) and Thomas Hueber(CNRS / GIPSA-lab)

Mon-1-5-7 Fast and lightweight on-device TTS with Tacotron2 and LPCNet

Vadim Popov(Huawei Technologies Co. Ltd.), Stanislav Kamenev(Huawei Technologies Co. Ltd.), Mikhail Kudinov(Huawei Technologies Co. Ltd.), Sergey Repyevsky(Huawei Technologies Co. Ltd.), Tasnima Sadekova(Huawei Technologies Co. Ltd.), Vitalii Bushaev(Huawei Technologies Co. Ltd.), Vladimir Kryzhanovskiy(Huawei Technologies Co. Ltd.) and Denis Parkhomenko(Huawei Technologies Co. Ltd.)

Mon-1-5-8 Efficient WaveGlow: An Improved WaveGlow Vocoder with Enhanced Speed

Wei Song(JD AI Research), Guanghui Xu(JD AI Research), Zhengchen Zhang(JD.com), Chao Zhang(University of Cambridge), Xiaodong He(JD AI Research) and Bowen Zhou(JD AI Research)

Mon-1-5-9 Can Auditory Nerve models tell us what’s different about WaveNet vocoded speech?

Sébastien Le Maguer(Adapt Centre / Trinity College Dublin) and Naomi Harte(Trinity College Dublin)

Mon-1-5-10 Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions

Dipjyoti Paul(Computer Science Department, University of Crete, Greece), Yannis Pantazis(Institute of Applied and Computational Mathematics, FORTH) and Yannis Stylianou(Univ of Crete)

Mon-1-5-11 Neural Homomorphic Vocoder

Zhijun Liu(Shanghai Jiao Tong University), Kuan Chen(Shanghai Jiao Tong University) and Kai Yu(Shanghai Jiao Tong University)

Automatic Speech Recognition for Non-Native Children's Speech

Mon-SS-1-6-1 Overview of the Interspeech TLT2020 Shared Task on ASR for Non-Native Children's Speech

Roberto Gretter(FBK), Marco Matassoni(Fondazione Bruno Kessler), Daniele Falavigna(Fondazione Bruno Kessler), Keelan Evanini(Educational Testing Service) and Chee Wee (Ben) Leong(Educational Testing Service)

Mon-SS-1-6-2 The NTNU System at the Interspeech 2020 Non-Native Children’s Speech ASR Challenge

Tien-Hong Lo(National Taiwan Normal University), Fu-An Chao(National Taiwan Normal University), Shi-Yan Weng(National Taiwan Normal University) and Berlin Chen(National Taiwan Normal University)

Mon-SS-1-6-3 Non-Native Children's Automatic Speech Recognition: the INTERSPEECH 2020 Shared Task ALTA Systems

Kate Knill(University of Cambridge), Linlin Wang(Cambridge University Engineering Department), Yu Wang(University of Cambridge), Xixin Wu(University of Cambridge) and Mark Gales(Cambridge University)

Mon-SS-1-6-4 Data augmentation using prosody and false starts to recognize non-native children's speech

Hemant Kathania(Aalto University), Mittul Singh(Aalto University), Tamás Grósz(Department of Signal Processing and Acoustics, Aalto University) and Mikko Kurimo(Aalto University)

Mon-SS-1-6-5 UNSW System Description for the Shared Task on Automatic Speech Recognition for Non-Native Children’s Speech

Mostafa Shahin(University of New South Wales), Renée Lu(University of New South Wales), Julien Epps(University of New South Wales) and Beena Ahmed(University of New South Wales)

Speaker Diarization

Mon-1-7-1 End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

Shota Horiguchi(Hitachi, Ltd.), Yusuke Fujita(Hitachi, Ltd.), Shinji Watanabe(Johns Hopkins University), Yawen Xue(Hitachi, Ltd.) and Kenji Nagamatsu(Hitachi, Ltd.)

Mon-1-7-2 Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

Ivan Medennikov(STC-innovations Ltd), Maxim Korenevsky(Speech Technology Center), Tatiana Prisyach(STC-innovations Ltd), Yuri Khokhlov(STC-innovations Ltd), Mariya Korenevskaya(STC-innovations Ltd), Ivan Sorokin(STC), Tatiana Timofeeva(STC-innovations Ltd), Anton Mitrofanov(STC-innovations Ltd), Andrei Andrusenko(ITMO University), Ivan Podluzhny(STC-innovations Ltd), Aleksandr Laptev(ITMO University) and Aleksei Romanenko(ITMO University)

Mon-1-7-3 New advances in speaker diarization

Hagai Aronowitz(IBM Research - Haifa), Weizhong Zhu(IBM T.J. Watson Research Center), Masayuki Suzuki(IBM Research), Gakuto Kurata(IBM Research) and Ron Hoory(IBM Haifa Research Lab)

Mon-1-7-4 Self-Attentive Similarity Measurement Strategies in Speaker Diarization

Qingjian Lin(SEIT, Sun Yat-sen University), Yu Hou(Duke Kunshan University) and Ming Li(Duke Kunshan University)

Mon-1-7-5 Speaker attribution with voice profiles by graph-based semi-supervised learning

Jixuan Wang(University of Toronto), Xiong Xiao(Microsoft), Jian Wu(Microsoft), Ranjani Ramamurthy(Microsoft), Frank Rudzicz(University of Toronto) and Michael Brudno(University of Toronto)

Mon-1-7-6 Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

Prachi Singh(Indian Institute of Science, Bangalore) and Sriram Ganapathy(Indian Institute of Science, Bangalore, India, 560012)

Mon-1-7-7 Spot the conversation: speaker diarisation in the wild

Joon Son Chung(University of Oxford), Jaesung Huh(Naver Corporation), Arsha Nagrani(University of Oxford), Triantafyllos Afouras(University of Oxford) and Andrew Zisserman(University of Oxford)

Noise Robust and Distant Speech Recognition

Mon-1-8-1 Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition

Wangyou Zhang(Shanghai Jiao Tong University) and Yanmin Qian(Shanghai Jiao Tong University)

Mon-1-8-2 Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition

Zhihao Du(Harbin Institute of Technology), Jiqing Han(Harbin Institute of Technology) and Xueliang Zhang(Inner Mongolia University)

Mon-1-8-3 Anti-aliasing regularization in stacking layers

Antoine Bruguier(Google), Ananya Misra(Google), Arun Narayanan(Google Inc.) and Rohit Prabhavalkar(Google)

Mon-1-8-4 Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

Andrei Andrusenko(ITMO University), Aleksandr Laptev(ITMO University) and Ivan Medennikov(STC-innovations Ltd)

Mon-1-8-5 End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming

Wangyou Zhang(Shanghai Jiao Tong University), Aswin Shanmugam Subramanian(Johns Hopkins University), Xuankai Chang(Johns Hopkins University), Shinji Watanabe(Johns Hopkins University) and Yanmin Qian(Shanghai Jiao Tong University)

Mon-1-8-6 Quaternion Neural Networks for Multi-channel Distant Speech Recognition

Xinchi Qiu(University of Oxford), Titouan Parcollet(University of Oxford), Mirco Ravanelli(Université de Montréal), Nicholas Lane(University of Oxford) and Mohamed Morchid(University of Avignon)

Mon-1-8-7 Improved Guided Source Separation Integrated with a Strong Back-end for the CHiME-6 Dinner Party Scenario

Hangting Chen(Institute of Acoustics, Chinese Academy of Sciences), Pengyuan Zhang(Institute of Acoustics, Chinese Academy of Sciences), Qian Shi(Institute of Acoustics, Chinese Academy of Sciences) and Zuozhen Liu(Institute of Acoustics, Chinese Academy of Sciences)

Mon-1-8-8 Neural Speech Separation Using Spatially Distributed Microphones

Dongmei Wang(Microsoft), Zhuo Chen(Microsoft) and Takuya Yoshioka(Microsoft)

Mon-1-8-9 Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

Shota Horiguchi(Hitachi, Ltd.), Yusuke Fujita(Hitachi, Ltd.) and Kenji Nagamatsu(Hitachi, Ltd.)

Speech in Multimodality (MULTIMODAL)

Mon-1-9-1 Toward Silent Paralinguistics: Speech-to-EMG – Retrieving Articulatory Muscle Activity from Speech

Catarina Botelho(INESC-ID/Instituto Superior Técnico, University of Lisbon, Portugal), Lorenz Diener(University of Bremen), Dennis Küster(Cognitive Systems Lab (CSL), University of Bremen), Kevin Scheck(Cognitive Systems Lab (CSL), University of Bremen), Shahin Amiriparian(University of Augsburg), Björn Schuller(University of Augsburg / Imperial College London), Tanja Schultz(Universität Bremen), Alberto Abad(INESC-ID/IST) and Isabel Trancoso(INESC-ID / IST Univ. Lisbon)

Mon-1-9-2 Multimodal Deception Detection using Automatically Extracted Acoustic, Visual, and Lexical Features

Jiaxuan Zhang(Columbia University), Sarah Ita Levitan(Columbia University) and Julia Hirschberg(Columbia University)

Mon-1-9-3 Multi-modal Attention for Speech Emotion Recognition

Zexu Pan(National University of Singapore), Zhaojie Luo(Osaka University), Jichen Yang(National University of Singapore) and Haizhou Li(National University of Singapore)

Mon-1-9-4 WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition

Guang Shen(Harbin Engineering University), Riwei Lai(Harbin Engineering University), Rui Chen(Harbin Engineering University), Yu Zhang(Southern University of Science and Technology), Kejia Zhang(Harbin Engineering University), Qilong Han(Harbin Engineering University) and Hongtao Song(Harbin Engineering University)

Mon-1-9-5 A Multi-scale Fusion Framework for Bimodal Speech Emotion Recognition

Ming Chen(Zhejiang University) and Xudong Zhao(Hithink RoyalFlush Information Network Co., Ltd.)

Mon-1-9-6 Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition

Pengfei Liu(SpeechX Limited), Kun Li(SpeechX Limited) and Helen Meng(The Chinese University of Hong Kong)

Mon-1-9-7 Multi-modal embeddings using multi-task learning for emotion recognition

Aparna Khare(Amazon.com), Srinivas Parthasarathy(Amazon) and Shiva Sundaram(Amazon)

Mon-1-9-8 Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network

Jeng-Lin Li(Department of Electrical Engineering, National Tsing Hua University) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

Mon-1-9-9 Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition

Zheng Lian(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Bin Liu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Jian Huang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Zhanlei Yang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing) and Rongjun Li(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing)

Speech, Language, and Multimodal Resources

Mon-1-10-1 ATCSpeech: a Multilingual pilot-controller Speech Corpus from Real Air Traffic Control Environment

Bo Yang(Sichuan University), Xianlong Tan(Southwest Air Traffic Management Bureau, Civil Aviation Administration of China), Zhengmao Chen(Sichuan University), Bing Wang(Southwest Air Traffic Management Bureau, Civil Aviation Administration of China), Min Ruan(Southwest Air Traffic Management Bureau, Civil Aviation Administration of China), Dan Li(Southwest Air Traffic Management Bureau, Civil Aviation Administration of China), Zhongping Yang(Wisesoft Co. Ltd.), Xiping Wu(Sichuan University) and Yi Lin(Sichuan University)

Mon-1-10-2 Developing an Open-Source Corpus of Yoruba Speech

Alexander Gutkin(Google), Isin Demirsahin(Google Research), Oddur Kjartansson(Google Research), Clara Rivera(Google Research) and Kọ́lá Túbọ̀sún(Chevening Research Fellow at British Library)

Mon-1-10-3 ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers

Jung-Woo Ha(Clova AI Research, NAVER Corp.), Kihyun Nam(Hankuk University of Foreign Studies), Jin Gu Kang(Clova AI, NAVER Corp.), Sang-Woo Lee(Clova AI Research, NAVER Corp.), Sohee Yang(Clova AI, NAVER Corp.), Hyunhoon Jung(Clova AI, NAVER Corp.), Hyeji Kim(Clova AI, NAVER Corp.), Eunmi Kim(Clova AI, NAVER Corp.), Soojin Kim(Clova AI, NAVER Corp.), Hyun Ah Kim(Clova AI, NAVER Corp.), Kyoungtae Doh(Clova AI, NAVER Corp.), Chan Kyu Lee(Clova AI, NAVER Corp.), Nako Sung(Clova AI, NAVER Corp.) and Sunghun Kim(Clova AI, NAVER Corp.; The Hong Kong University of Science and Technology)

Mon-1-10-4 LAIX Corpus of Chinese Learner English: Towards a Benchmark for L2 English ASR

Huan Luan(LAIX), Jiahong Yuan(LAIX), Hui Lin(LAIX) and Yanhong Wang(LAIX)

Mon-1-10-6 CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

Si-Ioi Ng(The Chinese University of Hong Kong), Cymie Wing-Yee Ng(The Chinese University of Hong Kong), Jiarui Wang(The Chinese University of Hong Kong), Tan Lee(The Chinese University of Hong Kong), Kathy Yuet-Sheung Lee(The Chinese University of Hong Kong) and Michael Chi-Fai Tong(The Chinese University of Hong Kong)

Mon-1-10-7 FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics

Katri Leino(Aalto University), Juho Leinonen(Aalto University), Mittul Singh(Aalto University), Sami Virpioja(University of Helsinki) and Mikko Kurimo(Aalto University)

Mon-1-10-8 DiPCo - Dinner Party Corpus

Maarten Van Segbroeck(Amazon), Ahmed Zaid(Apple), Ksenia Kutsenko(Amazon), Cirenia Huerta(Amazon), Tinh Nguyen(Amazon), Xuewen Luo(Amazon), Bjorn Hoffmeister(Apple), Jan Trmal(Johns Hopkins University), Maurizio Omologo(Fondazione Bruno Kessler - irst) and Roland Maas(Amazon.com)

Mon-1-10-9 Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews

Bo Wang(University of Oxford), Yue Wu(University of Oxford), Niall Taylor(University of Oxford), Terry Lyons(University of Oxford), Maria Liakata(The Alan Turing Institute), Alejo J Nevado-Holgado(University of Oxford) and Kate Saunders(University of Oxford)

Mon-1-10-10 FT Speech: Danish Parliament Speech Corpus

Andreas Søeborg Kirkedal(Interactions), Marija Stepanović(IT University of Copenhagen) and Barbara Plank(IT University of Copenhagen)

Language Recognition

Mon-1-11-1 Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition

Raphaël Duroselle(Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy), Denis Jouvet(LORIA - INRIA) and Irina Illina(LORIA/INRIA)

Mon-1-11-2 The XMUSPEECH System for AP19-OLR Challenge

Zheng Li(Xiamen University), Miao Zhao(Xiamen University), Jing Li(Xiamen University), Yiming Zhi(Xiamen University), Lin Li(Xiamen University) and Qingyang Hong(Xiamen University)

Mon-1-11-3 On the Usage of Multi-feature Integration for Speaker Verification and Language Identification

Zheng Li(Xiamen University), Miao Zhao(Xiamen University), Jing Li(Xiamen University), Lin Li(Xiamen University) and Qingyang Hong(Xiamen University)

Mon-1-11-4 What does an End-to-End Dialect Identification Model Learn about Non-dialectal Information?

Shammur Absar Chowdhury(University of Trento), Ahmed Ali(Qatar Computing Research Institute), Suwon Shon(Massachusetts Institute of Technology) and James Glass(Massachusetts Institute of Technology)

Mon-1-11-5 Releasing a toolkit and comparing the performance of language embeddings across various spoken language identification datasets

Matias Lindgren(Aalto University), Tommi Jauhiainen(University of Helsinki) and Mikko Kurimo(Aalto University)

Mon-1-11-6 Learning Intonation Pattern Embeddings for Arabic Dialect Identification

Aitor Arronte Alvarez(Center for Language and Technology, University of Hawaii. Technical University of Madrid) and Elsayed Issa(University of Arizona)

Mon-1-11-7 Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

Badr Abdullah(Saarland University), Tania Avgustinova(Saarland University), Bernd Möbius(Saarland University) and Dietrich Klakow(Saarland University)

Speech Processing and Analysis

Mon-S&T-1-1 ICE-Talk: an Interface for a Controllable Expressive Talking Machine

Noé Tits(University of Mons), Kevin El Haddad(University of Mons), Thierry Dutoit(University of Mons)

Mon-S&T-1-2 Kaldi-web: An installation-free, on-device speech recognition system

Mathieu Hu(Université de Lorraine), Laurent Pierron(Université de Lorraine), Emmanuel Vincent(Université de Lorraine), Denis Jouvet(Université de Lorraine)

Mon-S&T-1-3 SoapBox Labs Verification Platform for child speech

Amelia C. Kelly(SoapBox Labs), Eleni Karamichali(SoapBox Labs), Armin Saeb(SoapBox Labs), Karel Veselý(SoapBox Labs), Nicholas Parslow(SoapBox Labs), Agape Deng(SoapBox Labs), Arnaud Letondor(SoapBox Labs), Robert O’Regan(SoapBox Labs), Qiru Zhou(SoapBox Labs)

Mon-S&T-1-4 SoapBox Labs Fluency Assessment Platform for child speech

Amelia C. Kelly(SoapBox Labs), Eleni Karamichali(SoapBox Labs), Armin Saeb(SoapBox Labs), Karel Veselý(SoapBox Labs), Nicholas Parslow(SoapBox Labs), Gloria Montoya Gomez(SoapBox Labs), Agape Deng(SoapBox Labs), Arnaud Letondor(SoapBox Labs), Niall Mullally(SoapBox Labs), Adrian Hempel(SoapBox Labs), Robert O’Regan(SoapBox Labs), Qiru Zhou(SoapBox Labs)

Mon-S&T-1-5 CATOTRON - A Neural Text-to-Speech System in Catalan

Baybars Külebi(Col·lectivaT), Alp Öktem(Col·lectivaT), Alex Peiró-Lilja(Universitat Pompeu Fabra), Santiago Pascual(Universitat Politècnica de Catalunya), Mireia Farrús(Universitat Pompeu Fabra)

Mon-S&T-1-6 Toward Remote Patient Monitoring of Speech, Video, Cognitive and Respiratory Biomarkers Using Multimodal Dialog Technology

Vikram Ramanarayanan(Modality.ai), Oliver Roesler(Modality.ai), Michael Neumann(Modality.ai), David Pautler(Modality.ai), Doug Habberstad(Modality.ai), Andrew Cornish(Modality.ai), Hardik Kothare(Modality.ai), Vignesh Murali(Modality.ai), Jackson Liscombe(Modality.ai), Dirk Schnelle-Walka(Modality.ai), Patrick Lange(Modality.ai) and David Suendermann-Oeft(Modality.ai)

Mon-S&T-1-7 VoiceID on the fly: A Speaker Recognition System that Learns from Scratch

Baihan Lin(University of Washington), Xinxin Zhang(University of Washington)

Monday 20:30-21:30(GMT+8), October 26

Speech Emotion Recognition I (SER I)

Mon-2-1-1 Enhancing Transferability of Black-box Adversarial Attacks via Lifelong Learning for Speech Emotion Recognition Models

Zhao Ren(University of Augsburg), Jing Han(University of Augsburg), Nicholas Cummins(University of Augsburg) and Björn Schuller(University of Augsburg / Imperial College London)

Mon-2-1-2 End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model

Han Feng(Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan), Sei Ueno(Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan) and Tatsuya Kawahara(Kyoto University)

Mon-2-1-3 Improving Speech Emotion Recognition Using Graph Attentive Bi-directional Gated Recurrent Unit Network

Bo-Hao Su(Department of Electrical Engineering, National Tsing Hua University), Chun-Min Chang(Department of Electrical Engineering, National Tsing Hua University), Yun-Shao Lin(Department of Electrical Engineering, National Tsing Hua University) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

Mon-2-1-4 An Investigation of Cross-Cultural Semi-Supervised Learning for Continuous Affect Recognition

Adria Mallol-Ragolta(University of Augsburg), Nicholas Cummins(University of Augsburg) and Björn Schuller(University of Augsburg / Imperial College London)

Mon-2-1-5 Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition

Kusha Sridhar(The University of Texas at Dallas) and Carlos Busso(The University of Texas at Dallas)

Mon-2-1-6 Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Siddique Latif(University of Southern Queensland Australia/Distributed Sensing Systems Group, Data61, CSIRO Australia), Muhammad Asim(Information Technology University, Lahore), Rajib Rana(University of Southern Queensland), Sara Khalifa(Distributed Sensing Systems Group, Data61, CSIRO Australia), Raja Jurdak(Queensland University of Technology (QUT)) and Björn Schuller(University of Augsburg / Imperial College London)

Mon-2-1-7 Speech Emotion Recognition ‘in the wild’ Using an Autoencoder

Vipula Dissanayake(University of Auckland), Haimo Zhang(University of Auckland), Mark Billinghurst(University of Auckland) and Suranga Nanayakkara(University of Auckland)

Mon-2-1-8 Emotion Profile Refinery for Speech Emotion Classification

Shuiyang Mao(The Chinese University of Hong Kong), P. C. Ching(The Chinese University of Hong Kong) and Tan Lee(The Chinese University of Hong Kong)

Mon-2-1-9 Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation

Sung-Lin Yeh(Department of Electrical Engineering, National Tsing Hua University), Yun-Shao Lin(Department of Electrical Engineering) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

ASR Neural Network Architectures and Training I

Mon-2-2-1 Fast and Slow Acoustic Model

Kshitiz Kumar(Microsoft Corporation), Emilian Stoimenov(Microsoft Corp), Hosam Khalil(Microsoft Corp) and Jian Wu(Microsoft Corp)

Mon-2-2-2 Self-Distillation for Improving CTC-Transformer-based ASR Systems

Takafumi Moriya(NTT Corporation), Tsubasa Ochiai(NTT Communication Science Laboratories), Shigeki Karita(NTT Communication Science Laboratories), Hiroshi Sato(NTT Media Intelligence Laboratories), Tomohiro Tanaka(NTT Corporation), Takanori Ashihara(NTT Corporation), Ryo Masumura(NTT Corporation), Yusuke Shinohara(NTT Corporation) and Marc Delcroix(NTT Communication Science Laboratories)

Mon-2-2-3 Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

Zoltán Tüske(IBM Research), George Saon(IBM), Kartik Audhkhasi(IBM Research) and Brian Kingsbury(IBM Research)

Mon-2-2-4 Improving Speech Recognition using GAN-based Speech Synthesis and Contrastive Unspoken Text Selection

Zhehuai Chen(Google), Andrew Rosenberg(Google LLC), Yu Zhang(Google), Gary Wang(Simon Fraser University), Bhuvana Ramabhadran(Google) and Pedro Moreno(Google Inc.)

Mon-2-2-5 PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

Yiwen Shao(Center for Language and Speech Processing, Johns Hopkins University), Yiming Wang(Johns Hopkins University), Dan Povey(Xiaomi, Inc.) and Sanjeev Khudanpur(Johns Hopkins University)

Mon-2-2-6 CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Keyu An(Tsinghua University), Hongyu Xiang(Tsinghua University) and Zhijian Ou(Department of Electronic Engineering, Tsinghua University)

Mon-2-2-7 CTC-synchronous Training for Monotonic Attention Model

Hirofumi Inaguma(Kyoto University), Masato Mimura(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Mon-2-2-8 Continual Learning for Multi-Dialect Acoustic Models

Brady Houston(Amazon) and Katrin Kirchhoff(Amazon)

Mon-2-2-9 SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition

Xingchen Song(Tsinghua University), Zhiyong Wu(Tsinghua University), Yiheng Huang(Tencent AI Lab), Dan Su(Tencent AI Lab Shenzhen) and Helen Meng(The Chinese University of Hong Kong)

Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation

Mon-2-3-2 Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

Yuan Shangguan(Facebook), Katie Knister(Google), Yanzhang He(Google), Ian McGraw(Google) and Françoise Beaufays(Google)

Mon-2-3-3 Statistical Testing on ASR Performance via Blockwise Bootstrap

Zhe Liu(Facebook, Inc.) and Fuchun Peng(Facebook)

Mon-2-3-4 Sentence Level Estimation of Psycholinguistic Norms Using Joint Multidimensional Annotations

Anil Ramakrishna(Amazon) and Shrikanth Narayanan(University of Southern California)

Mon-2-3-5 Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

Kai Fan(Alibaba Group), Bo Li(Alibaba Group), Jiayi Wang(Alibaba Group), Shiliang Zhang(Alibaba Group), Boxing Chen(Alibaba), Niyu Ge(IBM Research) and Zhi-Jie Yan(Microsoft Research Asia)

Mon-2-3-6 Confidence measures in encoder-decoder models for speech recognition

Alejandro Woodward(Universitat Politècnica de Catalunya), Clara Bonnín(Vilynx), David Varas(Vilynx), Issey Masuda(Vilynx), Elisenda Bou-Balust(Vilynx) and Juan Carlos Riveiro(Vilynx)

Mon-2-3-7 Word Error Rate Estimation Without ASR Output: e-WER2

Ahmed Ali(Qatar Computing Research Institute) and Steve Renals(University of Edinburgh)

Mon-2-3-8 An evaluation of manual and semi-automatic laughter annotation

Bogdan Ludusan(Bielefeld University) and Petra Wagner(Universität Bielefeld)

Mon-2-3-9 Understanding Racial Disparities in Automatic Speech Recognition: the case of habitual "be"

Joshua Martin(University of Florida) and Kevin Tang(University of Florida)

Phonetics and Phonology

Mon-2-4-1 Secondary phonetic cues in the production of the nasal short-a system in California English

Georgia Zellou(UC Davis), Rebecca Scarborough(University of Colorado) and Renee Kemp(UC Davis)

Mon-2-4-2 Acoustic properties of strident fricatives at the edges: implications for consonant discrimination

Louis-Marie Lorin(PSL University), Lorenzo Maselli(Scuola Normale Superiore), Léo Varnet(École normale supérieure), Maria Giavazzi(PSL University)

Mon-2-4-4 Voicing Distinction of Obstruents in the Hangzhou Wu Chinese Dialect

Yang Yue(University of Chinese Academy of Social Sciences) and Fang Hu(Institute of Linguistics, Chinese Academy of Social Sciences)

Mon-2-4-5 The phonology and phonetics of Kaifeng Mandarin vowels

Lei Wang(East China University of Science and Technology)

Mon-2-4-6 Microprosodic variability in plosives in German and Austrian German

Margaret Zellers(University of Kiel) and Barbara Schuppler(SPSC Laboratory, Graz University of Technology)

Mon-2-4-7 Er-suffixation in Southwestern Mandarin: An EMA and ultrasound study

Jing Huang(National Tsing Hua University), Feng-fan Hsieh(National Tsing Hua University) and Yueh-chin Chang(National Tsing Hua University)

Mon-2-4-9 Modeling Global Body Configurations in American Sign Language

Nicholas Wilkins(Rochester Institute of Technology), Beck Cordes Galbraith(Sign-Speak) and Ifeoma Nwogu(Rochester Institute of Technology)

Topics in ASR I

Mon-2-5-1 Augmenting Turn-taking Prediction with Wearable Eye Activity During Conversation

Hang Li(UNSW), Siyuan Chen(University of New South Wales) and Julien Epps(School of Electrical Engineering and Telecommunications, UNSW Australia)

Mon-2-5-2 CAM: Uninteresting Speech Detector

Weiyi Lu(Amazon), Yi Xu(Amazon), Peng Yang(Amazon) and Belinda Zeng(Amazon)

Mon-2-5-3 Mixed Case Contextual ASR Using Capitalization Masks

Diamantino Caseiro(Google Inc.), Pat Rondon(Google Inc.), Quoc-Nam Le The(Google Inc.) and Petar Aleksic(Google Inc.)

Mon-2-5-4 Speech Recognition and Multi-Speaker Diarization of Long Conversations

Huanru Henry Mao(University of California, San Diego), Shuyang Li(University of California San Diego), Julian McAuley(University of California San Diego) and Garrison Cottrell(University of California, San Diego)

Mon-2-5-5 Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Mengzhe Geng(The Chinese University of Hong Kong), Xurong Xie(Chinese University of Hong Kong), Shansong Liu(The Chinese University of Hong Kong), Jianwei Yu(The Chinese University of Hong Kong), Shoukang Hu(The Chinese University of Hong Kong), Xunying Liu(Chinese University of Hong Kong) and Helen Meng(The Chinese University of Hong Kong)

Mon-2-5-6 A Real-time Robot-based Auxiliary System for Risk Evaluation of COVID-19 Infection

Wenqi Wei(Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang(Ping An Technology (Shenzhen) Co., Ltd.), Jiteng Ma(Ping An Technology (Shenzhen) Co., Ltd.), Ning Cheng(Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao(Ping An Technology (Shenzhen) Co., Ltd.)

Mon-2-5-7 An Utterance Verification System for Word Naming Therapy in Aphasia

David Barbera(University College London), Mark Huckvale(Speech, Hearing and Phonetic Sciences, University College London), Victoria Fleming(Institute of Cognitive Neuroscience, University College London), Emily Upton(Institute of Cognitive Neuroscience, University College London), Henry Coley-Fisher(Institute of Cognitive Neuroscience, University College London), Ian Shaw(Technical Consultant at SoftV), William Latham(Goldsmiths College University of London), Alexander Paul Leff(Institute of Cognitive Neuroscience, University College London) and Jenny Crinion(Institute of Cognitive Neuroscience, University College London)

Mon-2-5-8 Exploiting Cross Domain Visual Feature Generation for Disordered Speech Recognition

Shansong Liu(The Chinese University of Hong Kong), Xurong Xie(Chinese University of Hong Kong), Jianwei Yu(The Chinese University of Hong Kong), Shoukang Hu(The Chinese University of Hong Kong), Mengzhe Geng(The Chinese University of Hong Kong), Rongfeng Su(Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Shi-Xiong Zhang(Tencent AI Lab), Xunying Liu(Chinese University of Hong Kong) and Helen Meng(The Chinese University of Hong Kong)

Mon-2-5-9 Joint prediction of punctuation and disfluency in speech transcripts

Binghuai Lin(Tencent Technology Co., Ltd) and Liyuan Wang(Tencent Technology Co., Ltd)

Mon-2-5-10 Focal Loss for Punctuation Prediction

Jiangyan Yi(Institute of Automation Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengkun Tian(Institute of Automation, Chinese Academy of Sciences), Ye Bai(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Cunhang Fan(Institute of Automation, Chinese Academy of Sciences)

Large-Scale Evaluation of Short-Duration Speaker Verification

Mon-SS-2-6-1 Improving X-vector and PLDA for Text-dependent Speaker Verification

Zhuxin Chen(NetEase Games AI Lab) and Yue Lin(NetEase Games AI Lab)

Mon-SS-2-6-2 SdSV Challenge 2020: Large-Scale Evaluation of Short-Duration Speaker Verification

Hossein Zeinali(Amirkabir University of Technology), Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation), Md Jahangir Alam(Computer Research Institute of Montreal (CRIM)) and Lukas Burget(Brno University of Technology)

Mon-SS-2-6-3 The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020

Tao Jiang(School of Informatics, Xiamen University), Miao Zhao(School of Informatics, Xiamen University), Lin Li(Xiamen University) and Qingyang Hong(Xiamen University)

Mon-SS-2-6-4 Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020

Sung Hwan Mun(Seoul National University), Woo Hyun Kang(Department of Electrical and Computer Engineering and INMC, Seoul National University), Min Hyun Han(Seoul National University) and Nam Soo Kim(Seoul National University)

Mon-SS-2-6-5 The TalTech Systems for the Short-duration Speaker Verification Challenge 2020

Tanel Alumäe(Tallinn University of Technology) and Jörgen Valk(Tallinn University of Technology)

Mon-SS-2-6-7 Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization

Jenthe Thienpondt(IDLab, Department of Electronics and Information Systems, Ghent University - imec, Belgium), Brecht Desplanques(Ghent University - imec, IDLab, Department of Electronics and Information Systems) and Kris Demuynck(Ghent University)

Mon-SS-2-6-8 BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020

Alicia Lozano-Diez(Brno University of Technology), Anna Silnova(Brno University of Technology), Bhargav Pulugundla(Brno University of Technology), Johan Rohdin(Brno University of Technology), Karel Vesely(Brno University of Technology), Lukas Burget(Brno University of Technology), Oldrich Plchot(Brno University of Technology), Ondrej Glembek(Brno University of Technology), Ondrej Novotny(Brno University of Technology) and Pavel Matejka(Brno University of Technology)

Mon-SS-2-6-9 Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification

Vijay Ravi(Ph.D. Student, UCLA), Ruchao Fan(University of California, Los Angeles), Amber Afshan(University of California, Los Angeles), Huanhua Lu(UCLA) and Abeer Alwan(UCLA)

Voice Conversion and Adaptation I

Mon-2-7-1 Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

Jing-Xuan Zhang(University of Science and Technology of China), Zhen-Hua Ling(University of Science and Technology of China) and Li-Rong Dai(University of Science and Technology of China)

Mon-2-7-2 Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Shaojin Ding(Texas A&M University), Guanlong Zhao(Texas A&M University) and Ricardo Gutierrez-Osuna(Texas A&M University)

Mon-2-7-3 Non-parallel Many-to-many Voice Conversion with PSR-StarGAN

Yanping Li(Nanjing University of Posts and Telecommunications), Dongxiang Xu(Nanjing University of Posts and Telecommunications), Yan Zhang(JIT), Yang Wang(vivo AI Lab) and Binbin Chen(vivo AI Lab)

Mon-2-7-4 TTS Skins: Speaker Conversion via ASR

Adam Polyak(Facebook), Lior Wolf(Tel Aviv University) and Yaniv Taigman(Facebook)

Mon-2-7-5 GAZEV: GAN-Based Zero Shot Voice Conversion over Non-parallel Speech Corpus

Zining Zhang(National University of Singapore), Bingsheng He(National University of Singapore) and Zhenjie Zhang(Yitu)

Mon-2-7-6 Spoken Content and Voice Factorization for Few-shot Speaker Adaptation

Tao Wang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Ruibo Fu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Rongxiu Zhong(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Mon-2-7-7 Unsupervised Cross-Domain Singing Voice Conversion

Adam Polyak(Facebook), Lior Wolf(Tel Aviv University), Yossi Adi(Facebook AI Research) and Yaniv Taigman(Facebook)

Mon-2-7-8 Attention-Based Speaker Embeddings for One-Shot Voice Conversion

Tatsuma Ishihara(GREE Inc.) and Daisuke Saito(The University of Tokyo)

Mon-2-7-9 Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training

Jian Cong(Northwestern Polytechnical University), Shan Yang(Northwestern Polytechnical University), Lei Xie(Northwestern Polytechnical University), Guoqiao Yu(Meituan-Dianping Group, Beijing) and Guanglu Wan(Meituan-Dianping Group, Beijing)

Acoustic Event Detection

Mon-2-8-1 Gated Multi-head Attention Pooling for Weakly Labelled Audio Tagging

Sixin Hong(Peking University), Yuexian Zou(ADSPLAB, School of ECE, Peking University, Shenzhen) and Wenwu Wang(Center for Vision, Speech and Signal Processing, University of Surrey, UK)

Mon-2-8-2 Environmental Sound Classification with Parallel Temporal-spectral Attention

Helin Wang(Peking University), Yuexian Zou(Peking University Shenzhen Graduate School), Dading Chong(Peking University Shenzhen Graduate School) and Wenwu Wang(University of Surrey)

Mon-2-8-3 Contrastive Predictive Coding of Audio with an Adversary

Luyu Wang(DeepMind), Kazuya Kawakami(DeepMind) and Aaron van den Oord(DeepMind)

Mon-2-8-4 Memory Controlled Sequential Self Attention for Sound Recognition

Arjun Pankajakshan(Queen Mary University of London), Helen L. Bear(Queen Mary University of London), Vinod Subramanian(Queen Mary University of London) and Emmanouil Benetos(Queen Mary University of London)

Mon-2-8-5 Dual Stage Learning based Dynamic Time-Frequency Mask Generation for Audio Event Classification

Donghyeon Kim(Korea University), Jaihyun Park(Korea University), David Han(US Army Research Laboratory) and Hanseok Ko(Korea University)

Mon-2-8-6 An Effective Perturbation based Semi-Supervised Learning Method for Sound Event Detection

Xu Zheng(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China), Yan Song(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China), Jie Yan(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China), Li-Rong Dai(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China), Ian McLoughlin(ICT Cluster, Singapore Institute of Technology) and Lin Liu(iFLYTEK Research, iFLYTEK CO., LTD, Hefei)

Mon-2-8-7 A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

Chieh-Chi Kao(Amazon.com), Bowen Shi(Toyota Technological Institute at Chicago), Ming Sun(Amazon.com) and Chao Wang(Amazon.com)

Mon-2-8-8 Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging

Chun-Chieh Chang(Johns Hopkins University), Chieh-Chi Kao(Amazon.com), Ming Sun(Amazon.com) and Chao Wang(Amazon.com)

Mon-2-8-9 Two-stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-token Connectionist Temporal Classification

Inyoung Park(Gwangju Institute of Science and Technology) and Hong Kook Kim(Gwangju Institute of Science and Technology)

Mon-2-8-10 SpeechMix - Augmenting Deep Sound Recognition using Hidden Space Interpolations

Amit Jindal(Manipal Institute of Technology), Narayanan Elavathur Ranganatha(Manipal Academy of Higher Education), Aniket Didolkar(Manipal Institute of Technology), Arijit Ghosh Chowdhury(Manipal Institute of Technology), Di Jin(MIT), Ramit Sawhney(Netaji Subhas Institute of Technology) and Rajiv Ratn Shah(IIIT Delhi)

Spoken Language Understanding I

Mon-2-9-1 End-to-End Neural Transformer Based Spoken Language Understanding

Martin Radfar(Amazon Inc), Athanasios Mouchtaris(Amazon Inc) and Jimmy Kunzmann(Amazon Inc)

Mon-2-9-2 Jointly Encoding Word Confusion Network and Dialogue Context with BERT for Spoken Language Understanding

Chen Liu(Shanghai Jiao Tong University), Su Zhu(Shanghai Jiao Tong University), Zijian Zhao(Shanghai Jiao Tong University), Ruisheng Cao(Shanghai Jiao Tong University), Lu Chen(Shanghai Jiao Tong University) and Kai Yu(Shanghai Jiao Tong University)

Mon-2-9-3 Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

Milind Rao(Amazon Alexa), Anirudh Raju(Amazon), Pranav Dheram(Amazon Alexa), Bach Bui(Amazon Alexa) and Ariya Rastrow(Amazon.com)

Mon-2-9-5 Context Dependent RNNLM for Automatic Transcription of Conversations

Srikanth Raj Chetupalli(Indian Institute of Science, Bangalore) and Sriram Ganapathy(Indian Institute of Science, Bangalore, India, 560012)

Mon-2-9-6 Improving End-to-End Speech-to-Intent Classification with Reptile

Yusheng Tian(Huawei Noah’s Ark Lab, London) and Philip John Gorinski(Huawei Noah's Ark Lab)

Mon-2-9-7 Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

Won Ik Cho(Department of Electrical and Computer Engineering and INMC, Seoul National University), Donghyun Kwak(Search Solution Inc.), Jiwon Yoon(Department of Electrical and Computer Engineering and INMC, Seoul National University) and Nam Soo Kim(Seoul National University)

Mon-2-9-8 Towards an ASR error robust Spoken Language Understanding System

Weitong Ruan(Amazon Alexa), Yaroslav Nechaev(Amazon Alexa), Luoxin Chen(Amazon Alexa), Chengwei Su(Amazon Alexa) and Imre Kiss(Amazon Alexa)

Mon-2-9-9 End-to-End Spoken Language Understanding Without Full Transcripts

Hong-Kwang Kuo(IBM T. J. Watson Research Center), Zoltán Tüske(IBM Research), Samuel Thomas(IBM Research AI), Yinghui Huang(IBM), Kartik Audhkhasi(IBM Research), Brian Kingsbury(IBM Research), Gakuto Kurata(IBM Research), Zvi Kons(IBM Haifa research lab), Ron Hoory(IBM Haifa Research Lab) and Luis Lastras(IBM Research AI)

Mon-2-9-10 Are Neural Open-Domain Dialog Systems Robust to Speech Recognition Errors in the Dialog History? An Empirical Study

Karthik Gopalakrishnan(Amazon Alexa AI), Behnam Hedayatnia(Amazon), Longshaokan Wang(Amazon), Yang Liu(Amazon) and Dilek Hakkani-Tur(Amazon Alexa AI)

DNN Architectures for Speaker Recognition   Video

Mon-2-10-1 AutoSpeech: Neural Architecture Search for Speaker Recognition

Shaojin Ding(Texas A&M University), Tianlong Chen(Texas A&M University), Xinyu Gong(Texas A&M University), Weiwei Zha(University of Science and Technology of China) and Zhangyang Wang(Texas A&M University)

Mon-2-10-2 Densely Connected Time Delay Neural Network for Speaker Verification

Ya-Qi Yu(Nanjing University) and Wu-Jun Li(Nanjing University)

Mon-2-10-3 Phonetically-Aware Coupled Network For Short Duration Text-independent Speaker Verification

Siqi Zheng(Alibaba), Hongbin Suo(Alibaba Group) and Yun Lei(Alibaba Group)

Mon-2-10-5 Vector-based attentive pooling for text-independent speaker verification

Yanfeng Wu(Nankai University), Chenkai Guo(Nankai University), Hongcan Gao(Nankai University), Xiaolei Hou(Nankai University) and Jing Xu(Nankai University)

Mon-2-10-6 Self-Attention Encoding and Pooling for Speaker Recognition

Pooyan Safari(TALP Research Center, BarcelonaTech), Miquel India(Universitat Politecnica de Catalunya) and Javier Hernando(Universitat Politecnica de Catalunya)

Mon-2-10-7 ARET: Aggregated Residual Extended Time-delay Neural Networks for Speaker Verification

Ruiteng Zhang(Tianjin University), Jianguo Wei(Tianjin University), Wenhuan Lu(Tianjin University), Longbiao Wang(Tianjin University), Meng Liu(Tianjin University), Lin Zhang(Tianjin University), Jiayu Jin(Tianjin University) and Junhai Xu(Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University)

Mon-2-10-8 Adversarial Separation Network for Speaker Recognition

Hanyi Zhang(Yunnan University), Longbiao Wang(Tianjin University), Yunchun Zhang(Yunnan University), Meng Liu(Tianjin University), Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation) and Jianguo Wei(Tianjin University)

Mon-2-10-9 Text-Independent Speaker Verification with Dual Attention Network

Jingyu Li(The Chinese University of Hong Kong) and Tan Lee(The Chinese University of Hong Kong)

Mon-2-10-10 Evolutionary Algorithm Enhanced Neural Architecture Search for Text-Independent Speaker Verification

Xiaoyang Qu(Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang(Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao(Ping An Technology)

ASR Model Training and Strategies   Video

Mon-2-11-1 Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition

Chao Weng(Tencent AI Lab), Chengzhu Yu(Tencent), Jia Cui(Tencent), Chunlei Zhang(Tencent AI Lab) and Dong Yu(Tencent AI Lab)

Mon-2-11-2 Semantic Mask for Transformer based End-to-End Speech Recognition

Chengyi Wang(Nankai University), Yu Wu(Microsoft Research Asia), Yujiao Du(Alibaba Corporation), Jinyu Li(Microsoft), Shujie Liu(Microsoft Research Asia, Beijing), Liang Lu(Microsoft), Shuo Ren(Beihang University), Guoli Ye(Microsoft), Sheng Zhao(Microsoft) and Ming Zhou(Microsoft Research Asia)

Mon-2-11-3 Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

Frank Zhang(Facebook AI, USA), Yongqiang Wang(Facebook AI, USA), Xiaohui Zhang(Facebook AI, USA), Chunxi Liu(Facebook AI, USA), Yatharth Saraf(Facebook AI, USA) and Geoffrey Zweig(Facebook AI, USA)

Mon-2-11-4 A Federated Approach in Training Acoustic Models

Dimitrios Dimitriadis(Microsoft), Kenichi Kumatani(Amazon Inc.), Robert Gmyr(Microsoft), Yashesh Gaur(Microsoft) and Sefik Emre Eskimez(Microsoft)

Mon-2-11-5 On Semi-Supervised LF-MMI Training of Acoustic Models with Limited Data

Imran Sheikh(Inria), Emmanuel Vincent(Inria) and Irina Illina(LORIA/INRIA)

Mon-2-11-6 On Front-end Gain Invariant Modeling for Wake Word Spotting

Yixin Gao(Amazon), Noah D. Stein(Amazon), Chieh-Chi Kao(Amazon), Yunliang Cai(Amazon), Ming Sun(Amazon), Tao Zhang(Amazon) and Shiv Vitaladevuni(Amazon)

Mon-2-11-7 Unsupervised Regularization-Based Adaptive Training for Speech Recognition

Fenglin Ding(University of Science and Technology of China), Wu Guo(University of Science and Technology of China), Bin Gu(University of Science and Technology of China), Zhenhua Ling(University of Science and Technology of China) and Jun Du(University of Science and Technology of China)

Mon-2-11-8 On the Robustness and Training Dynamics of Raw Waveform Models

Erfan Loweimi(The University of Edinburgh), Peter Bell(University of Edinburgh) and Steve Renals(University of Edinburgh)

Mon-2-11-9 Iterative Pseudo-Labeling for Speech Recognition

Qiantong Xu(Facebook), Tatiana Likhomanenko(Facebook AI Research), Jacob Kahn(Facebook AI Research), Awni Hannun(Facebook AI Research), Gabriel Synnaeve(Facebook AI Research) and Ronan Collobert(Facebook AI Research)

Monday 20:15-21:15(GMT+8), October 26

Speech Annotation and Speech Assessment   Video

Mon-S&T-2-1 Smart Tube: A Biofeedback System for Vocal Training and Therapy through Tube Phonation

Naoko Kawamura(Himeji Dokkyo University), Tatsuya Kitamura(Konan University), Kenta Hamada(Konan University)

Mon-S&T-2-2 VCTUBE: A Library for Automatic Speech Data Annotation

Seong Choi(Sungkyunkwan University), Seunghoon Jeong(Hanyang University), Jeewoo Yoon(Sungkyunkwan University), Migyeong Yang(Sungkyunkwan University), Minsam Ko(Hanyang University), Eunil Park(Sungkyunkwan University), Jinyoung Han(Sungkyunkwan University), Munyoung Lee(Electronics and Telecommunications Research Institute), Seonghee Lee(Electronics and Telecommunications Research Institute)

Mon-S&T-2-3 A Mandarin L2 Learning APP with Mispronunciation Detection and Feedback

Yanlu Xie(Beijing Language and Culture University), Xiaoli Feng(Beijing Language and Culture University), Boxue Li(Beijing Language and Culture University), Jinsong Zhang(Beijing Language and Culture University), Yujia Jin(Beijing Language and Culture University)

Mon-S&T-2-4 Rapid Enhancement of NLP systems by Acquisition of Data in Correlated Domains

Tejas Udayakumar(Samsung Research and Development Institute), Kinnera Saranu(Samsung Research and Development Institute), Mayuresh Sanjay Oak(Samsung Research and Development Institute), Ajit Ashok Saunshikar(Samsung Research and Development Institute) and Sandip Shriram Bapat(Samsung Research and Development Institute)

Mon-S&T-2-5 Computer-Assisted Language Learning System: Automatic Speech Evaluation for Singaporean Children Learning Malay and Tamil

Ke Shi(Institute for Infocomm Research), Kye Min Tan(Institute for Infocomm Research), Siti Umairah Md Salleh(Institute for Infocomm Research), Nur Farah Ain Binte Suhaimi(Institute for Infocomm Research), Rajan s/o Vellu(Institute for Infocomm Research), Thai Ngoc Thuy Huong Helen(Institute for Infocomm Research) and Nancy F. Chen(Institute for Infocomm Research)

Mon-S&T-2-6 Real-time, Full-band, Online DNN-based Voice Conversion System Using a Single CPU

Takaaki Saeki(University of Tokyo), Yuki Saito(University of Tokyo), Shinnosuke Takamichi(University of Tokyo) and Hiroshi Saruwatari(University of Tokyo)

Mon-S&T-2-7 A Dynamic 3D Pronunciation Teaching Model based on Pronunciation Attributes and Anatomy

Xiaoli Feng(Beijing Language and Culture University), Yanlu Xie(Beijing Language and Culture University), Yayue Deng(Beijing Language and Culture University) and Boxue Li(Yunfan Hailiang Technology Co.)

Mon-S&T-2-8 End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge

Naoki Kimura(University of Tokyo), Zixiong Su(University of Tokyo), Takaaki Saeki(University of Tokyo)

Monday 21:45-22:45(GMT+8), October 26

Cross/Multi-Lingual and Code-Switched Speech Recognition   Video

Mon-3-1-1 Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?

Jialu Li(University of Illinois at Urbana-Champaign) and Mark Hasegawa-Johnson(University of Illinois)

Mon-3-1-2 Development of Multilingual ASR Using GlobalPhone for Less-Resourced Languages: The Case of Ethiopian Languages

Martha Yifiru Tachbelie(Addis Ababa University), Solomon Teferra Abate(Addis Ababa University) and Tanja Schultz(Universität Bremen)

Mon-3-1-3 Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning

Wenxin Hou(Tokyo Institute of Technology), Yue Dong(Tokyo Institute of Technology), Bairong Zhuang(Tokyo Institute of Technology), Longfei Yang(Tokyo Institute of Technology), Jiatong Shi(Johns Hopkins University) and Takahiro Shinozaki(Tokyo Institute of Technology)

Mon-3-1-4 Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition

Xinyuan Zhou(Shanghai Normal University), Emre Yilmaz(National University of Singapore), Yanhua Long(Shanghai Normal University), Yijie Li(Unisound AI Technology Co., Ltd.) and Haizhou Li(National University of Singapore)

Mon-3-1-5 Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages

Solomon Teferra Abate(Addis Ababa University), Martha Yifiru Tachbelie(Addis Ababa University) and Tanja Schultz(Universität Bremen)

Mon-3-1-6 Multilingual Jointly Trained Acoustic and Written Word Embeddings

Yushi Hu(University of Chicago), Shane Settle(Toyota Technological Institute at Chicago) and Karen Livescu(TTI-Chicago)

Mon-3-1-7 Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

Chia-Yu Li(Institute of Natural Language Processing, University of Stuttgart, Germany) and Ngoc Thang Vu(University of Stuttgart)

Mon-3-1-8 Data Augmentation for Code-switch Language Modeling by Fusing Multiple Text Generation Methods

Xinhui Hu(Hithink RoyalFlush Information Network Co., Ltd.), Qi Zhang(Hithink RoyalFlush AI Research Institute), Lei Yang(Hithink RoyalFlush AI Research Institute), Binbin Gu(Hithink RoyalFlush AI Research Institute) and Xinkang Xu(Hithink RoyalFlush AI Research Institute)

Mon-3-1-9 A 43 Language Multilingual Punctuation Prediction Neural Network Model

Xinxing Li(Microsoft China) and Edward Lin(Microsoft China)

Anti-Spoofing and Liveness Detection   Video

Mon-3-2-1 Multi-Task Siamese Neural Network for Improving Replay Attack Detection

Patrick von Platen(University of Cambridge), Fei Tao(Uber AI) and Gokhan Tur(Amazon Alexa AI)

Mon-3-2-2 POCO: a Voice Spoofing and Liveness Detection Corpus based on Pop Noise

Kosuke Akimoto(Data Science Research Laboratories, NEC Corporation), Seng Pei Liew(NEC), Sakiko Mishima(Data Science Research Laboratories, NEC Corporation), Ryo Mizushima(Security Research Laboratories, NEC Corporation) and Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation)

Mon-3-2-3 Dual-adversarial domain adaptation for generalized replay attack detection

Hongji Wang(Shanghai Jiao Tong University), Heinrich Dinkel(Shanghai Jiao Tong University), Shuai Wang(Shanghai Jiao Tong University), Yanmin Qian(Shanghai Jiao Tong University) and Kai Yu(Shanghai Jiao Tong University)

Mon-3-2-4 Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection

Hye-jin Shim(University of Seoul), Hee-Soo Heo(School of Computer Science, University of Seoul, Korea), Jee-weon Jung(University of Seoul) and Ha-Jin Yu(University of Seoul)

Mon-3-2-5 Competency Evaluation in Voice Mimicking Using Acoustic Cues

Abhijith G.(College of Engineering, Trivandrum), Adarsh S.(College of Engineering, Trivandrum), Akshay Prasannan(College of Engineering, Trivandrum) and Rajeev Rajan(College of Engineering, Trivandrum)

Mon-3-2-6 Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks

Zhenzong Wu(National University of Singapore), Rohan Kumar Das(National University of Singapore), Jichen Yang(National University of Singapore) and Haizhou Li(National University of Singapore)

Mon-3-2-7 Spoofing Attack Detection using the Non-linear Fusion of Sub-band Classifiers

Hemlata Tak(Eurecom), Jose Patino(EURECOM), Andreas Nautsch(EURECOM), Nicholas Evans(EURECOM) and Massimiliano Todisco(EURECOM - School of Engineering & Research Center - Digital Security Department)

Mon-3-2-8 Investigating Light-ResNet Architecture for Spoofing Detection under Mismatched Conditions

Prasanth Parasu(University of New South Wales), Julien Epps(School of Electrical Engineering and Telecommunications, UNSW Australia), Kaavya Sriskandaraja(The University of New South Wales) and Gajan Suthokumar(The University of New South Wales)

Mon-3-2-9 Siamese Convolutional Neural Network Using Gaussian Probability Feature for Spoofing Speech Detection

Zhenchun Lei(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang), Yingen Yang(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang), Changhong Liu(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang) and Jihua Ye(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang)

Noise Reduction and Intelligibility   Video

Mon-3-3-1 Lightweight Online Noise Reduction on Embedded Devices using Hierarchical Recurrent Neural Networks

Hendrik Schröter(Friedrich-Alexander-Universität Erlangen-Nürnberg), Tobias Rosenkranz(Sivantos GmbH, Erlangen), Alberto N. Escalante Banuelos(Sivantos GmbH, Erlangen), Pascal Zobel(Friedrich-Alexander-Universität Erlangen-Nürnberg) and Andreas Maier(Friedrich-Alexander-Universität Erlangen-Nürnberg)

Mon-3-3-2 SEANet: A Multi-modal Speech Enhancement Network

Marco Tagliasacchi(Google Research), Yunpeng Li(Google Research), Karolis Misiunas(Google Research) and Dominik Roblek(Google Research)

Mon-3-3-3 Lite Audio-Visual Speech Enhancement

Shang-Yi Chuang(Academia Sinica), Yu Tsao(Academia Sinica), Chen-Chou Lo(Academia Sinica) and Hsin-Min Wang(Academia Sinica)

Mon-3-3-4 ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication

Christian Bergler(Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Manuel Schmitt(Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Andreas Maier(University Erlangen-Nuremberg), Simeon Smeele(Max Planck Institute of Animal Behavior, Cognitive and Cultural Ecology Lab and Max Planck Institute for Evolutionary Anthropology, Department for Human Behavior, Ecology and Culture), Volker Barth(Anthro-Media) and Elmar Nöth(Friedrich-Alexander-University Erlangen-Nuremberg)

Mon-3-3-5 A Deep Learning Approach to Active Noise Control

Hao Zhang(The Ohio State University, USA) and DeLiang Wang(Ohio State University)

Mon-3-3-6 Improving Speech Intelligibility through Speaker Dependent and Independent Spectral Style Conversion

Tuan Dinh(OHSU), Alexander Kain(OHSU) and Kris Tjaden(University at Buffalo)

Mon-3-3-7 End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks

Mathias Bach Pedersen(Aalborg University, Denmark), Morten Kolbæk(Aalborg University, Denmark), Asger Heidemann Andersen(Oticon A/S), Søren Holdt Jensen(Aalborg University, Denmark) and Jesper Jensen(Oticon A/S and Aalborg University)

Mon-3-3-8 Predicting Intelligibility of Enhanced Speech Using Posteriors Derived from DNN-based ASR System

Kenichi Arai(NTT Communication Science Laboratories), Shoko Araki(NTT Communication Science Laboratories), Atsunori Ogawa(NTT Communication Science Laboratories), Keisuke Kinoshita(NTT), Tomohiro Nakatani(NTT Corporation) and Toshio Irino(Wakayama University)

Mon-3-3-9 Automatic Estimation of Intelligibility Measure for Consonants in Speech

Ali Abavisani(University of Illinois Urbana-Champaign) and Mark Hasegawa-Johnson(University of Illinois)

Mon-3-3-10 Large scale evaluation of importance maps in automatic speech recognition

Viet Anh Trinh(The Graduate Center, CUNY, New York, USA) and Michael Mandel(Brooklyn College, CUNY)

Acoustic Scene Classification   Video

Mon-3-4-1 Neural Architecture Search on Acoustic Scene Classification

Jixiang Li(Xiaomi), Chuming Liang(Xiaomi), Bo Zhang(Xiaomi), Zhao Wang(Xiaomi), Fei Xiang(Xiaomi) and Xiangxiang Chu(Xiaomi)

Mon-3-4-2 Acoustic Scene Classification using Audio Tagging

Jee-weon Jung(University of Seoul), Hye-jin Shim(University of Seoul), Ju-ho Kim(University of Seoul), Seung-bin Kim(University of Seoul) and Ha-Jin Yu(University of Seoul)

Mon-3-4-3 ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification

Liwen Zhang(Harbin Institute of Technology), Jiqing Han(Harbin Institute of Technology) and Ziqiang Shi(Fujitsu Research and Development Center)

Mon-3-4-4 Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

Jivitesh Sharma(University of Agder), Ole-Christoffer Granmo(University of Agder) and Morten Goodwin(University of Agder)

Mon-3-4-5 Acoustic Scene Analysis with Multi-head Attention Networks

Weimin Wang(Amazon), Weiran Wang(Amazon.com), Ming Sun(Amazon.com) and Chao Wang(Amazon)

Mon-3-4-6 Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification

Hu Hu(Georgia Institute of Technology), Sabato Marco Siniscalchi(University of Enna Kore), Yannan Wang(Tencent Technology (Shenzhen) Co., Ltd) and Chin-Hui Lee(Georgia Institute of Technology)

Mon-3-4-7 An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

Hu Hu(Georgia Institute of Technology), Sabato Marco Siniscalchi(University of Enna Kore), Yannan Wang(Tencent Technology (Shenzhen) Co., Ltd), Bai Xue(Institute of Software, Chinese Academy of Sciences), Jun Du(University of Science and Technology of China) and Chin-Hui Lee(Georgia Institute of Technology)

Mon-3-4-8 Attention-Driven Projections for Soundscape Classification

Dhanunjaya Varma Devalraju(Indian Institute of Technology, Mandi), Muralikrishna H(Indian Institute of Technology Mandi), Padmanabhan Rajan(Indian Institute of Technology Mandi) and Dileep Aroor Dinesh(Indian Institute of Technology Mandi)

Mon-3-4-9 Computer Audition for Continuous Rainforest Occupancy Monitoring: The Case of Bornean Gibbons' Call Detection

Panagiotis Tzirakis(Imperial College London), Alexander Shiarella(Imperial College London), Robert Ewers(Imperial College London) and Björn Schuller(University of Augsburg / Imperial College London)

Mon-3-4-10 Deep Learning Based Open Set Acoustic Scene Classification

Zuzanna Kwiatkowska(Samsung R&D Institute Poland), Beniamin Kalinowski(Samsung R&D Institute Poland), Michał Kośmider(Samsung R&D Institute Poland) and Krzysztof Rykaczewski(Samsung R&D Institute Poland)

Singing Voice Computing and Processing in Music   Video

Mon-3-5-1 Singing Synthesis: With a Little Help from My Attention

Orazio Angelini(Amazon Research Cambridge), Alexis Moinet(Amazon), Kayoko Yanagisawa(Amazon) and Thomas Drugman(Amazon)

Mon-3-5-2 Peking Opera Synthesis via Duration Informed Attention Network

Yusong Wu(Beijing University of Posts and Telecommunications), Shengchen Li(Beijing University of Posts and Telecommunications), Chengzhu Yu(Tencent), Heng Lu(Tencent America), Chao Weng(Tencent AI Lab), Liqiang Zhang(Beijing Institute of Technology) and Dong Yu(Tencent)

Mon-3-5-3 DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System

Liqiang Zhang(Beijing Institute of Technology), Chengzhu Yu(Tencent), Heng Lu(Tencent America), Chao Weng(Tencent), Chunlei Zhang(Tencent AI Lab), Yusong Wu(Beijing University of Posts and Telecommunications), Xiang Xie(Beijing Institute of Technology), Zijin Li(China Conservatory of Music) and Dong Yu(Tencent AI Lab)

Mon-3-5-4 Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music

Yuanbo Hou(Beijing University of Posts and Telecommunications), Frank K. Soong(Microsoft Research Asia), Jian Luan(Microsoft Search Technology Center Asia) and Shengchen Li(Beijing University of Posts and Telecommunications)

Mon-3-5-5 Channel-wise Subband Input for Better Voice and Accompaniment Separation on High Resolution Music

Haohe Liu(Northwestern Polytechnical University), Lei Xie(School of Computer Science, Northwestern Polytechnical University), Jian Wu(Northwestern Polytechnical University) and Geng Yang(School of Computer Science, Northwestern Polytechnical University)

Acoustic Model Adaptation for ASR   Video

Mon-3-7-1 Continual Learning in Automatic Speech Recognition

Samik Sadhu(Johns Hopkins University) and Hynek Hermansky(JHU)

Mon-3-7-2 Speaker Adaptive Training for Speech Recognition Based on Attention-over-Attention Mechanism

Genshun Wan(University of Science and Technology of China), Jia Pan(University of Science and Technology of China), Qingran Wang(iFlytek Research, iFlytek Co., Ltd.), Jianqing Gao(iFlytek Research, iFlytek Co., Ltd.) and Zhongfu Ye(University of Science and Technology of China)

Mon-3-7-3 Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator

Yan Huang(Microsoft Corporation), Jinyu Li(Microsoft), Lei He(Microsoft), Wenning Wei(Microsoft), William Gale(Microsoft) and Yifan Gong(Microsoft Corp)

Mon-3-7-4 Speech Transformer with Speaker Aware Persistent Memory

Yingzhu Zhao(Nanyang Technological University), Chongjia Ni(I2R), Cheung-Chi Leung(Alibaba Group), Shafiq Joty(Nanyang Technological University; Salesforce AI Research), Eng Siong Chng(Nanyang Technological University) and Bin Ma(Alibaba Inc.)

Mon-3-7-5 Adaptive Speaker Normalization for CTC-Based Speech Recognition

Fenglin Ding(University of Science and Technology of China), Wu Guo(University of Science and Technology of China), Bin Gu(University of Science and Technology of China), Zhenhua Ling(University of Science and Technology of China) and Jun Du(University of Science and Technology of China)

Mon-3-7-6 Unsupervised Domain Adaptation Under Label Space Mismatch for Speech Classification

Akhil Mathur(University College London), Nadia Berthouze(University College London) and Nicholas D. Lane(University of Cambridge)

Mon-3-7-7 Learning Fast Adaptation on Cross-Accented Speech Recognition

Genta Indra Winata(The Hong Kong University of Science and Technology), Samuel Cahyawijaya(HKUST), Zihan Liu(Hong Kong University of Science and Technology), Zhaojiang Lin(The Hong Kong University of Science and Technology), Andrea Madotto(The Hong Kong University Of Science and Technology), Peng Xu(The Hong Kong University of Science and Technology) and Pascale Fung(Hong Kong University of Science and Technology)

Mon-3-7-8 Black-box Adaptation of ASR for Accented Speech

Kartik Khandelwal(Indian Institute of Technology, Bombay), Preethi Jyothi(Indian Institute of Technology Bombay), Abhijeet Awasthi(IIT Bombay) and Sunita Sarawagi(IIT Bombay)

Mon-3-7-9 Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation

Mehmet Ali Tugtekin Turan(INRIA), Emmanuel Vincent(Inria) and Denis Jouvet(LORIA - INRIA)

Singing and Multimodal Synthesis   Video

Mon-3-8-1 Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

Jie Wu(Xiaoice, Software Technology Center Asia, Microsoft) and Jian Luan(Microsoft)

Mon-3-8-2 Prediction of Head Motion from Speech Waveforms with a Canonical-Correlation-Constrained Autoencoder

Jinhong Lu(University of Edinburgh) and Hiroshi Shimodaira(University of Edinburgh)

Mon-3-8-3 XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

Peiling Lu(Microsoft), Jie Wu(Microsoft), Jian Luan(Microsoft), Xu Tan(Microsoft) and Li Zhou(Microsoft)

Mon-3-8-4 Stochastic Talking Face Generation Using Latent Distribution Matching

Ravindra Yadav(Indian Institute of Technology Kanpur), Ashish Sardana(NVIDIA), Vinay Namboodiri(IIT Kanpur) and Rajesh Hegde(Indian Institute of Technology Kanpur)

Mon-3-8-5 Speech-to-singing Conversion based on Boundary Equilibrium GAN

Da-Yi Wu(National Taiwan University) and Yi-Hsuan Yang(Academia Sinica)

Mon-3-8-6 Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image

Shunsuke Goto(The University of Tokyo), Kotaro Onishi(The University of Electro-Communications), Yuki Saito(The University of Tokyo), Kentaro Tachibana(DeNA Co., Ltd) and Koichiro Mori(DeNA Co., Ltd.)

Mon-3-8-7 Speech Driven Talking Head Generation via Attentional Landmarks Based Representation

Wang Wentao(Anhui University), Wang Yan(Anhui University), Li Teng(Anhui University), Sun Jianqing(Unisound), Liu Qiongsong(Unisound) and Liang Jiaen(Unisound)

Intelligibility-Enhancing Speech Modification   Video

Mon-3-9-2 iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Haoyu Li(National Institute of Informatics), Szu-wei Fu(Research Center for Information Technology Innovation, Academia Sinica), Yu Tsao(Academia Sinica) and Junichi Yamagishi(National Institute of Informatics)

Mon-3-9-3 Intelligibility-enhancing speech modifications – The Hurricane Challenge 2.0

Jan Rennies(Fraunhofer IDMT, Hearing, Speech and Audio Technology), Henning Schepker(University of Oldenburg, Signal Processing Group, Oldenburg), Cassia Valentini-Botinhao(The Centre for Speech Technology Research, University of Edinburgh) and Martin Cooke(Basque Foundation for Science, Bilbao)

Mon-3-9-4 Exploring listeners' speech rate preferences

Olympia Simantiraki(Language and Speech Laboratory, Universidad del Pais Vasco) and Martin Cooke(Ikerbasque)

Mon-3-9-5 Adaptive compressive onset-enhancement for improved speech intelligibility in noise and reverberation

Felicitas Bederna(Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg), Henning Schepker(Signal Processing Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg), Christian Rollwage(Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg), Simon Doclo(Signal Processing Group, Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg), Arne Pusch(Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, and Cluster of Excellence Hearing4all, Oldenburg), Jörg Bitzer(Institute of Hearing Technology and Audiology (IHA), Jade-University of Applied Sciences Wilhelmshaven / Oldenburg / Elsfleth) and Jan Rennies(Fraunhofer IDMT, Hearing, Speech and Audio Technology)

Mon-3-9-6 A Sound Engineering Approach to Near End Listening Enhancement

Carol Chermaz(The Centre for Speech Technology Research, The University of Edinburgh) and Simon King(University of Edinburgh)

Mon-3-9-7 Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

Dipjyoti Paul(Computer Science Department, University of Crete, Greece), Muhammed Shifas PV(Speech Signal Processing Lab, University of Crete), Yannis Pantazis(Institute of Applied and Computational Mathematics, FORTH) and Yannis Stylianou(Univ of Crete)

Human Speech Production I   Video

Mon-3-10-3 Speaker conditioned acoustic-to-articulatory inversion using x-vectors

Aravind Illa(PhD Student, Indian Institute of Science, Bangalore) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Mon-3-10-4 Coarticulation as synchronised sequential target approximation: An EMA study

Zirui Liu(University College London), Yi Xu(University College London) and Feng-fan Hsieh(National Tsing Hua University)

Mon-3-10-5 Improved Model for Vocal Folds with a Polyp with Potential Application

Jônatas Santos(Federal University of Sergipe), Jugurta Montalvão(Federal University of Sergipe) and Israel Santos(Federal University of Sergipe)

Mon-3-10-6 Regional Resonance of the Lower Vocal Tract and its Contribution to Speaker Characteristics

Lin Zhang(Tianjin University), Kiyoshi Honda(Tianjin University), Jianguo Wei(Tianjin University) and Seiji Adachi(Fraunhofer Institute for Building Physics)

Mon-3-10-7 Air-tissue boundary segmentation in real time Magnetic Resonance Imaging video using 3-D convolutional neural network

Renuka Mannem(Indian Institute of Science), Navaneetha Gaddam(Rajiv Gandhi University of Knowledge Technologies, Kadapa) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Mon-3-10-8 An investigation of the virtual lip trajectories during the production of bilabial stops and nasal at different speaking rates

Tilak Purohit(International Institute of Information Technology - Bangalore (IIIT-B), Bangalore, India) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Targeted Source Separation  

Mon-3-11-1 SpEx+: A Complete Time Domain Speaker Extraction Network

Meng Ge(Tianjin University), Chenglin Xu(Nanyang Technological University), Longbiao Wang(Tianjin University), Eng Siong Chng(Nanyang Technological University), Jianwu Dang(JAIST) and Haizhou Li(National University of Singapore)

Mon-3-11-2 Atss-Net: Target Speaker Separation via Attention-based Neural Network

Tingle Li(Duke Kunshan University), Qingjian Lin(SEIT, Sun Yat-sen University), Yuanyuan Bao(Duke Kunshan University) and Ming Li(Duke Kunshan University)

Mon-3-11-3 Multimodal Target Speech Separation with Voice and Face References

Leyuan Qu(University of Hamburg), Cornelius Weber(University of Hamburg) and Stefan Wermter(University of Hamburg)

Mon-3-11-4 X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network

Zining Zhang(National University of Singapore), Bingsheng He(National University of Singapore) and Zhenjie Zhang(Yitu)

Mon-3-11-5 Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation

Chenda Li(Shanghai Jiao Tong University) and Yanmin Qian(Shanghai Jiao Tong University)

Mon-3-11-6 A Unified Framework for Low-Latency Speaker Extraction in Cocktail Party Environments

Yunzhe Hao(Institute of Automation, Chinese Academy of Sciences), Jiaming Xu(Institute of Automation, Chinese Academy of Sciences), Jing Shi(Institute of Automation, Chinese Academy of Sciences), Peng Zhang(Institute of Automation, Chinese Academy of Sciences), Lei Qin(Huawei Consumer Business Group) and Bo Xu(Institute of Automation, Chinese Academy of Sciences)

Mon-3-11-7 Time-Domain Target-Speaker Speech Separation With Waveform-Based Speaker Embedding

Jianshu Zhao(Tokyo Institute of Technology), Shengzhou Gao(Tokyo Institute of Technology) and Takahiro Shinozaki(Tokyo Institute of Technology)

Mon-3-11-8 Listen to What You Want: Neural Network-based Universal Sound Selector

Tsubasa Ochiai(NTT Communication Science Laboratories), Marc Delcroix(NTT Communication Science Laboratories), Yuma Koizumi(NTT Media Intelligence Laboratories), Hiroaki Itou(NTT Media Intelligence Laboratories), Keisuke Kinoshita(NTT) and Shoko Araki(NTT Communication Science Laboratories)

Mon-3-11-9 Crossmodal Sound Retrieval based on Specific Target Co-occurrence Denoted with Weak Labels

Masahiro Yasuda(NTT Corporation), Yasunori Ohishi(NTT Corporation), Yuma Koizumi(NTT Media Intelligence Laboratories) and Noboru Harada(NTT Corporation)

Mon-3-11-10 Speaker-Aware Monaural Speech Separation

Jiahao Xu(The University of Sydney), Kun Hu(The University of Sydney), Chang Xu(The University of Sydney), Duc Chung Tran(Computing Fundamental Department, FPT University) and Zhiyong Wang(The University of Sydney)

Monday 21:00-22:00(GMT+8), October 26

Diversity Meeting  

Monday 21:45-22:45(GMT+8), October 26

Mentoring  

This event gives PhD students the opportunity to engage in discussion with early-career and senior researchers from academia and industry. ISCA-SAC aims to provide a welcoming environment for discussing questions on a variety of topics. If you want to attend this event, please apply via the Application Form. Participants will be selected according to the availability of mentors and topics on a first-come, first-served basis. You will be informed via e-mail!

Tuesday 18:00-19:00(GMT+8), October 27

Keynote 2  

Brain networks enabling speech perception in everyday settings

Barbara Shinn-Cunningham (Carnegie Mellon University)

Tuesday 19:15-20:15(GMT+8), October 27

Speech Translation and Multilingual/Multimodal Learning   Video

Tue-1-1-1 A DNN-HMM-DNN Hybrid Model for Discovering Word-like Units from Spoken Captions and Image Regions

Liming Wang(University of Illinois, Urbana Champaign) and Mark Hasegawa-Johnson(University of Illinois)

Tue-1-1-2 Efficient Wait-k Models for Simultaneous Machine Translation

Maha Elbayad(INRIA / LIG), Laurent Besacier(LIG) and Jakob Verbeek(INRIA)

Tue-1-1-3 Investigating Self-supervised Pre-training for End-to-end Speech Translation

Ha Nguyen(LIG - Grenoble Alpes University, LIA - Avignon University), Fethi Bougares(LIUM - Le Mans Université), Natalia Tomashenko(LIA, University of Avignon), Yannick Estève(LIA - Avignon University) and Laurent Besacier(LIG)

Tue-1-1-4 Contextualized Translation of Automatically Segmented Speech

Marco Gaido(Fondazione Bruno Kessler, University of Trento), Mattia A. Di Gangi(Fondazione Bruno Kessler, University of Trento), Matteo Negri(Fondazione Bruno Kessler), Mauro Cettolo(FBK) and Marco Turchi(Fondazione Bruno Kessler)

Tue-1-1-5 Self-Training for End-to-End Speech Translation

Juan Pino(Facebook), Qiantong Xu(Facebook AI Research), Xutai Ma(Johns Hopkins University), Mohammad Javad Dousti(Facebook) and Yun Tang(Facebook)

Tue-1-1-6 Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing

Marcello Federico(Amazon AI), Yogesh Virkar(Amazon), Robert Enyedi(Amazon) and Roberto Barra-Chicote(Amazon)

Tue-1-1-7 Pair Expansion for Learning Multilingual Semantic Embeddings using Disjoint Visually-grounded Speech Audio Datasets

Yasunori Ohishi(NTT Corporation), Akisato Kimura(NTT Corporation), Takahito Kawanishi(NTT Corporation), Kunio Kashino(NTT Corporation), David Harwath(Massachusetts Institute of Technology) and James Glass(Massachusetts Institute of Technology)

Tue-1-1-8 Self-Supervised Representations Improve End-to-End Speech Translation

Anne Wu(Facebook), Changhan Wang(Facebook AI Research), Juan Pino(Facebook) and Jiatao Gu(Facebook AI Research)

Speaker Recognition I   Video

Tue-1-2-1 Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms

Jee-weon Jung(University of Seoul), Seung-bin Kim(University of Seoul), Hye-jin Shim(University of Seoul), Ju-ho Kim(University of Seoul) and Ha-Jin Yu(University of Seoul)

Tue-1-2-2 Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Youngmoon Jung(KAIST), Seong Min Kye(KAIST), Yeunju Choi(KAIST), Myunghun Jung(KAIST) and Hoi Rin Kim(KAIST)

Tue-1-2-3 An Adaptive X-vector Model for Text-independent Speaker Verification

Bin Gu(University of Science and Technology of China), Wu Guo(University of Science and Technology of China), Jun Du(University of Science and Technology of China), Zhenhua Ling(University of Science and Technology of China) and Fenglin Ding(University of Science and Technology of China)

Tue-1-2-4 Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

Santi Prieto(VeriDas | das-Nano), Alfonso Ortega(University of Zaragoza), Iván López-Espejo(Aalborg University) and Eduardo Lleida(University of Zaragoza)

Tue-1-2-5 Sum-Product Networks for Robust Automatic Speaker Identification

Aaron Nicolson(Griffith University) and Kuldip K. Paliwal(Griffith University)

Tue-1-2-6 Segment Aggregation for short utterances speaker verification using raw waveforms

Seung-bin Kim(University of Seoul), Jee-weon Jung(University of Seoul), Hye-jin Shim(University of Seoul), Ju-ho Kim(University of Seoul) and Ha-Jin Yu(University of Seoul)

Tue-1-2-7 Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition

Shai Rozenberg(IBM), Hagai Aronowitz(IBM Research - Haifa) and Ron Hoory(IBM Haifa Research Lab)

Tue-1-2-8 Speaker Re-identification with Speaker Dependent Speech Enhancement

Yanpei Shi(University of Sheffield), Qiang Huang(University of Sheffield) and Thomas Hain(University of Sheffield)

Tue-1-2-9 Blind speech signal quality estimation for speaker verification systems

Galina Lavrentyeva(ITMO University, STC-innovations), Marina Volkova(ITMO University, STC-innovations Ltd.), Anastasia Avdeeva(STC-innovations Ltd.), Sergey Novoselov(ITMO University, Speech Technology Center), Artem Gorlanov(STC-innovations Ltd.), Tseren Andzukaev(STC-innovations Ltd.), Artem Ivanov(STC-innovations Ltd.) and Alexandr Kozlov(Speech Technology Center Ltd.)

Tue-1-2-10 Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

Xu Li(The Chinese University of Hong Kong), Na Li(Tencent), Jinghua Zhong(The Chinese University of Hong Kong), Xixin Wu(University of Cambridge), Xunying Liu(Chinese University of Hong Kong), Dan Su(Tencent AILab Shenzhen), Dong Yu(Tencent AI Lab) and Helen Meng(The Chinese University of Hong Kong)

Spoken Language Understanding II   Video

Tue-1-3-1 Modeling ASR Ambiguity for Neural Dialogue State Tracking

Vaishali Pal(IIIT Hyderabad), Fabien Guillot(Naver Labs Europe), Manish Shrivastava(IIIT Hyderabad), Jean-Michel Renders(Naver Labs Europe) and Laurent Besacier(LIG)

Tue-1-3-2 ASR Error Correction with Augmented Transformer for Entity Retrieval

Haoyu Wang(Amazon), Shuyan Dong(Amazon), Yue Liu(Amazon), James Logan(Amazon), Ashish Kumar Agrawal(Amazon) and Yang Liu(Amazon)

Tue-1-3-3 Large-Scale Transfer Learning for Low-resource Spoken Language Understanding

Xueli Jia(Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang(Ping An Technology (Shenzhen) Co., Ltd.), Zhiyong Zhang(PingAn Tech.), Ning Cheng(Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao(Ping An Technology)

Tue-1-3-4 Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding

Judith Gaspers(Amazon), Quynh Do(Amazon AI) and Fabian Triefenbach(Amazon)

Tue-1-3-5 An Interactive Adversarial Reward Learning-based Spoken Language Understanding System

Yu Wang(Samsung Research America), Yilin Shen(Samsung Research America) and Hongxia Jin(Samsung Research America)

Tue-1-3-6 Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding

Jin Cao(Amazon), Jun Wang(Amazon), Wael Hamza(Amazon), Kelly Vanee(Amazon) and Shang-Wen Li(Amazon AWS AI)

Tue-1-3-7 Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training

Shota Orihashi(NTT Corporation), Mana Ihori(NTT Corporation), Tomohiro Tanaka(NTT Corporation) and Ryo Masumura(NTT Corporation)

Tue-1-3-8 Deep F-measure Maximization for End-to-End Speech Understanding

Leda Sari(University of Illinois at Urbana-Champaign) and Mark Hasegawa-Johnson(University of Illinois)

Tue-1-3-9 An Effective Domain Adaptive Post-Training Method for BERT in Response Selection

Taesun Whang(Korea University), Dongyub Lee(Kakao Corp), Chanhee Lee(Korea University), Kisu Yang(Korea University), Dongsuk Oh(Department of Computer Science and Engineering, Korea University) and Heuiseok Lim(Korea University)

Tue-1-3-10 Confidence measure for speech-to-concept end-to-end spoken language understanding

Antoine Caubrière(LIUM, University of Le Mans), Yannick Estève(LIA - Avignon University), Antoine Laurent(LIUM - Laboratoire Informatique Université du Mans) and Emmanuel Morin(LS2N UMR CNRS 6004)

Human Speech Processing   Video

Tue-1-4-1 Attention to indexical information improves voice recall

Grant McGuire(University of California Santa Cruz) and Molly Babel(University of British Columbia)

Tue-1-4-2 Categorization of Whistled Consonants by French Speakers

Anaïs Tran Ngoc(Université Côte d'Azur, CNRS, BCL, France), Julien Meyer(Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France) and Fanny Meunier(CNRS)

Tue-1-4-3 Whistled vowel identification by French listeners

Anaïs Tran Ngoc(Université Côte d'Azur, CNRS, BCL, France), Julien Meyer(Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France) and Fanny Meunier(CNRS)

Tue-1-4-4 F0 slope as a cue to speech segmentation in French

Maria del Mar Cordero(Université Côte d’Azur, CNRS, BCL), Fanny Meunier(CNRS), Nicolas Grimault(CNRS, UMR 5292, INSERM, U1028, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics Team, Lyon), Stéphane Pota(Université Grenoble Alpes, CNRS, LPNC, Grenoble) and Elsa Spinelli(Université Grenoble Alpes, CNRS, LPNC, Grenoble)

Tue-1-4-5 Does French listeners’ ability to use accentual information at the word level depend on the ear of presentation?

Amandine Michelas(Aix-Marseille université, CNRS, LPL, UMR 7309, Aix-en-Provence) and Sophie Dufour(Aix-Marseille université, CNRS, LPL, UMR 7309, Aix-en-Provence)

Tue-1-4-7 Mandarin and English Adults’ Cue-weighting of Lexical Stress

Zhen Zeng(MARCS Institute, Western Sydney University), Karen Mattock(School of Psychology, Western Sydney University), Liquan Liu(Western Sydney University and University of Oslo), Varghese Peter(School of Psychology, Western Sydney University), Alba Tuninetti(Bilkent University, Turkey) and Feng-Ming Tsao(National Taiwan University)

Tue-1-4-8 Age-related differences of tone perception in Mandarin-speaking seniors

Yan Feng(Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR), Gang Peng(Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR) and William Shi-Yuan Wang(Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong SAR)

Tue-1-4-9 Social and functional pressures in vocal alignment: Differences for human and voice-AI interlocutors

Georgia Zellou(UC Davis) and Michelle Cohn(University of California, Davis)

Tue-1-4-10 Identifying Important Time-frequency Locations in Continuous Speech Utterances

Hassan Salami Kavaki(The Graduate Center, CUNY, New York) and Michael Mandel(Brooklyn College, CUNY, New York)

Feature Extraction and Distant ASR   Video

Tue-1-5-1 Raw Sign and Magnitude Spectra for Multi-head Acoustic Modelling

Erfan Loweimi(The University of Edinburgh), Peter Bell(University of Edinburgh) and Steve Renals(University of Edinburgh)

Tue-1-5-2 Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Purvi Agrawal(PhD Student, Indian Institute of Science, Bangalore-560012, India) and Sriram Ganapathy(Indian Institute of Science, Bangalore, India, 560012)

Tue-1-5-3 A Deep 2D Convolutional Network for Waveform-based Speech Recognition

Dino Oglic(King's College London), Zoran Cvetkovic(King's College London), Peter Bell(University of Edinburgh) and Steve Renals(University of Edinburgh)

Tue-1-5-4 Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

Ludwig Kürzinger(Technical University Munich), Nicolas Lindae(Technical University Munich), Palle Klewitz(Technical University Munich) and Gerhard Rigoll(Technical University Munich)

Tue-1-5-5 An alternative to MFCCs for ASR

Pegah Ghahremani(Johns Hopkins University), Hossein Hadian(Department of Computer Engineering, Sharif University of Technology, Tehran, Iran), Sanjeev Khudanpur(Johns Hopkins University), Hynek Hermansky(JHU) and Dan Povey(Johns Hopkins University)

Tue-1-5-6 Phase based spectro-temporal features for building a robust ASR system

Anirban Dutta(National Institute of Technology Meghalaya), Gudmalwar Ashishkumar(National Institute of Technology Meghalaya) and Ch. V. Rama Rao(National Institute of Technology, Meghalaya)

Tue-1-5-7 Deep Scattering Power Spectrum Features for Robust Speech Recognition

Neethu Mariam Joy(King's College London), Dino Oglic(King's College London), Zoran Cvetkovic(King's College London), Peter Bell(University of Edinburgh) and Steve Renals(University of Edinburgh)

Tue-1-5-8 FusionRNN: Shared Neural Parameters for Multi-Channel Distant Speech Recognition

Titouan Parcollet(University of Oxford), Xinchi Qiu(University of Oxford) and Nicholas Lane(University of Oxford)

Tue-1-5-9 Bandpass Noise Generation and Augmentation for Unified ASR

Kshitiz Kumar(Microsoft Corporation), Bo Ren(Microsoft China), Yifan Gong(Microsoft Corp) and Jian Wu(Microsoft Corp)

Tue-1-5-10 Deep Learning Based Dereverberation of Temporal Envelopes for Robust Speech Recognition

Anurenjan Purushothaman(IISc), Anirudh Sreeram(IISc), Rohit Kumar(IISc Bangalore) and Sriram Ganapathy(Indian Institute of Science, Bangalore, India, 560012)

Voice Privacy Challenge   Video

Tue-SS-1-6-1 Introducing the VoicePrivacy Initiative

Natalia Tomashenko(LIA, University of Avignon), Brij Mohan Lal Srivastava(Inria), Xin Wang(National Institute of Informatics), Emmanuel Vincent(Inria), Andreas Nautsch(EURECOM), Junichi Yamagishi(National Institute of Informatics), Nicholas Evans(EURECOM), Jose Patino(EURECOM), Jean-Francois Bonastre(Avignon University, LIA), Paul-Gauthier Noé(Avignon Université) and Massimiliano Todisco(EURECOM - School of Engineering & Research Center - Digital Security Department)

Tue-SS-1-6-2 The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment

Andreas Nautsch(EURECOM), Jose Patino(EURECOM), Natalia Tomashenko(LIA, University of Avignon), Junichi Yamagishi(National Institute of Informatics), Paul-Gauthier Noé(LIA, University of Avignon), Jean-Francois Bonastre(Avignon University, LIA), Massimiliano Todisco(EURECOM - School of Engineering & Research Center - Digital Security Department) and Nicholas Evans(EURECOM)

Tue-SS-1-6-3 X-Vector Singular Value Modification and Statistical-Based Decomposition with Ensemble Regression Modeling for Speaker Anonymization System

Candy Olivia Mawalim(Japan Advanced Institute of Science and Technology), Kasorn Galajit(Japan Advanced Institute of Science and Technology, NECTEC), Jessada Karnjana(NECTEC, National Science and Technology Development Agency) and Masashi Unoki(JAIST)

Tue-SS-1-6-4 A Comparative Study of Speech Anonymization Metrics

Mohamed Maouche(Inria), Brij Mohan Lal Srivastava(Inria), Nathalie Vauquier(Inria), Aurélien Bellet(INRIA), Marc Tommasi(Université de Lille) and Emmanuel Vincent(Inria)

Tue-SS-1-6-5 Design Choices for X-vector Based Speaker Anonymization

Brij Mohan Lal Srivastava(Inria), Natalia Tomashenko(LIA, University of Avignon), Xin Wang(National Institute of Informatics), Emmanuel Vincent(Inria), Junichi Yamagishi(National Institute of Informatics), Mohamed Maouche(Inria), Aurélien Bellet(INRIA) and Marc Tommasi(Lille University)

Tue-SS-1-6-6 Speech Pseudonymisation Assessment Using Voice Similarity Matrices

Paul-Gauthier Noé(Avignon Université), Jean-Francois Bonastre(Avignon University, LIA), Driss Matrouf(Avignon Université), Natalia Tomashenko(LIA, University of Avignon), Andreas Nautsch(EURECOM) and Nicholas Evans(EURECOM)

Speech Synthesis: Text Processing, Data and Evaluation   Video

Tue-1-7-2 A Mask-based Model for Mandarin Chinese Polyphone Disambiguation

Haiteng Zhang (Databaker (Beijing) Technology Co., Ltd), Huashan Pan (Databaker (Beijing) Technology Co., Ltd), Xiulin Li (Databaker (Beijing) Technology Co., Ltd)

Tue-1-7-4 Enhancing Sequence-to-Sequence Text-to-Speech with Morphology

Jason Taylor(University of Edinburgh) and Korin Richmond(University of Edinburgh)

Tue-1-7-5 Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

Yeunju Choi(KAIST), Youngmoon Jung(KAIST) and Hoi Rin Kim(KAIST)

Tue-1-7-6 Deep Learning Based Assessment of Synthetic Speech Naturalness

Gabriel Mittag(Technische Universität Berlin) and Sebastian Möller(Quality and Usability Lab, TU Berlin)

Tue-1-7-7 Distant Supervision for Polyphone Disambiguation in Mandarin Chinese

Jiawen Zhang(University of Chinese Academy of Sciences), Yuanyuan Zhao(Kwai), Jiaqi Zhu(Institute of Software, Chinese Academy of Sciences) and Jinba Xiao(Kwai)

Tue-1-7-8 An unsupervised method to select a speaker subset from large multi-speaker speech synthesis datasets

Pilar Oplustil(University of Edinburgh), Jennifer Williams(University of Edinburgh), Joanna Rownicka(The University of Edinburgh) and Simon King(University of Edinburgh)

Tue-1-7-9 Understanding the Effect of Voice Quality and Accent on Talker Similarity

Anurag Das(Texas A&M University), Guanlong Zhao(Texas A&M University), Evgeny Chukharev-Hudilainen(Iowa State University), John Levis(Iowa State University) and Ricardo Gutierrez-Osuna(Texas A&M University)

Search for Speech Recognition   Video

Tue-1-8-1 Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias

Wei Zhou(RWTH Aachen University), Ralf Schlüter(Lehrstuhl Informatik 6, RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Tue-1-8-2 Transformer with Bidirectional Decoder for Speech Recognition

Xi Chen(Tsinghua University), Songyang Zhang(ShanghaiTech University), Dandan Song(Tsingmicro Intelligent Technology Co. Limited), Peng Ouyang(Tsingmicro Intelligent Technology Co. Limited) and Shouyi Yin(Tsinghua University)

Tue-1-8-3 An investigation of phone-based subword units for end-to-end speech recognition

Weiran Wang(Salesforce Research), Guangsen Wang(Salesforce Research), Aadyot Bhatnagar(Salesforce Research), Yingbo Zhou(Salesforce Research), Caiming Xiong(Salesforce) and Richard Socher(Salesforce Research)

Tue-1-8-4 Combination of end-to-end and hybrid models for speech recognition

Jeremy Heng Meng Wong(Microsoft), Yashesh Gaur(Microsoft), Rui Zhao(Microsoft), Liang Lu(Microsoft), Eric Sun(Microsoft), Jinyu Li(Microsoft) and Yifan Gong(Microsoft Corp)

Tue-1-8-6 Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition

Abhinav Garg(Samsung Electronics), Ashutosh Gupta(Samsung Electronics), Dhananjaya Gowda(Samsung Research), Shatrughan Singh(Samsung Research) and Chanwoo Kim(Samsung Research)

Tue-1-8-7 LVCSR with Transformer Language Models

Eugen Beck(RWTH Aachen University), Ralf Schlüter(RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Tue-1-8-8 DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation

Yi-Chen Chen(National Taiwan University), Jui-Yang Hsu(National Taiwan University), Cheng-Kuang Lee(NVIDIA) and Hung-yi Lee(National Taiwan University (NTU))

Computational Paralinguistics I (CP I)   Video

Tue-1-9-1 Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus

Lukas Stappen(University of Augsburg), Georgios Rizos(Imperial College London), Madina Hasan(The University of Sheffield), Thomas Hain(University of Sheffield) and Björn Schuller(University of Augsburg / Imperial College London)

Tue-1-9-2 Individual variation in language attitudes toward voice-AI: The role of listeners’ autistic-like traits

Michelle Cohn(University of California, Davis), Melina Sarian(UC Davis), Kristin Predeck(UC Davis) and Georgia Zellou(UC Davis)

Tue-1-9-3 Differences in Gradient Emotion Perception: Human vs. Alexa Voices

Michelle Cohn(University of California, Davis), Eran Raveh(Saarland University), Kristin Predeck(UC Davis), Iona Gessinger(Saarland University), Bernd Möbius(Saarland University) and Georgia Zellou(UC Davis)

Tue-1-9-4 The MSP-Conversation Corpus

Luz Martinez-Lucas(The University of Texas at Dallas), Mohammed Abdelwahab(University of Texas at Dallas) and Carlos Busso(The University of Texas at Dallas)

Tue-1-9-5 Spotting the Traces of Depression in Read Speech: An Approach Based on Computational Paralinguistics and Social Signal Processing

Fuxiang Tao(University of Glasgow), Anna Esposito(Università della Campania "Luigi Vanvitelli") and Alessandro Vinciarelli(University of Glasgow)

Tue-1-9-6 Speech Sentiment and Customer Satisfaction Estimation in Socialbot Conversations

Yelin Kim(Amazon Lab126), Joshua Levy(Amazon, Alexa Speech) and Yang Liu(Amazon, Alexa AI)

Tue-1-9-7 Pardon the Interruption: An Analysis of Gender and Turn-Taking in U.S. Supreme Court Oral Arguments

Haley Lepp(University of Washington) and Gina-Anne Levow(University of Washington)

Tue-1-9-9 An Objective Voice Gender Scoring System and Identification of the Salient Acoustic Measures

Fuling Chen(University of Western Australia), Roberto Togneri(University of Western Australia), Murray Maybery(University of Western Australia) and Diana Tan(University of Western Australia)

Tue-1-9-10 How Ordinal Are Your Data?

Sadari Jayawardena(The University of New South Wales, Sydney), Julien Epps(School of Electrical Engineering and Telecommunications, UNSW Australia) and Zhaocheng Huang(School of Electrical Engineering and Telecommunications, UNSW Australia)

Acoustic Phonetics and Prosody   Video

Tue-1-10-1 Correlating cepstra with formant frequencies: implications for phonetically-informed forensic voice comparison

Vincent Hughes(Department of Language and Linguistic Science, University of York), Frantz Clermont(School of Culture, History and Language, Australian National University) and Philip Harrison(Department of Language and Linguistic Science, University of York)

Tue-1-10-2 Prosody and breathing: A comparison between rhetorical and information-seeking questions in German and Brazilian Portuguese

Jana Neitsch(University of Southern Denmark), Plinio Barbosa(University of Campinas) and Oliver Niebuhr(University of Southern Denmark)

Tue-1-10-3 Scaling processes of clause chains in Pitjantjatjara

Rebecca Defina(University of Melbourne), Catalina Torres(University of Melbourne) and Hywel Stoakes(University of Melbourne)

Tue-1-10-4 Neutralization of voicing distinction of stops in Tohoku dialects of Japanese: a field work and acoustic measurements

Ai Mizoguchi(Maebashi Institute of Technology), Ayako Hashimoto(Tokyo Kasei-gakuin College), Sanae Matsui(Sophia University), Setsuko Imatomi(Mejiro University), Ryunosuke Kobayashi(Sophia University) and Mafuyu Kitahara(Sophia University)

Tue-1-10-5 Correlation between prosody and pragmatics: case study of discourse markers in French and English

Lou Lee(Université de Lorraine), Denis Jouvet(LORIA - INRIA), Katarina Bartkova(Atilf - Université de Lorraine), Yvon Keromnes(ATILF - Université de Lorraine) and Mathilde Dargnat(ATILF - Université de Lorraine)

Tue-1-10-6 An analysis of prosodic prominence cues to information structure in Egyptian Arabic

Dina El Zarka(University of Graz), Anneliese Kelterer(University of Graz) and Barbara Schuppler(Graz University of Technology)

Tue-1-10-7 Lexical Stress in Urdu

Benazir Mumtaz(University of Konstanz), Tina Bögel(University of Konstanz) and Miriam Butt(University of Konstanz)

Tue-1-10-8 Vocal markers from sustained phonation in Huntington's Disease

Rachid Riad(LSCP/NPI/ENS/CNRS/EHESS/INRIA/UPEC/PSL Research University), Hadrien Titeux(LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA), Laurie Lemoine(NPI/ENS/INSERM/UPEC/PSL Research University), Justine Montillot(NPI/ENS/INSERM/UPEC/PSL Research University), Jennifer Hamet Bagnou(NPI/ENS/INSERM/UPEC/PSL Research University), Xuan-Nga Cao(LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA), Emmanuel Dupoux(Ecole des Hautes Etudes en Sciences Sociales) and Anne-Catherine Bachoud-Lévi(NPI/ENS/INSERM/UPEC/PSL Research University)

Tue-1-10-9 How Rhythm and Timbre encode Mooré language in Bendré drummed speech

Laure Dentel(The World Whistles Research Association) and Julien Meyer(Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France)

Tuesday 20:30-22:00(GMT+8), October 27

ISCA General Assembly  

Wednesday 18:00-19:00(GMT+8), October 28

Keynote 3  

Wednesday 19:15-20:15(GMT+8), October 28

Tonal Aspects of Acoustic Phonetics and Prosody   Video

Wed-1-1-1 Interaction of Tone and Voicing in Mizo

Wendy Lalhminghlui(IIT Guwahati) and Priyankoo Sarmah(Indian Institute of Technology Guwahati)

Wed-1-1-2 Mandarin lexical tones: a corpus-based study of word length, syllable position and prosodic structure on duration

Yaru Wu(Laboratoire de Phonétique et Phonologie (UMR7018, CNRS-Sorbonne Nouvelle), France; Modèles, Dynamiques, Corpus (MoDyCo), UMR 7114, CNRS, France), Martine Adda-Decker(LPP (Lab. Phonétique & Phonologie) / LIMSI-CNRS) and Lori Lamel(CNRS/LIMSI)

Wed-1-1-3 An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech

Yingming Gao(Institute of Acoustics and Speech Communication, Technische Universität Dresden), Xinyu Zhang(Institute of Acoustics and Speech Communication, TU Dresden), Yi Xu(University College London), Jinsong Zhang(Beijing Language and Culture University) and Peter Birkholz(Institute of Acoustics and Speech Communication, TU Dresden)

Wed-1-1-4 Integrating the application and realization of Mandarin 3rd tone sandhi in the resolution of sentence ambiguity

Wei Lai(Department of Linguistics, University of Pennsylvania) and Aini Li(Department of Linguistics, University of Pennsylvania)

Wed-1-1-5 Neutral Tone in Changde Mandarin

Zhenrui Zhang(University of Chinese Academy of Social Sciences) and Fang Hu(Institute of Linguistics, Chinese Academy of Social Sciences)

Wed-1-1-6 Pitch Declination and Final Lowering in Northeastern Mandarin

Ping Cui(Peking University) and Jianjing Kuang(University of Pennsylvania)

Wed-1-1-7 Variation in Spectral Slope and Interharmonic Noise in Cantonese Tones

Phil Rose(Australian National University Emeritus Faculty)

Wed-1-1-8 The acoustic realization of Mandarin tones in fast speech

Ping Tang(Nanjing University of Science and Technology) and Shanpeng Li(Nanjing Normal University)

Speech Classification   Video

Wed-1-2-1 Do face masks introduce bias in speech technologies? The case of automated scoring of speaking proficiency.

Anastassia Loukina(Educational Testing Service), Keelan Evanini(Educational Testing Service), Matthew Mulholland(Educational Testing Service), Ian Blood(Educational Testing Service) and Klaus Zechner(ETS)

Wed-1-2-2 A low latency ASR-free end to end spoken language understanding system

Mohamed Mhiri(fluent.ai), Samuel Myer(fluent.ai) and Vikrant Singh Tomar(fluent.ai)

Wed-1-2-3 An Audio-Based Wakeword-Independent Verification System

Joe Wang(Amazon Alexa), Rajath Kumar(Columbia University), Mike Rodehorst(Amazon), Brian Kulis(Boston University and Amazon) and Shiv Vitaladevuni(Amazon)

Wed-1-2-4 Learnable Spectro-temporal Receptive Fields for Robust Voice Type Discrimination

Tyler Vuong(Carnegie Mellon University), Yangyang Xia(Carnegie Mellon University) and Richard Stern(Carnegie Mellon University)

Wed-1-2-5 Low Latency Speech Recognition using End-to-End Prefetching

Shuo-Yiin Chang(Google USA), Bo Li(Google), David Rybach(Google), Yanzhang He(Google Inc.), Wei Li(Google Inc), Tara Sainath(Google) and Trevor Strohman(Google)

Wed-1-2-6 AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

Jingsong Wang(4Paradigm Inc.), Tom Ko(South University of Science and Technology), Zhen Xu(4Paradigm Inc.), Shouxiang Liu(4Paradigm Inc.), Xiawei Guo(4Paradigm Inc.), Weiwei Tu(4Paradigm Inc.) and Lei Xie(School of Computer Science, Northwestern Polytechnical University)

Wed-1-2-7 Building a Robust Word-Level Wakeword Verification Network

Rajath Kumar(Amazon Alexa), Mike Rodehorst(Amazon), Joe Wang(Amazon), Jiacheng Gu(Amazon) and Brian Kulis(Boston University and Amazon)

Wed-1-2-8 A Transformer-based Audio Captioning Model with Keyword Estimation

Yuma Koizumi(NTT Media Intelligence Laboratories), Ryo Masumura(NTT Corporation), Kyosuke Nishida(NTT Media Intelligence Laboratories), Masahiro Yasuda(NTT media intelligence laboratories) and Shoichiro Saito(NTT Media Intelligence Laboratories)

Wed-1-2-9 Neural Architecture Search For Keyword Spotting

Tong Mo(University of Alberta), Yakun Yu(University of Alberta), Mohammad Salameh(Huawei Technologies), Di Niu(University of Alberta) and Shangling Jui(Huawei Technologies)

Wed-1-2-10 Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution

Ximin Li(CAS Key Laboratory of Wireless-Optical Communications, University of Science and Technology of China, Hefei), Xiaodong Wei(CAS Key Laboratory of Wireless-Optical Communications, University of Science and Technology of China, Hefei) and Xiaowei Qin(CAS Key Laboratory of Wireless-Optical Communications, University of Science and Technology of China, Hefei)

Speech Synthesis Paradigms and Methods I   Video

Wed-1-3-1 Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Xin Wang(National Institute of Informatics, Sokendai University) and Junichi Yamagishi(National Institute of Informatics)

Wed-1-3-2 Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization

Jen-Yu Liu(Taiwan AI Labs), Yu-Hua Chen(Taiwan AI Labs), Yin-Cheng Yeh(Taiwan AI Labs) and Yi-Hsuan Yang(Academia Sinica)

Wed-1-3-4 Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding

Seungwoo Choi(Hyperconnect), Seungju Han(Hyperconnect), Dongyoung Kim(Hyperconnect) and Sungjoo Ha(Hyperconnect)

Wed-1-3-5 Reformer-TTS: Neural Speech Synthesis with Reformer Network

Hyeongrae Ihm(Seoul National University), Joun Yeop Lee(Seoul National University), Byoung Jin Choi(Seoul National University), Sung Jun Cheon(Seoul National University) and Nam Soo Kim(Seoul National University)

Wed-1-3-6 CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

Takuhiro Kaneko(NTT Communication Science Laboratories), Hirokazu Kameoka(NTT Communication Science Laboratories), Kou Tanaka(NTT corporation) and Nobukatsu Hojo(NTT)

Wed-1-3-7 High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Nikolaos Ellinas(Innoetics, Samsung Electronics), Georgios Vamvoukakis(Innoetics, Samsung Electronics), Konstantinos Markopoulos(Innoetics, Samsung Electronics), Aimilios Chalamandaris(Innoetics, Samsung Electronics), Georgia Maniati(Innoetics, Samsung Electronics), Panos Kakoulidis(Innoetics, Samsung Electronics), Spyros Raptis(Innoetics, Samsung Electronics), June Sig Sung(Mobile Communications Business, Samsung Electronics), Hyoungmin Park(Mobile Communications Business, Samsung Electronics) and Pirros Tsiakoulis(Innoetics, Samsung Electronics)

Wed-1-3-8 DurIAN: Duration Informed Attention Network For Speech Synthesis

Chengzhu Yu(Tencent), Heng Lu(Tencent American), Na Hu(Tencent), Meng Yu(Tencent), Chao Weng(Tencent AI Lab), Kun Xu(Tencent), Peng Liu(Tencent), Deyi Tuo(Tencent), Shiyin Kang(Tencent), Guangzhi Lei(Tencent), Dan Su(Tencent AILab Shenzhen) and Dong Yu(Tencent)

Wed-1-3-9 Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Kentaro Mitsui(The University of Tokyo), Tomoki Koriyama(The University of Tokyo) and Hiroshi Saruwatari(The University of Tokyo)

Wed-1-3-10 A Hybrid HMM-Waveglow based Text-to-speech synthesizer using Histogram Equalization for low-resource Indian languages

Mano Ranjith Kumar(Indian Institute of Technology, Madras), Sudhanshu Srivastava(Indian Institute of Technology, Madras), Anusha Prakash(Indian Institute of Technology Madras) and Hema Murthy(IIT Madras)

The INTERSPEECH 2020 Computational Paralinguistics ChallengE (ComParE)   Video

Wed-SS-1-4-1 The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks

Björn Schuller(University of Augsburg / Imperial College London), Anton Batliner(University of Augsburg), Christian Bergler(Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Eva-Maria Messner(University of Ulm), Antonia Hamilton(UCL), Shahin Amiriparian(University of Augsburg / Technische Universität München), Alice Baird(University of Augsburg), Georgios Rizos(Imperial College London), Maximilian Schmitt(Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg), Lukas Stappen(University of Augsburg), Harald Baumeister(University of Ulm), Alexis Deighton MacIntyre(UCL) and Simone Hantke(audEERING)

Wed-SS-1-4-2 Learning Higher Representations from pre-trained Deep Models with Data Augmentation for the ComParE 2020 Challenge Mask Task

Tomoya Koike(The University of Tokyo), Kun Qian(The University of Tokyo), Björn Schuller(University of Augsburg / Imperial College London) and Yoshiharu Yamamoto(The University of Tokyo)

Wed-SS-1-4-3 Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms

Steffen Illium(LMU Munich), Robert Müller(LMU Munich), Andreas Sedlmeier(LMU Munich) and Claudia Linnhoff-Popien(LMU Munich)

Wed-SS-1-4-4 Surgical mask detection with deep recurrent phonetic models

Philipp Klumpp(Friedrich-Alexander-Universität Erlangen-Nürnberg), Tomas Arias-Vergara(Ludwig-Maximilians University), Juan Camilo Vásquez Correa(Pattern Recognition Lab, Friedrich Alexander University), Paula Andrea Pérez Toro(Universidad de Antioquia), Florian Hönig(Pattern Recognition Lab, Friedrich-Alexander University of Erlangen-Nuremberg, Germany), Elmar Noeth(Friedrich-Alexander-University Erlangen-Nuremberg) and Juan Rafael Orozco-Arroyave(Universidad de Antioquia)

Wed-SS-1-4-5 Phonetic, Frame Clustering and Intelligibility Analyses for the INTERSPEECH 2020 ComParE Challenge

Claude Montacié(Sorbonne University (STIH)) and Marie-José Caraty(Paris University (STIH))

Wed-SS-1-4-6 Exploring Text and Audio Embeddings for Multi-Dimension Elderly Emotion Recognition

Mariana Julião(INESC-ID/IST), Alberto Abad(INESC-ID/IST) and Helena Moniz(INESC-ID, University of Lisbon)

Wed-SS-1-4-7 Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges

Maxim Markitantov(St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences), Denis Dresvyanskiy(Ulm University), Danila Mamontov(Ulm University), Heysem Kaya(Department of Information and Computing Sciences, Utrecht University), Wolfgang Minker(Ulm University) and Alexey Karpov(St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences)

Wed-SS-1-4-8 Analyzing Breath Signals for the Interspeech 2020 ComParE Challenge

John Mendonca(INESC-ID/Instituto Superior Técnico), Francisco Teixeira(INESC-ID/Instituto Superior Técnico, Universidade de Lisboa), Isabel Trancoso(INESC-ID / IST Univ. Lisbon) and Alberto Abad(INESC-ID/IST)

Wed-SS-1-4-9 Deep Attentive End-to-End Continuous Breath Sensing from Speech

Alexis MacIntyre(University College London), Georgios Rizos(Imperial College London), Anton Batliner(University of Augsburg), Alice Baird(University of Augsburg), Shahin Amiriparian(University of Augsburg), Antonia Hamilton(University College London) and Björn Schuller(University of Augsburg / Imperial College London)

Wed-SS-1-4-10 Paralinguistic Classification of Mask Wearing by Image Classifiers and Fusion

Jeno Szep(University of Arizona) and Salim Hariri(University of Arizona)

Wed-SS-1-4-11 Exploration of Acoustic and Lexical Cues for the INTERSPEECH 2020 Computational Paralinguistic Challenge

Ziqing Yang(New York Institute of Technology), Zifan An(New York Institute of Technology), Zehao Fan(New York Institute of Technology), Chengye Jing(New York Institute of Technology) and Houwei Cao(New York Institute of Technology)

Wed-SS-1-4-12 Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition

Gizem Soğancıoğlu(Department of Information and Computing Sciences, Utrecht University), Oxana Verkholyak(St. Petersburg Institute for Informatics and Automation of Russian Academy of Sciences), Heysem Kaya(Department of Information and Computing Sciences, Utrecht University), Dmitrii Fedotov(Ulm University), Tobias Cadée(Department of Information and Computing Sciences, Utrecht University), Albert Ali Salah(Department of Information and Computing Sciences, Utrecht University) and Alexey Karpov(St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences)

Wed-SS-1-4-13 Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs

Nicolae-Cătălin Ristea(University Politehnica of Bucharest) and Radu Tudor Ionescu(University of Bucharest)

Streaming ASR   Video

Wed-1-5-1 1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM

Kshitiz Kumar(Microsoft Corporation), Chaojun Liu(Microsoft), Yifan Gong(Microsoft Corp) and Jian Wu(Microsoft Corp.)

Wed-1-5-2 Low Latency End-to-End Streaming Speech Recognition with a Scout Network

Chengyi Wang(Nankai University), Yu Wu(Microsoft Research Asia), Liang Lu(Microsoft), Shujie Liu(Microsoft Research Asia, Beijing), Jinyu Li(Microsoft), Guoli Ye(Microsoft) and Ming Zhou(Microsoft Research Asia)

Wed-1-5-4 Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Wei Li(Google Inc), James Qin(Google), Chung-Cheng Chiu(Google), Ruoming Pang(Google Inc.) and Yanzhang He(Google Inc.)

Wed-1-5-5 Improved hybrid streaming ASR with Transformer language models

Pau Baquero-Arnal(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Javier Jorge(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Adrià Giménez(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Joan Albert Silvestre-Cerdà(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Javier Iranzo-Sánchez(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Albert Sanchis(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)), Jorge Civera(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain)) and Alfons Juan(Machine Learning and Language Processing (MLLP) research group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València (Spain))

Wed-1-5-6 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Chunyang Wu(Facebook), Yongqiang Wang(Facebook), Yangyang Shi(Facebook), Ching-Feng Yeh(Facebook Inc.) and Frank Zhang(Facebook)

Wed-1-5-7 Enhancing Monotonic Multihead Attention for Streaming ASR

Hirofumi Inaguma(Kyoto University), Masato Mimura(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Wed-1-5-8 Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

Shiliang Zhang(Alibaba Group), Zhifu Gao(Machine Intelligence Technology, Alibaba Group), Haoneng Luo(School of Computer Science, Northwestern Polytechnical University), Ming Lei(Machine Intelligence Technology, Alibaba Group), Jie Gao(Machine Intelligence Technology, Alibaba Group), Zhijie Yan(Machine Intelligence Technology, Alibaba Group) and Lei Xie(School of Computer Science, Northwestern Polytechnical University)

Wed-1-5-9 High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Thai Son Nguyen(Karlsruhe Institute of Technology), Ngoc Quan Pham(Karlsruhe Institute of Technology), Sebastian Stüker(Karlsruhe Institute of Technology) and Alex Waibel(Karlsruhe Institute of Technology)

Wed-1-5-10 Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

Vikas Joshi(Microsoft Corporation), Rui Zhao(Microsoft), Rupesh Mehta(Microsoft), Kshitiz Kumar(Microsoft Corporation) and Jinyu Li(Microsoft)

Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS)   Video

Wed-SS-1-6-1 Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Saturnino Luz(The University of Edinburgh), Fasih Haider(The University of Edinburgh), Sofia de la Fuente(University of Edinburgh), Davida Fromm(Department of Psychology, Carnegie Mellon University) and Brian MacWhinney(Carnegie Mellon University)

Wed-SS-1-6-2 Disfluencies and Fine-Tuning Pre-trained Language Models for Detection of Alzheimer’s Disease

Jiahong Yuan(Baidu Research USA), Yuchen Bian(Baidu Research USA), Xingyu Cai(Baidu Research USA), Jiaji Huang(Baidu Research USA), Zheng Ye(Chinese Academy of Sciences) and Kenneth Church(Baidu Research USA)

Wed-SS-1-6-3 To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer’s Disease Detection

Aparna Balagopalan(Winterlight Labs), Ben Eyre(Winterlight Labs), Frank Rudzicz(University of Toronto/Vector Institute) and Jekaterina Novikova(Winterlight Labs)

Wed-SS-1-6-4 Tackling the ADReSS challenge: a multimodal approach to the automated recognition of Alzheimer’s dementia

Matej Martinc(Jožef Stefan Institute) and Senja Pollak(Jožef Stefan Institute)

Wed-SS-1-6-5 Using state of the art speaker recognition and natural language processing technologies to detect Alzheimer’s disease and assess its severity

Raghavendra Pappagari(The Johns Hopkins University), Jaejin Cho(Johns Hopkins University), Laureano Moro Velazquez(Johns Hopkins University) and Najim Dehak(Johns Hopkins University)

Wed-SS-1-6-6 A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition

Nicholas Cummins(University of Augsburg), Yilin Pan(University of Sheffield), Zhao Ren(University of Augsburg), Julian Fritsch(Idiap Research Institute), Venkata Srikanth Nallanthighal(Philips Research, Eindhoven and Radboud University, Nijmegen), Heidi Christensen(University of Sheffield), Daniel Blackburn(University of Sheffield), Björn Schuller(University of Augsburg / Imperial College London), Mathew Magimai Doss(Idiap Research Institute), Helmer Strik(Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen) and Aki Harma(Philips Research)

Wed-SS-1-6-7 Multi-modal Fusion with Gating using Audio, Lexical and Disfluency Features for Alzheimer's Dementia Recognition from Spontaneous Speech

Morteza Rohanian(Queen Mary University of London), Julian Hough(Queen Mary University of London) and Matthew Purver(Queen Mary University of London)

Wed-SS-1-6-8 Comparing Natural Language Processing Techniques for Alzheimer's Dementia Prediction in Spontaneous Speech

Thomas Searle(King's College London), Zina Ibrahim(King's College London) and Richard Dobson(King's College London)

Wed-SS-1-6-9 Multiscale system for Alzheimer's Dementia Recognition through Spontaneous Speech

Erik Edwards(Verisk, Inc.), Charles Dognin(Verisk Analytics), Bajibabu Bollepalli(Aalto University) and Maneesh Singh(Verisk Analytics)

Wed-SS-1-6-10 The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge

Anna Pompili(INESC-ID), Thomas Rolland(INESC-ID, Instituto Superior Técnico, Universidade de Lisboa) and Alberto Abad(INESC-ID, Instituto Superior Técnico, Universidade de Lisboa)

Wed-SS-1-6-11 Exploring MMSE Score Prediction Using Verbal and Non-Verbal Cues

Shahla Farzana(University of Illinois at Chicago) and Natalie Parde(University of Illinois at Chicago)

Wed-SS-1-6-12 Multimodal Inductive Transfer Learning for Detection of Alzheimer's Dementia and its Severity

Utkarsh Sarawgi(Massachusetts Institute of Technology), Wazeer Zulfikar(Massachusetts Institute of Technology), Nouran Soliman(Massachusetts Institute of Technology) and Pattie Maes(Massachusetts Institute of Technology)

Wed-SS-1-6-13 Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer’s Dementia Recognition

Junghyun Koo(Music Audio Research Group, Seoul National University), Jie Hwan Lee(Music & Audio Research Group, Seoul National University), Jaewoo Pyo(Electrical and Computer Engineering, Seoul National University), Yujin Jo(College of Liberal Studies, Seoul National University) and Kyogu Lee(Seoul National University)

Wed-SS-1-6-14 Automated Screening for Alzheimer’s Dementia through Spontaneous Speech

Muhammad Shehram Shah Syed(RMIT University), Zafi Sherhan Syed(Mehran University), Margaret Lech(RMIT University) and Elena Pirogova(RMIT University)

Speaker Recognition Challenges and Applications   Video

Wed-1-7-1 NEC-TT Speaker Verification System for SRE'19 CTS Challenge

Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation), Koji Okabe(NEC Corporation), Hitoshi Yamamoto(NEC Corporation), Qiongqiong Wang(Data Science Research Laboratories, NEC Corporation), Ling Guo(Biometrics Research Laboratories, NEC Corporation), Takafumi Koshinaka(Biometrics Research Labs., NEC Corporation), Jiacen Zhang(Tokyo Institute of Technology), Keisuke Ishikawa(Tokyo Institute of Technology) and Koichi Shinoda(Tokyo Institute of Technology)

Wed-1-7-2 THUEE System for NIST SRE19 CTS Challenge

Ruyun Li(Tsinghua University), Tianyu Liang(Tsinghua University), Dandan Song(TsingMicro Co. Ltd.), Yi Liu(Tsinghua University), Yangcheng Wu(Tsinghua University), Can Xu(Tsinghua University), Peng Ouyang(TsingMicro Co. Ltd.), Xianwei Zhang(Tsinghua University), Shouyi Yin(Tsinghua University), Xianhong Chen(Tsinghua University), Weiqiang Zhang(Tsinghua University) and Liang He(Tsinghua University)

Wed-1-7-3 Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019

Grigory Antipov(Orange), Nicolas Gengembre(Orange), Olivier Le Blouch(Orange) and Gaël Le Lan(Orange Labs)

Wed-1-7-4 Audio-visual Speaker Recognition with a Cross-modal Discriminative Network

Ruijie Tao(National University of Singapore), Rohan Kumar Das(National University of Singapore) and Haizhou Li(National University of Singapore)

Wed-1-7-5 Multimodal Association for Speaker Verification

Suwon Shon(Massachusetts Institute of Technology) and James Glass(Massachusetts Institute of Technology)

Wed-1-7-6 Multi-modality Matters: A Performance Leap on VoxCeleb

Zhengyang Chen(MoE Key Lab of Artificial Intelligence, SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai), Shuai Wang(Shanghai Jiao Tong University) and Yanmin Qian(Shanghai Jiao Tong University)

Wed-1-7-7 Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification

Zhenyu Wang(University of Texas at Dallas), Wei Xia(University of Texas at Dallas) and John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)

Wed-1-7-8 Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

Mufan Sang(University of Texas at Dallas), Wei Xia(University of Texas at Dallas) and John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)

Wed-1-7-9 JukeBox: A Multilingual Singer Recognition Dataset

Anurag Chowdhury(Michigan State University), Austin Cozzo(Michigan State University) and Arun Ross(Michigan State University)

Wed-1-7-10 Speaker Identification for Household Scenarios with Self-attention and Adversarial Training

Ruirui Li(Amazon), Jyun-Yu Jiang(University of California, Los Angeles), Xian Wu(University of Notre Dame), Chu-Cheng Hsieh(Amazon) and Andreas Stolcke(Amazon)

Applications of ASR   Video

Wed-1-8-1 Streaming keyword spotting on mobile devices

Oleg Rybakov(Google), Natasha Kononenko(Google), Niranjan Subrahmanya(Google), Mirko Visontai(Google Inc) and Stella Laurenzo(Google)

Wed-1-8-2 Metadata-Aware End-to-End Keyword Spotting

Hongyi Liu(Amazon), Apurva Abhyankar(Amazon), Yuriy Mishchenko(Amazon), Thibaud Sénéchal(Amazon), Gengshen Fu(Amazon), Brian Kulis(Amazon), Noah Stein(Amazon), Anish Shah(Amazon) and Shiv Naga Prasad Vitaladevuni(Amazon)

Wed-1-8-3 Adversarial Audio: A New Information Hiding Method

Yehao Kong(Hunan University) and Jiliang Zhang(Hunan University)

Wed-1-8-4 S2IGAN: Speech-to-Image Generation via Adversarial Learning

Xinsheng Wang(Xi'an Jiaotong University), Tingting Qiao(Zhejiang University), Jihua Zhu(Xi’an Jiaotong University), Alan Hanjalic(Delft University of Technology) and Odette Scharenborg(Multimedia computing, Delft University of Technology)

Wed-1-8-5 Automatic Speech Recognition Benchmark for Air-Traffic Communications

Juan Pablo Zuluaga(Idiap Research Institute), Petr Motlicek(Idiap Research Institute, Martigny), Qingran Zhan(School of Information and Electronics, Beijing Institute of Technology, Beijing), Karel Vesely(Brno University of Technology Speech@FIT and IT4I Center of Excellence, Brno) and Rudolf Braun(Idiap Research Institute, Martigny)

Wed-1-8-6 Whisper Augmented End-to-End/Hybrid Speech Recognition System - CycleGAN Approach

Prithvi Raj Reddy Gudepu(Samsung Research Institute Bangalore), Gowtham Prudhvi Vadisetti(Samsung Research Institute Bangalore), Abhishek Niranjan(Samsung Research Institute, Bangalore - India), Kinnera Saranu(Samsung R&D Institute, Bangalore), Raghava Sarma(Samsung Research Institute Bangalore), Mahaboob Ali Basha Shaik(Voice Intelligence, Samsung R&D Institute) and Periyasamy Paramasivam(Samsung)

Wed-1-8-7 Risk Forecasting from Earnings Calls Acoustics and Network Correlations

Ramit Sawhney(Netaji Subhas Institute of Technology), Arshiya Aggarwal(Delhi Technological University), Piyush Khanna(Delhi Technological University), Puneet Mathur(University of Maryland College Park), Taru Jain(GGSIPU) and Rajiv Ratn Shah(IIIT Delhi)

Wed-1-8-8 SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems

Huili Chen(University of California, San Diego), Bita Darvish Rouhani(Microsoft Research) and Farinaz Koushanfar(University of California San Diego)

Wed-1-8-9 Evaluating Automatically Generated Phoneme Captions for Images

Justin van der Hout(Multimedia Computing Group, Delft University of Technology), Zoltán D’Haese(KU Leuven), Mark Hasegawa-Johnson(University of Illinois) and Odette Scharenborg(Multimedia computing, Delft University of Technology)

Speech Emotion Recognition II (SER II)   Video

Wed-1-9-1 An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks

Wei-Cheng Lin(University of Texas at Dallas) and Carlos Busso(The University of Texas at Dallas)

Wed-1-9-2 Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Siddique Latif(University of Southern Queensland Australia), Rajib Rana(University of Southern Queensland), Sara Khalifa(Distributed Sensing Systems Group, Data61, CSIRO Australia), Raja Jurdak(Queensland University of Technology (QUT)) and Björn Schuller(University of Augsburg / Imperial College London)

Wed-1-9-3 Meta-Learning for Speech Emotion Recognition Considering Ambiguity of Emotion Labels

Takuya Fujioka(Hitachi, Ltd.), Takeshi Homma(Hitachi, Ltd.) and Kenji Nagamatsu(Hitachi, Ltd.)

Wed-1-9-4 Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation

Jiaxing Liu(Tianjin University), Zhilei Liu(Tianjin University), Longbiao Wang(Tianjin University), Yuan Gao(Tianjin University), Lili Guo(Tianjin University) and Jianwu Dang(JAIST)

Wed-1-9-6 Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks

Zheng Lian(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Bin Liu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Jian Huang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Zhanlei Yang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing) and Rongjun Li(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing)

Wed-1-9-7 EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification

Shuiyang Mao(The Chinese University of Hong Kong), P. C. Ching(The Chinese University of Hong Kong) and Tan Lee(The Chinese University of Hong Kong)

Wed-1-9-8 Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Classification

Shuiyang Mao(The Chinese University of Hong Kong), P. C. Ching(The Chinese University of Hong Kong), C.-C. Jay Kuo(University of Southern California) and Tan Lee(The Chinese University of Hong Kong)

Bi- and Multilinguality   Video

Wed-1-10-1 The effect of language proficiency on the perception of segmental foreign accent

Rubén Pérez-Ramón(University of the Basque Country), María Luisa García Lecumberri(University of the Basque Country) and Martin Cooke(Ikerbasque, University of the Basque Country)

Wed-1-10-2 The effect of language dominance on the selective attention of segments and tones in Urdu-Cantonese speakers

Yi Liu(The Hong Kong Polytechnic University) and Jinghong Ning(The Hong Kong Polytechnic University)

Wed-1-10-3 The Effect of Input on the Production of English Tense and Lax Vowels by Chinese Learners: Evidence from an Elementary School in China

Mengrou Li(Nanjing University of Science and Technology), Ying Chen(Nanjing University of Science and Technology) and Jie Cui(Nanjing University of Science and Technology)

Wed-1-10-4 Exploring the use of an artificial accent of English to assess phonetic learning in monolingual and bilingual speakers

Laura Spinu(City University of New York - Kingsborough Community College), Jiwon Hwang(Stony Brook University), Nadya Pincus(University of Delaware) and Mariana Vasilita(Brooklyn College - CUNY)

Wed-1-10-5 Effects of Dialectal Code-Switching on Speech Modules: A Study using Egyptian Arabic Broadcast Speech

Shammur Absar Chowdhury(University of Trento), Younes Samih(Qatar Computing Research Institute), Mohamed Eldesouki(Concordia University) and Ahmed Ali(Qatar Computing Research Institute)

Wed-1-10-6 Bilingual acoustic voice variation is similarly structured across languages

Khia A. Johnson(University of British Columbia), Molly Babel(University of British Columbia) and Robert A. Fuhrman(University of British Columbia)

Wed-1-10-7 Monolingual Data Selection Analysis for English-Mandarin Hybrid Code-switching Speech Recognition

Haobo Zhang(School of Information Science and Engineering, Xinjiang University, Urumqi), Haihua Xu(Temasek Laboratories, Nanyang Technological University, Singapore), Van Tung Pham(Temasek Laboratories, Nanyang Technological University, Singapore), Hao Huang(School of Information Science and Engineering, Xinjiang University, Urumqi) and Eng Siong Chng(Temasek Laboratories, Nanyang Technological University, Singapore)

Wed-1-10-8 Perception and Production of Mandarin Initial Stops by Native Urdu Speakers

Dan Du(Beijing Language and Culture University), Xianjin Zhu(Harbin Institute of Technology), Zhu Li(Beijing Language and Culture University) and Jinsong Zhang(Beijing Language and Culture University)

Wed-1-10-9 Now you’re speaking my language: Visual language identification

Triantafyllos Afouras(University of Oxford), Joon Son Chung(University of Oxford) and Andrew Zisserman(University of Oxford)

Wed-1-10-10 The different enhancement roles of covarying cues in Thai and Mandarin tones

Nari Rhee(University of Pennsylvania) and Jianjing Kuang(University of Pennsylvania)

Single-Channel Speech Enhancement I   Video

Wed-1-11-1 Singing Voice Extraction with Attention based Spectrograms Fusion

Hao Shi(Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University), Longbiao Wang(Tianjin University), Sheng Li(National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory), Chenchen Ding(NICT), Meng Ge(Tianjin University), Nan Li(Tianjin University), Jianwu Dang(JAIST) and Hiroshi Seki(Huiyan Technology (Tianjin) Co. Ltd., Tianjin)

Wed-1-11-2 Incorporating Broad Phonetic Information for Speech Enhancement

Yen-Ju Lu(Academia Sinica), Chien-Feng Liao(Academia Sinica), Xugang Lu(NICT), Jeih-weih Hung(National Chi Nan University) and Yu Tsao(Academia Sinica)

Wed-1-11-3 A Recursive Network with Dynamic Attention for Monaural Speech Enhancement

Andong Li(Institute of Acoustics, Chinese Academy of Sciences), Chengshi Zheng(Institute of Acoustics, Chinese Academy of Sciences), Cunhang Fan(Institute of Automation, Chinese Academy of Sciences), Renhua Peng(Institute of Acoustics, Chinese Academy of Sciences) and Xiaodong Li(Institute of Acoustics, Chinese Academy of Sciences)

Wed-1-11-4 Constrained Ratio Mask for Speech Enhancement Using DNN

Hongjiang Yu(Dept. of Electrical and Computer Engineering, Concordia University), Wei-Ping Zhu(Concordia University) and Yuhong Yang(Wuhan University)

Wed-1-11-5 SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

Chi-Chang Lee(Department of Computer Science and Information Engineering, National Taiwan University), Yu-Chen Lin(Department of Computer Science and Information Engineering, National Taiwan University), Hsuan-Tien Lin(Department of Computer Science and Information Engineering, National Taiwan University), Hsin-Min Wang(Academia Sinica) and Yu Tsao(Academia Sinica)

Wed-1-11-6 Adaptive Neural Speech Enhancement with a Denoising Variational Autoencoder

Yoshiaki Bando(National Institute of Advanced Industrial Science and Technology / RIKEN), Kouhei Sekiguchi(RIKEN / Kyoto University) and Kazuyoshi Yoshii(RIKEN / Kyoto University)

Wed-1-11-7 Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks

Ahmet E. Bulut(Center for Robust Speech Systems, University of Texas at Dallas) and Kazuhito Koishida(Microsoft Corporation)

Wed-1-11-8 Single-channel speech enhancement by subspace affinity minimization

Dung N. Tran(Microsoft) and Kazuhito Koishida(Microsoft)

Wed-1-11-9 Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

Haoyu Li(National Institute of Informatics) and Junichi Yamagishi(National Institute of Informatics)

Wed-1-11-10 NAAGN: Noise-aware Attention-gated Network for Speech Enhancement

Feng Deng(Kuai Shou Technology Co.), Tao Jiang(Kuai Shou Technology Co.), Xiao-Rui Wang(Kuai Shou Technology Co.), Chen Zhang(Kuai Shou Technology Co.) and Yan Li(Kuai Shou Technology Co.)

Deep Noise Suppression Challenge   Video

Wed-SS-1-12-1 Online Monaural Speech Enhancement Using Delayed Subband LSTM

Xiaofei Li(Westlake University) and Radu Horaud(Inria)

Wed-SS-1-12-2 INTERSPEECH 2020 Deep Noise Suppression Challenge: A Fully Convolutional Recurrent Network (FCRN) for Joint Dereverberation and Denoising

Maximilian Strake(Technische Universität Braunschweig, Institute for Communications Technology), Bruno Defraene(Goodix Technology (Belgium) BV), Kristoff Fluyt(Goodix Technology (Belgium) BV), Wouter Tirry(Goodix Technology (Belgium) BV) and Tim Fingscheidt(Technische Universität Braunschweig, Institute for Communications Technology)

Wed-SS-1-12-3 DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Yanxin Hu(Northwestern Polytechnical University), Yun Liu(Sogou), Shubo Lv(Northwestern Polytechnical University), Mengtao Xing(Northwestern Polytechnical University), Shimin Zhang(Northwestern Polytechnical University), Yihui Fu(Northwestern Polytechnical University), Jian Wu(Northwestern Polytechnical University), Bihong Zhang(Sogou) and Lei Xie(School of Computer Science, Northwestern Polytechnical University)

Wed-SS-1-12-4 Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression

Nils Laurens Westhausen(Carl von Ossietzky University) and Bernd T. Meyer(Carl von Ossietzky University)

Wed-SS-1-12-5 A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

Jean-Marc Valin(Amazon Web Services), Umut Isik(Amazon Web Services), Neerad Phansalkar(Amazon Web Services), Ritwik Giri(Amazon Web Services), Karim Helwani(Amazon Web Services) and Arvindh Krishnaswamy(Amazon Web Services)

Wed-SS-1-12-6 PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss

Umut Isik(Amazon Web Services), Ritwik Giri(Amazon Web Services), Neerad Phansalkar(Amazon Web Services), Jean-Marc Valin(Amazon Web Services), Karim Helwani(Amazon Web Services) and Arvindh Krishnaswamy(Amazon Web Services)

Wed-SS-1-12-7 The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

Chandan Karadagur Ananda Reddy(Microsoft), Vishak Gopal(Microsoft), Ross Cutler(Microsoft), Ebrahim Beyrami(Microsoft), Roger Cheng(Microsoft), Harishchandra Dubey(Microsoft), Sergiy Matusevych(Microsoft), Robert Aichner(Microsoft), Ashkan Aazami(Microsoft), Sebastian Braun(Microsoft), Puneet Rana(Microsoft), Sriram Srinivasan(Microsoft) and Johannes Gehrke(Microsoft)

Wednesday 20:30-21:30(GMT+8), October 28

Voice and Hearing Disorders   Video

Wed-2-1-1 The Implication of Sound Level on Spatial Selective Auditory Attention for Cochlear Implant Users: Behavioral and Electrophysiological Measurement

Sara Akbarzadeh(University of Texas at Dallas), Sungmin Lee(Tongmyong University) and Chin-Tuan Tan(University of Texas at Dallas)

Wed-2-1-2 Enhancing the Interaural Time Difference of Bilateral Cochlear Implants with the Temporal Limits Encoder

Yangyang Wan(Shenzhen University), Huali Zhou(South China University of Technology), Qinglin Meng(South China University of Technology) and Nengheng Zheng(Shenzhen University)

Wed-2-1-3 Speech clarity improvement by vocal self-training using a hearing impairment simulator and its correlation with an auditory modulation index

Toshio Irino(Wakayama University), Soichi Higashiyama(Wakayama University) and Hanako Yoshigi(Wakayama University)

Wed-2-1-4 Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners

Zhuohuang Zhang(Indiana University Bloomington), Donald S. Williamson(Indiana University Bloomington) and Yi Shen(Indiana University Bloomington)

Wed-2-1-5 EEG-based Short-time Auditory Attention Detection using Multi-task Deep Learning

Zhuo Zhang(Tianjin University), Gaoyan Zhang(Tianjin University), Jianwu Dang(JAIST), Shuang Wu(Tianjin University), Di Zhou(Japan Advanced Institute of Science and Technology) and Longbiao Wang(Tianjin University)

Wed-2-1-6 Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders. Step 1: CNN model-based phone classification

Sondes Abderrazek(LIA, Avignon University) and Virginie Woisard(UT2J, Octogone-Lordat, Toulouse University and Toulouse Hospital)

Wed-2-1-7 Improving cognitive impairment classification by generative neural network-based feature augmentation

Bahman Mirheidari(Department of Computer Science, University of Sheffield), Daniel Blackburn(Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK), Ronan O'Malley(Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK), Annalena Venneri(Academic Neurology Unit, University of Sheffield, Royal Hallamshire Hospital, Sheffield, UK), Traci Walker(Department of Human Communication Sciences, University of Sheffield, Sheffield, UK), Markus Reuber(Academic Neurology Unit, University of Sheffield, Royal Hallamshire Hospital, Sheffield, UK) and Heidi Christensen(University of Sheffield)

Wed-2-1-8 UncommonVoice: A Crowdsourced Dataset of Dysphonic Speech

Meredith Moore(Arizona State University), Piyush Papreja(Arizona State University), Michael Saxon(Arizona State University), Visar Berisha(Arizona State University) and Sethuraman Panchanathan(Arizona State University)

Wed-2-1-9 Towards automatic assessment of voice disorders: A clinical approach

Purva Barche(International Institute of Information Technology, Hyderabad), Krishna Gurugubelli(Speech Processing Laboratory, LTRC, KCIS, International Institute of Information Technology, Hyderabad) and Anil Kumar Vuppala(IIIT Hyderabad)

Wed-2-1-10 BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages

Abhishek Shivkumar(Novoic Ltd), Jack Weston(Novoic Ltd), Raphael Lenain(Novoic Ltd) and Emil Fristed(Novoic Ltd)

Spoken Term Detection   Video

Wed-2-2-1 Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting

Menglong Xu(Northwestern Polytechnical University) and Xiao-Lei Zhang(Northwestern Polytechnical University)

Wed-2-2-3 Deep Convolutional Spiking Neural Networks for Keyword Spotting

Emre Yilmaz(National University of Singapore), Ozgur Bora Gevrek(National University of Singapore), Jibin Wu(National University of Singapore), Yuxiang Chen(National University of Singapore), Xuanbo Meng(National University of Singapore) and Haizhou Li(National University of Singapore)

Wed-2-2-4 Domain Aware Training for Far-field Small-footprint Keyword Spotting

Haiwei Wu(Duke Kunshan University), Yan Jia(Duke Kunshan University), Yuanfei Nie(Montage Technology, Kunshan) and Ming Li(Duke Kunshan University)

Wed-2-2-5 Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting

Kun Zhang(Graduate School at Shenzhen, Tsinghua University), Zhiyong Wu(Tsinghua University), Daode Yuan(Xiaoice, Microsoft), Jian Luan(Microsoft), Jia Jia(Tsinghua University), Helen Meng(The Chinese University of Hong Kong) and Binheng Song(Graduate School at Shenzhen, Tsinghua University)

Wed-2-2-6 Deep Template Matching for Small-footprint and Configurable Keyword Spotting

Peng Zhang(Inner Mongolia University) and Xueliang Zhang(Inner Mongolia University)

Wed-2-2-7 Multi-scale Convolution for Robust Keyword Spotting

Chen Yang(Samsung Research China-Beijing(SRC-B)), Xue Wen(Samsung Research China-Beijing(SRC-B)) and Liming Song(Samsung Research China-Beijing(SRC-B))

Wed-2-2-8 An Investigation of Few-Shot Learning in Spoken Term Classification

Yangbin Chen(City University of Hong Kong), Tom Ko(South University of Science and Technology), Lifeng Shang(Huawei Noah's Ark Lab), Xiao Chen(Huawei Noah's Ark Lab), Xin Jiang(Huawei Noah's Ark Lab) and Qing Li(The Hong Kong Polytechnic University)

Wed-2-2-9 End-to-End Keyword Search Based on Attention and Energy Scorer for Low Resource Languages

Zeyu Zhao(Tsinghua University) and Wei-Qiang Zhang(Tsinghua University)

Wed-2-2-10 Stacked 1D convolutional networks for end-to-end small footprint voice trigger detection

Takuya Higuchi(Apple), Mohammad Ghasemzadeh(Apple), Kisun You(Apple) and Chandra Dhir(Apple)

The Fearless Steps Challenge Phase-02   Video

Wed-SS-2-3-1 Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

Jens Heitkaemper(Paderborn University), Joerg Schmalenstroeer(University of Paderborn, Department of Communications Engineering) and Reinhold Haeb-Umbach(Paderborn University)

Wed-SS-2-3-2 Speaker Diarization System based on DPCA Algorithm For Fearless Steps Challenge Phase-2

Xueshuai Zhang(Institute of Acoustics, Chinese Academy of Sciences), Wenchao Wang(Institute of Acoustics, Chinese Academy of Sciences) and Pengyuan Zhang(Institute of Acoustics, Chinese Academy of Sciences)

Wed-SS-2-3-3 The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02

Qingjian Lin(SEIT, Sun Yat-sen University), Tingle Li(Duke Kunshan University) and Ming Li(Duke Kunshan University)

Wed-SS-2-3-4 "This is Houston. Say again, please''. The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II).

Arseniy Gorin(Behavox Limited), Daniil Kulko(Behavox Limited), Steven Grima(Behavox Limited) and Alex Glasman(Behavox Limited)

Wed-SS-2-3-5 FEARLESS STEPS Challenge (FS-2): Supervised Learning with Massive Naturalistic Apollo Data

Aditya Joglekar(The University of Texas at Dallas; CRSS - Center for Robust Speech Systems), John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems), Meena Chandra Shekar(The University of Texas at Dallas) and Abhijeet Sangwan(Center for Robust Speech Systems, The University of Texas at Dallas)

Monaural Source Separation   Video

Wed-2-4-1 Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss

Yi Luo(Columbia University) and Nima Mesgarani(Columbia University)

Wed-2-4-2 On Synthesis for Supervised Monaural Speech Separation in Time Domain

Jingjing Chen(School of Computer Science and Communication Engineering, Jiangsu University), Qirong Mao(School of Computer Science and Communication Engineering, Jiangsu University; Jiangsu Key Laboratory of Security Tech. for Industrial Cyberspace) and Dong Liu(School of Computer Science and Communication Engineering, Jiangsu University)

Wed-2-4-4 Asteroid: the PyTorch-based audio source separation toolkit for researchers

Manuel Pariente(Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Samuele Cornell(Università Politecnica delle Marche), Joris Cosentino(Inria), Sunit Sivasankaran(INRIA), Efthymios Tzinis(University of Illinois at Urbana-Champaign), Jens Heitkaemper(Paderborn University), Michel Olvera(Université de Lorraine), Fabian-Robert Stöter(Inria and LIRMM, University of Montpellier), Mathieu Hu(Inria), Juan M. Martín-Doñas(University of Granada), David Ditter(University of Hamburg), Ariel Frank(Technion - Israel Institute of Technology), Antoine Deleforge(INRIA) and Emmanuel Vincent(INRIA)

Wed-2-4-5 Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

Jingjing Chen(School of Computer Science and Communication Engineering, Jiangsu University), Qirong Mao(School of Computer Science and Communication Engineering, Jiangsu University; Jiangsu Key Laboratory of Security Tech. for Industrial Cyberspace) and Dong Liu(School of Computer Science and Communication Engineering, Jiangsu University)

Wed-2-4-6 Conv-TasSAN: Separative Adversarial Network based on Conv-TasNet

Chengyun Deng(Didi Chuxing), Yi Zhang(Didi Chuxing), Shiqian Ma(Didi Chuxing), Yongtao Sha(Didi Chuxing), Hui Song(Didi Chuxing) and Xiangang Li(Didi Chuxing)

Wed-2-4-7 Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

Keisuke Kinoshita(NTT), Thilo von Neumann(Paderborn University), Marc Delcroix(NTT Communication Science Laboratories), Tomohiro Nakatani(NTT Corporation) and Reinhold Haeb-Umbach(Paderborn University)

Wed-2-4-8 Unsupervised Audio Source Separation using Generative Priors

Vivek Sivaraman Narayanaswamy(Arizona State University), Jayaraman J. Thiagarajan(Lawrence Livermore National Labs), Rushil Anirudh(Lawrence Livermore National Labs) and Andreas Spanias(Arizona State University)

Single-Channel Speech Enhancement II   Video

Wed-2-5-1 Adversarial Latent Representation Learning for Speech Enhancement

Yuanhang Qiu(Massey University) and Ruili Wang(Massey University)

Wed-2-5-2 An NMF-HMM Speech Enhancement Method based on Kullback-Leibler Divergence

Yang Xiang(Aalborg University & Capturi), Liming Shi(Aalborg University), Jesper Lisby Højvang(Capturi A/S), Morten Højfeldt Rasmussen(Capturi A/S) and Mads Græsbøll Christensen(Aalborg University)

Wed-2-5-3 Multi-Scale TCN: Exploring Better Temporal DNN Model for Causal Speech Enhancement

Lu Zhang(Harbin Institute of Technology, Shenzhen) and Mingjiang Wang(Harbin Institute of Technology)

Wed-2-5-4 VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Quan Wang(Google Inc.), Ignacio Lopez Moreno(Google Inc.), Mert Saglam(Google Inc.), Kelvin Wilson(Google Inc.), Alan Chiao(Google Inc.), Renjie Liu(Google Inc.), Yanzhang He(Google Inc.), Wei Li(Google Inc.), Jason Pelecanos(Google Inc.), Marily Nika(Google Inc.) and Alex Gruenstein(Google Inc.)

Wed-2-5-5 Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Ziqiang Shi(Fujitsu Research and Development Center), Rujie Liu(Fujitsu Research and Development Center) and Jiqing Han(Harbin Institute of Technology)

Wed-2-5-6 Sub-band Knowledge Distillation Framework for Speech Enhancement

Xiang Hao(Inner Mongolia University), Shixue Wen(Sogou Inc.), Xiangdong Su(Inner Mongolia University), Yun Liu(Sogou), Guanglai Gao(Inner Mongolia University) and Xiaofei Li(Westlake University)

Wed-2-5-7 A Deep Learning-based Kalman Filter for Speech Enhancement

Sujan Kumar Roy(Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Brisbane, QLD, Australia, 4111), Aaron Nicolson(Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Brisbane, QLD, Australia, 4111) and Kuldip K. Paliwal(Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Brisbane, QLD, Australia, 4111)

Wed-2-5-8 Subband Kalman Filtering with DNN Estimated Parameters for Speech Enhancement

Hongjiang Yu(Dept. of Electrical and Computer Engineering, Concordia University), Wei-Ping Zhu(Concordia University) and Benoit Champagne(McGill University)

Wed-2-5-9 Bidirectional LSTM Network with Ordered Neurons for Speech Enhancement

Xiaoqi Li(Wuhan University of Technology), Yaxing Li(School of Computer Science and Technology, Wuhan University of Technology), Yuanjie Dong(Wuhan University of Technology), Shan Xu(Wuhan University of Technology), Zhihui Zhang(Wuhan University of Technology), Dan Wang(Wuhan University of Technology) and Shengwu Xiong(Wuhan University of Technology)

Wed-2-5-10 Speaker-conditional Chain Model for Speech Separation and Extraction

Jing Shi(Institute of Automation, Chinese Academy of Sciences), Jiaming Xu(Institute of Automation, Chinese Academy of Sciences), Yusuke Fujita(Hitachi, Ltd.), Shinji Watanabe(Johns Hopkins University) and Bo Xu(Institute of Automation, Chinese Academy of Sciences)

Topics in ASR II   Video

Wed-2-6-1 Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images

Leanne Nortje(Stellenbosch University) and Herman Kamper(Stellenbosch University)

Wed-2-6-2 Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text

Yoonhyung Lee(Seoul National University), Seunghyun Yoon(Seoul National University) and Kyomin Jung(Seoul National University)

Wed-2-6-3 Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Tamás Gábor Csapó(Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics)

Wed-2-6-4 Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

Tamás Gábor Csapó(Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Csaba Zainkó(Budapest University of Technology and Economics-TMIT), László Tóth(MTA-SZTE Research Group on Artificial Intelligence), Gábor Gosztolya(Research Group on Artificial Intelligence) and Alexandra Markó(Eötvös Loránd University, MTA-ELTE Lendület Lingual Articulation Research Group)

Wed-2-6-5 Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Siyuan Feng(Delft University of Technology) and Odette Scharenborg(Multimedia computing, Delft University of Technology)

Wed-2-6-6 Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

Kohei Matsuura(Kyoto University), Masato Mimura(Kyoto University), Shinsuke Sakai(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Wed-2-6-7 Neural Speech Completion

Kazuki Tsunematsu(Nara Institute of Science and Technology (NAIST)), Johanes Effendi(Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura(Nara Institute of Science and Technology and RIKEN AIP Center)

Wed-2-6-8 Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization

Benjamin Milde(Universität Hamburg) and Chris Biemann(Universität Hamburg)

Wed-2-6-9 Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning

Katerina Papadimitriou(Electrical and Computer Engineering Department, University of Thessaly) and Gerasimos Potamianos(Electrical and Computer Engineering Department, University of Thessaly)

Wed-2-6-10 MLS: A Large-Scale Multilingual Dataset for Speech Research

Vineel Pratap(Facebook), Qiantong Xu(Facebook AI Research), Anuroop Sriram(Facebook AI), Gabriel Synnaeve(Facebook AI Research) and Ronan Collobert(Facebook AI Research)

Neural Signals for Spoken Communication   Video

Wed-SS-2-7-1 Combining Audio and Brain Activity for Predicting Speech Quality

Ivan Halim Parmonangan(Nara Institute of Science and Technology), Hiroki Tanaka(Nara Institute of Science and Technology), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura(Nara Institute of Science and Technology)

Wed-SS-2-7-2 The "Sound of Silence" in EEG - Cognitive voice activity detection

Rini A Sharon(Indian Institute of Technology, Madras) and Hema Murthy(IIT Madras)

Wed-SS-2-7-3 Low Latency Auditory Attention Detection with Common Spatial Pattern Analysis of EEG Signals

Siqi Cai(National University of Singapore), Enze Su(South China University of Technology), Yonghao Song(South China University of Technology), Longhan Xie(South China University of Technology) and Haizhou Li(National University of Singapore)

Wed-SS-2-7-4 Speech Spectrogram Estimation from Intracranial Brain Activity using a Quantization Approach

Miguel Angrick(University of Bremen), Christian Herff(Maastricht University), Garett Johnson(Old Dominion University), Jerry Shih(UC San Diego Health), Dean Krusienski(Virginia Commonwealth University) and Tanja Schultz(University of Bremen)

Wed-SS-2-7-5 Neural Speech Decoding for Amyotrophic Lateral Sclerosis

Debadatta Dash(The University of Texas at Austin), Paul Ferrari(The University of Texas at Austin), Angel Hernandez(Division of Neurosciences, Helen DeVos Children’s Hospital), Daragh Heitzman(MDA/ALS Center, Texas Neurology), Sara Austin(The University of Texas at Austin) and Jun Wang(University of Texas at Austin)

Training Strategies for ASR   Video

Wed-2-8-1 Semi-supervised ASR by End-to-end Self-training

Yang Chen(The Ohio State University), Weiran Wang(Salesforce Research) and Chao Wang(Amazon Alexa)

Wed-2-8-2 Improved training strategies for end-to-end speech recognition in digital voice assistants

Hitesh Tulsiani(Amazon), Ashtosh Sapru(Amazon), Harish Arsikere(Amazon), Surabhi Punjabi(Amazon) and Sri Garimella(Amazon)

Wed-2-8-3 Serialized Output Training for End-to-End Overlapped Speech Recognition

Naoyuki Kanda(Microsoft), Yashesh Gaur(Microsoft), Xiaofei Wang(Microsoft), Zhong Meng(Microsoft) and Takuya Yoshioka(Microsoft)

Wed-2-8-4 Semi-Supervised Learning with Data Augmentation for End-to-End ASR

Felix Weninger(Nuance Communications), Franco Mana(Nuance Communications), Roberto Gemello(Nuance Communications), Jesús Andrés-Ferrer(Nuance Communications) and Puming Zhan(Nuance Communications)

Wed-2-8-5 Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

Jinxi Guo(Amazon.com), Gautam Tiwari(Amazon.com), Jasha Droppo(Amazon.com), Maarten Van Segbroeck(Amazon.com), Che-Wei Huang(Amazon.com), Andreas Stolcke(Amazon.com) and Roland Maas(Amazon.com)

Wed-2-8-6 A New Training Pipeline for an Improved Neural Transducer

Albert Zeyer(Human Language Technology and Pattern Recognition Group (Chair of Computer Science 6), Computer Science Department, RWTH Aachen University), André Merboldt(RWTH Aachen University), Ralf Schlüter(Lehrstuhl Informatik 6, RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Wed-2-8-7 Improved Noisy Student Training for Automatic Speech Recognition

Daniel Park(Google Brain), Yu Zhang(Google Brain), Ye Jia(Google), Wei Han(Google), Chung-Cheng Chiu(Google), Bo Li(Google), Yonghui Wu(Google Brain) and Quoc Le(Google Brain)

Wed-2-8-8 Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition

Ryo Masumura(NTT Corporation), Naoki Makishima(NTT Corporation), Mana Ihori(NTT Corporation), Akihiko Takashima(NTT Corporation), Tomohiro Tanaka(NTT Corporation) and Shota Orihashi(NTT Corporation)

Wed-2-8-9 Utterance invariant training for hybrid two-pass end-to-end speech recognition

Dhananjaya Gowda(Samsung Research), Abhinav Garg(Samsung Research), Ankur Kumar(Samsung R&D Institute Bangalore), Kwangyoun Kim(Samsung Electronics), Jiyeon Kim(Samsung), Sachin Singh(Samsung R&D Institute Bangalore), Mehul Kumar(Samsung), Shatrughan Singh(Samsung R&D Institute Bangalore) and Chanwoo Kim(Samsung Research)

Wed-2-8-10 SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR

Gary Wang(Simon Fraser University), Andrew Rosenberg(Google LLC), Zhehuai Chen(Google), Yu Zhang(Google), Bhuvana Ramabhadran(Google) and Pedro Moreno(Google)

Speech Transmission & Coding   Video

Wed-2-9-1 Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec

Sneha Das(Aalto University), Tom Bäckström(Aalto University) and Guillaume Fuchs(Fraunhofer IIS)

Wed-2-9-2 Hearing-Impaired Bio-Inspired Cochlear Models for Real-Time Auditory Applications

Arthur Van Den Broucke(Ghent University), Deepak Baby(Idiap Research Institute) and Sarah Verhulst(Ghent University)

Wed-2-9-3 Improving Opus Low Bit Rate Quality with Neural Speech Synthesis

Jan Skoglund(Google) and Jean-Marc Valin(Amazon Web Services)

Wed-2-9-4 A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Pranay Manocha(Princeton University), Adam Finkelstein(Princeton University), Richard Zhang(Adobe Research), Nicholas Bryan(Adobe Research), Gautham Mysore(Adobe Research) and Zeyu Jin(Adobe Research)

Wed-2-9-5 StoRIR: Stochastic Room Impulse Response Generation for Audio Data Augmentation

Piotr Masztalski(Samsung R&D Institute Poland), Mateusz Matuszewski(Samsung R&D Institute Poland), Karol Piaskowski(Samsung R&D Institute Poland) and Michał Romaniuk(Samsung R&D Institute Poland)

Wed-2-9-6 An Open Source Implementation of ITU-T Recommendation P.808 with Validation

Babak Naderi(Technische Universität Berlin) and Ross Cutler(Microsoft)

Wed-2-9-7 DNN No-Reference PSTN Speech Quality Prediction

Gabriel Mittag(Technische Universität Berlin), Ross Cutler(Microsoft), Yasaman Hosseinkashi(Microsoft), Michael Revow(Microsoft), Sriram Srinivasan(Microsoft), Naglakshmi Chande(Microsoft) and Robert Aichner(Microsoft)

Wed-2-9-8 Non-intrusive Diagnostic Monitoring of Fullband Speech Quality

Sebastian Möller(Quality and Usability Lab, TU Berlin), Tobias Hübschen(Digital Signal Processing and System Theory, Christian-Albrechts-Universität zu Kiel), Thilo Michael(Quality and Usability Lab, Technische Universität Berlin), Gabriel Mittag(Technische Universität Berlin) and Gerhard Schmidt(Kiel University)

Bioacoustics and Articulation   Video

Wed-2-10-1 Transfer learning of articulatory information through phone information

Abdolreza Sabzi Shahrebabaki(Norwegian University of Science and Technology), Sabato Marco Siniscalchi(University of Enna Kore), Negar Olfati(NTNU, Department of Electronic Systems), Giampiero Salvi(NTNU, Department of Electronic Systems) and Torbjørn Svendsen(NTNU, Department of Electronic Systems)

Wed-2-10-2 Sequence-to-sequence articulatory inversion through time convolution of sub-band frequency signals

Abdolreza Sabzi Shahrebabaki(Norwegian University of Science and Technology), Sabato Marco Siniscalchi(University of Enna Kore), Giampiero Salvi(NTNU, Department of Electronic Systems) and Torbjørn Svendsen(NTNU, Department of Electronic Systems)

Wed-2-10-3 Discriminative Singular Spectrum Analysis for bioacoustic classification

Bernardo Gatto(Center for Artificial Intelligence Research), Eulanda Santos(Federal University of Amazonas), Juan Colonna(Federal University of Amazonas), Naoya Sogi(University of Tsukuba), Lincon Souza(University of Tsukuba) and Kazuhiro Fukui(University of Tsukuba)

Wed-2-10-4 Speech rate task-specific representation learning from acoustic-articulatory data

Renuka Mannem(Indian Institute of Science), Himajyothi Rajamahendravarapu(Rajiv Gandhi University of Knowledge Technologies, Kadapa), Aravind Illa(PhD Student, Indian Institute of Science, Bangalore) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Wed-2-10-5 Dysarthria Detection and Severity Assessment using Rhythm-Based Metrics

Abner Hernandez(Seoul National University), Eun Jung Yeo(Seoul National University), Sunhee Kim(Seoul National University) and Minhwa Chung(Seoul National University)

Wed-2-10-6 LungRN+NL: An Improved Adventitious Lung Sound Classification Using non-local block ResNet Neural Network with Mixup Data Augmentation

Yi Ma(Shanghai Jiao Tong University), Xinzi Xu(Shanghai Jiao Tong University) and Yongfu Li(Shanghai Jiao Tong University)

Wed-2-10-7 Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates

Abhayjeet Singh(Research Assistant), Aravind Illa(PhD Student, Indian Institute of Science, Bangalore) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Wed-2-10-8 Adventitious Respiratory Classification using Attentive Residual Neural Networks

Zijiang Yang(Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany), Shuo Liu(Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany), Meishu Song(University of Augsburg), Emilia Parada-Cabaleiro(Complutense University of Madrid) and Björn Schuller(University of Augsburg / Imperial College London)

Wed-2-10-9 Surfboard: Audio Feature Extraction for Modern Machine Learning

Raphael Lenain(Novoic Ltd), Jack Weston(Novoic Ltd), Abhishek Shivkumar(Novoic Ltd) and Emil Fristed(Novoic Ltd)

Wed-2-10-10 Whisper activity detection using CNN-LSTM based attention pooling network trained for a speaker identification task

Abinay Reddy Naini(Indian Institute of Science), Satyapriya Malla(Rajiv Gandhi University of Knowledge Technologies, RK Valley) and Prasanta Ghosh(Assistant Professor, EE, IISc)

Speech Synthesis: Multilingual and Cross-Lingual Approaches   Video

Wed-2-11-1 Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

Shengkui Zhao(Machine Intelligence Technology, Alibaba Group), Trung Hieu Nguyen(Machine Intelligence Technology, Alibaba Group), Hao Wang(Machine Intelligence Technology, Alibaba Group) and Bin Ma(Machine Intelligence Technology, Alibaba Group)

Wed-2-11-2 Multi-Lingual Multi-Speaker Text-to-Speech Synthesis for Voice Cloning with Online Speaker Enrollment

Zhaoyu Liu(The Hong Kong University of Science and Technology) and Brian Mak(The Hong Kong University of Science and Technology)

Wed-2-11-3 Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Ruibo Fu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Chunyu Qiang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Tao Wang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Wed-2-11-4 Phonological features for 0-shot multilingual speech synthesis

Marlene Staib(Papercup), Tian Huey Teh(Papercup Technologies), Alexandra Torresquintero(Papercup Technologies), Devang Savita Ram Mohan(Papercup Technologies), Lorenzo Foglianti(Papercup Technologies), Raphael Lenain(Novoic) and Jiameng Gao(Papercup Technologies)

Wed-2-11-5 Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space

Detai Xin(The University of Tokyo), Yuki Saito(The University of Tokyo), Shinnosuke Takamichi(University of Tokyo), Tomoki Koriyama(The University of Tokyo) and Hiroshi Saruwatari(The University of Tokyo)

Wed-2-11-6 Tone Learning in Low-Resource Bilingual TTS

Ruolan Liu(Samsung Research China-Beijing (SRC-B)), Xue Wen(Samsung Research China-Beijing (SRC-B)), Chunhui Lu(Samsung Research China-Beijing (SRC-B)) and Xiao Chen(Samsung Research China-Beijing (SRC-B))

Wed-2-11-7 On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model

Shubham Bansal(Microsoft), Arijit Mukherjee(Microsoft), Sandeepkumar Satpal(Microsoft) and Rupesh Mehta(Microsoft)

Wed-2-11-8 Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework

Anusha Prakash(Indian Institute of Technology Madras) and Hema Murthy(IIT Madras)

Wed-2-11-9 Efficient neural speech synthesis for low-resource languages through multilingual modeling

Marcel de Korte(ReadSpeaker), Jaebok Kim(ReadSpeaker) and Esther Klabbers(ReadSpeaker)

Wed-2-11-10 One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

Tomáš Nekvinda(Charles University) and Ondřej Dušek(Charles University)

Learning Techniques for Speaker Recognition I   Video

Wed-2-12-1 In defence of metric learning for speaker recognition

Joon Son Chung(University of Oxford), Jaesung Huh(Naver Corporation), Seongkyu Mun(Naver Corp.), Minjae Lee(Naver Corporation), Hee Soo Heo(Naver Corporation), Soyeon Choe(Naver Corporation), Chiheon Ham(Naver Corporation), Sunghwan Jung(Naver Corporation), Bong-Jin Lee(Naver Corporation) and Icksang Han(Naver Corporation)

Wed-2-12-2 Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

Seong Min Kye(KAIST), Youngmoon Jung(KAIST), Hae Beom Lee(KAIST), Sung Ju Hwang(KAIST, AITRICS) and Hoi Rin Kim(KAIST)

Wed-2-12-3 Segment-level Effects of Gender, Nationality and Emotion Information on Text-independent Speaker Verification

Kai Li(Japan Advanced Institute of Science and Technology), Masato Akagi(Japan Advanced Institute of Science and Technology), Yibo Wu(Tianjin University) and Jianwu Dang(JAIST)

Wed-2-12-4 Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Yanpei Shi(University of Sheffield), Qiang Huang(University of Sheffield) and Thomas Hain(University of Sheffield)

Wed-2-12-5 Multi-Task Learning for Voice Related Recognition Tasks

Ana Montalvo(Advanced Technologies Application Center), José Ramón Calvo(Advanced Technologies Application Center) and Jean François Bonastre(Université d'Avignon)

Wed-2-12-6 Unsupervised Training of Siamese Networks for Speaker Verification

Umair Khan(Universitat Politècnica de Catalunya) and Javier Hernando(Universitat Politècnica de Catalunya)

Wed-2-12-7 An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions

Ying Liu(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei), Yan Song(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei), Yiheng Jiang(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei), Ian McLoughlin(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei), Lin Liu(iFLYTEK Research, iFLYTEK CO., LTD., Hefei, Anhui 230088) and Lirong Dai(National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei)

Wed-2-12-8 Speaker-Aware Linear Discriminant Analysis in Speaker Verification

Naijun Zheng(The Chinese University of Hong Kong), Xixin Wu(The Chinese University of Hong Kong), Jinghua Zhong(SpeechX), Xunying Liu(The Chinese University of Hong Kong) and Helen Meng(The Chinese University of Hong Kong)

Wed-2-12-9 Adversarial Domain Adaptation for Speaker Verification using Partially Shared Network

Zhengyang Chen(MoE Key Lab of Artificial Intelligence SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai), Shuai Wang(Shanghai Jiao Tong University) and Yanmin Qian(Shanghai Jiao Tong University)

Wednesday 21:45-22:45(GMT+8), October 28

Students Meet Experts  

In this event, we will have a panel discussion with experts from academia and industry. You are welcome to submit questions; a selection of the submitted questions will be answered by the panel of experts. To submit your questions and/or register for this event, please fill in the Application Form. All students participating in Interspeech 2020 are invited to attend! Location: https://zoom.com.cn/j/66478221605

Pronunciation   Video

Wed-3-1-1 Automatic scoring at multi-granularity for L2 pronunciation

Binghuai Lin(Tencent Technology Co., Ltd), Liyuan Wang(Tencent Technology Co., Ltd), Xiaoli Feng(Center for Studies of Chinese as a Second Language, Beijing Language and Culture University) and Jinsong Zhang(Beijing Language and Culture University)

Wed-3-1-2 An Effective End-to-End Modeling Approach for Mispronunciation Detection

Tien-Hong Lo(National Taiwan Normal University), Shi-Yan Weng(National Taiwan Normal University), Hsiu-Jui Chang(National Taiwan Normal University) and Berlin Chen(National Taiwan Normal University)

Wed-3-1-3 An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling

Bi-Cheng Yan(National Taiwan Normal University), Hsiao-Tsung Hung(National Taiwan Normal University) and Meng Wu(ASUS)

Wed-3-1-5 Pronunciation Erroneous Tendency Detection with Language Adversarial Representation Learning

Longfei Yang(Tokyo Institute of Technology), Kaiqi Fu(Beijing Language and Culture University), Jinsong Zhang(Beijing Language and Culture University) and Takahiro Shinozaki(Tokyo Institute of Technology)

Wed-3-1-6 ASR-Free Pronunciation Assessment

Sitong Cheng(Beijing University of Posts and Telecommunications, Beijing), Zhixin Liu(Beijing University of Posts and Telecommunications, Beijing), Zhiyuan Tang(Tsinghua University, Beijing), Lantian Li(Tsinghua University), Dong Wang(Tsinghua University) and Thomas Fang Zheng(CSLT, Tsinghua University)

Wed-3-1-7 Automatic detection of accent and lexical pronunciation errors in spontaneous non-native English speech

Konstantinos Kyriakopoulos(University of Cambridge), Kate Knill(University of Cambridge) and Mark Gales(Cambridge University)

Wed-3-1-8 Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

Jiatong Shi(The Johns Hopkins University), Nan Huo(The Johns Hopkins University) and Qin Jin(Renmin University of China)

Diarization   Video

Wed-3-2-1 Partial AUC Optimisation using Recurrent Neural Networks for Music Detection with Limited Training Data

Pablo Gimeno(University of Zaragoza), Victoria Mingote(University of Zaragoza), Alfonso Ortega(University of Zaragoza), Antonio Miguel(University of Zaragoza) and Eduardo Lleida(University of Zaragoza)

Wed-3-2-2 An open-source voice type classifier for child-centered daylong recordings

Marvin Lavechin(Ecole Normale Supérieure - Laboratoire de Sciences Cognitives et Psycholinguistique), Ruben Bousbib(Ecole Normale Supérieure - Laboratoire de Sciences Cognitives et Psycholinguistique), Hervé Bredin(CNRS LIMSI), Emmanuel Dupoux(Ecole des Hautes Etudes en Sciences Sociales) and Alejandrina Cristia(Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University)

Wed-3-2-3 Competing speaker count estimation on the fusion of the spectral and spatial embedding space

Chao Peng(Peking University), Xihong Wu(Peking University) and Tianshu Qu(Peking University)

Wed-3-2-4 Audio-Visual Multi-Speaker Tracking Based On the GLMB Framework

Xinyuan Qian(National University of Singapore) and Shoufeng Lin(National University of Singapore)

Wed-3-2-5 Towards Speech Robustness for Acoustic Scene Classification

Shuo Liu(Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany), Andreas Triantafyllopoulos(audEERING GmbH / University of Augsburg), Zhao Ren(University of Augsburg) and Björn Schuller(University of Augsburg / Imperial College London)

Wed-3-2-6 Identify Speakers in Cocktail Parties with End-to-End Attention

Junzhe Zhu(UIUC), Mark Hasegawa-Johnson(UIUC) and Leda Sari(UIUC)

Wed-3-2-7 Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Thilo von Neumann(Paderborn University), Christoph Boeddeker(Paderborn University), Lukas Drude(Paderborn University), Keisuke Kinoshita(NTT), Marc Delcroix(NTT Communication Science Laboratories), Tomohiro Nakatani(NTT Corporation) and Reinhold Haeb-Umbach(Paderborn University)

Wed-3-2-8 Attentive Convolutional Recurrent Neural Network Using Phoneme-Guided Acoustic Representation for Rare Sound Event Detection

Shreya G. Upadhyay(Department of Electrical Engineering, National Tsing Hua University), Bo-Hao Su(Department of Electrical Engineering, National Tsing Hua University) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

Wed-3-2-9 Detecting and Counting Overlapping Speakers in Distant Speech Scenarios

Samuele Cornell(Università Politecnica delle Marche), Maurizio Omologo(Fondazione Bruno Kessler - irst), Stefano Squartini(Università Politecnica delle Marche) and Emmanuel Vincent(Inria)

Wed-3-2-10 All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection

Niko Moritz(MERL), Gordon Wichern(Mitsubishi Electric Research Laboratories), Takaaki Hori(Mitsubishi Electric Research Laboratories) and Jonathan Le Roux(Mitsubishi Electric Research Laboratories)

Computational Paralinguistics II (CP II)   Video

Wed-3-3-1 Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals

Lorenz Diener(University of Bremen), Shahin Amiriparian(University of Augsburg), Catarina Botelho(INESC-ID/Instituto Superior Técnico, University of Lisbon, Portugal), Kevin Scheck(University of Bremen), Dennis Küster(University of Bremen), Isabel Trancoso(INESC-ID / IST Univ. Lisbon), Björn Schuller(University of Augsburg / Imperial College London) and Tanja Schultz(Universität Bremen)

Wed-3-3-2 Predicting Collaborative Task Performance using Graph Interlocutor Acoustic Network in Small Group Interaction

Shun-Chang Zhong(Department of Electrical Engineering, National Tsing Hua University), Bo-Hao Su(Department of Electrical Engineering, National Tsing Hua University), Wei Huang(Gamania Digital Entertainment Co., Ltd. (HQ)), Yi-Ching Liu(College of Management, National Taiwan University) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

Wed-3-3-3 Very Short-term Conflict Intensity Estimation Using Fisher Vectors

Gábor Gosztolya(Research Group on Artificial Intelligence)

Wed-3-3-4 Gaming Corpus for Studying Social Screams

Hiroki Mori(Utsunomiya University) and Yuki Kikuchi(Utsunomiya University)

Wed-3-3-5 Speaker discrimination in humans and machines: Effects of speaking style variability

Amber Afshan(University of California, Los Angeles), Jody Kreiman(University of California, Los Angeles) and Abeer Alwan(UCLA)

Wed-3-3-7 Towards a comprehensive assessment of speech intelligibility for pathological speech

Wei Xue(Center for Language and Speech Technology (CLST), Center for Language Studies (CLS), Radboud University), Viviana Mendoza Ramos(Department of Otorhinolaryngology, Head and Neck Surgery and Communication Disorders, University Hospital of Antwerp), Wieke Harmsen(Centre for Language Studies (CLS), Radboud University), Catia Cucchiarini(Radboud University Nijmegen), Roeland van Hout(Radboud University Nijmegen) and Helmer Strik(Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen)

Wed-3-3-8 Effects of communication channels and actor’s gender on emotion identification by native Mandarin speakers

Yi Lin(Shanghai Jiao Tong University) and Hongwei Ding(Shanghai Jiao Tong University)

Wed-3-3-9 Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor

Ivo Anjos(Universidade Nova de Lisboa), Maxine Eskenazi(Carnegie Mellon University), Nuno Marques(Universidade Nova de Lisboa), Margarida Grilo(Escola Superior de Saúde do Alcoitão), Isabel Guimarães(Escola Superior de Saúde do Alcoitão), João Magalhães(Universidade Nova de Lisboa) and Sofia Cavaco(Universidade Nova de Lisboa)

Speech Synthesis Paradigms and Methods II   Video

Wed-3-4-1 Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages

Haitong Zhang(NetEase Games AI Lab) and Yue Lin(NetEase Games AI Lab)

Wed-3-4-2 Conditional Spoken Digit Generation with StyleGAN

Kasperi Palkama(Aalto University), Lauri Juvela(Aalto University) and Alexander Ilin(Aalto University)

Wed-3-4-3 Towards Universal Text-to-Speech

Jingzhou Yang(Microsoft) and Lei He(Microsoft)

Wed-3-4-4 Speaker-Independent Mel-cepstrum Estimation from Articulator Movements Using D-vector Input

Kouichi Katsurada(Tokyo University of Science) and Korin Richmond(University of Edinburgh)

Wed-3-4-5 Enhancing Monotonicity for Robust Autoregressive Transformer TTS

Xiangyu Liang(Tsinghua University), Zhiyong Wu(The Chinese University of Hong Kong), Runnan Li(Tsinghua University (THU)), Yanqing Liu(Search Technology Center Asia (STCA), Microsoft), Sheng Zhao(Search Technology Center Asia (STCA), Microsoft) and Helen Meng(The Chinese University of Hong Kong)

Wed-3-4-6 Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning

Devang Savita Ram Mohan(Papercup Technologies), Raphael Lenain(Novoic), Lorenzo Foglianti(Papercup Technologies), Tian Huey Teh(Papercup Technologies), Marlene Staib(Papercup), Alexandra Torresquintero(Papercup Technologies) and Jiameng Gao(Papercup Technologies)

Wed-3-4-7 Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

Tao Tu(National Taiwan University), Yuan-Jui Chen(National Taiwan University), Alexander Liu(National Taiwan University) and Hung-yi Lee(National Taiwan University (NTU))

Wed-3-4-8 Learning Joint Articulatory-Acoustic Representations with Normalizing Flows

Pramit Saha(University of British Columbia, Vancouver) and Sidney Fels(University of British Columbia)

Wed-3-4-9 Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis

Yuki Yamashita(The University of Tokyo), Tomoki Koriyama(The University of Tokyo), Yuki Saito(The University of Tokyo), Shinnosuke Takamichi(University of Tokyo), Yusuke Ijima(NTT Corporation), Ryo Masumura(NTT Corporation) and Hiroshi Saruwatari(The University of Tokyo)

Wed-3-4-10 Hider-Finder-Combiner: An Adversarial Architecture For General Speech Signal Modification

Jacob Webber(The Centre for Speech Technology Research, University of Edinburgh), Olivier Perrotin(Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab) and Simon King(University of Edinburgh)

Speaker Embedding   Video

Wed-3-5-1 Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms

Weiwei Lin(The Hong Kong Polytechnic University) and Man Wai Mak(The Hong Kong Polytechnic University)

Wed-3-5-2 How Does Label Noise Affect the Quality of Speaker Embeddings?

Minh Pham(Worcester Polytechnic Institute), Zeqian Li(Worcester Polytechnic Institute) and Jacob Whitehill(Worcester Polytechnic Institute)

Wed-3-5-3 A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

Xuechen Liu(Inria Nancy Grand Est), Md Sahidullah(University of Eastern Finland) and Tomi Kinnunen(University of Eastern Finland)

Wed-3-5-4 Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations

Wei Xia(University of Texas at Dallas) and John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)

Wed-3-5-5 Intra-class variation reduction of speaker representation in disentanglement framework

Yoohwan Kwon(Yonsei University), Soo-Whan Chung(Yonsei University) and Hong-Goo Kang(Yonsei University)

Wed-3-5-6 Compact Speaker Embedding: lrx-vector

Munir Georges(Intel Germany), Jonathan Huang(Intel Corp.) and Tobias Bocklet(Technische Hochschule Nürnberg)

Wed-3-5-7 Cosine-Distance Virtual Adversarial Training for Semi-Supervised Speaker-Discriminative Acoustic Embeddings

Florian Kreyssig(University of Cambridge) and Phil Woodland(Cambridge University)

Wed-3-5-8 Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification

Junyi Peng(Peking University Shenzhen Graduate School), Rongzhi Gu(Peking University Shenzhen Graduate School) and Yuexian Zou(Peking University Shenzhen Graduate School)

Wed-3-5-9 Neural Discriminant Analysis for Deep Speaker Embedding

Lantian Li(Tsinghua University), Dong Wang(Tsinghua University) and Thomas Fang Zheng(CSLT, Tsinghua University)

Wed-3-5-10 Learning Speaker Embedding from Text-to-Speech

Jaejin Cho(Johns Hopkins University), Piotr Zelasko(Johns Hopkins University), Jesus Villalba(Johns Hopkins University), Shinji Watanabe(Johns Hopkins University) and Najim Dehak(Johns Hopkins University)

Single-Channel Speech Enhancement III   Video

Wed-3-7-1 Noisy-reverberant Speech Enhancement Using DenseUNet with Time-frequency Attention

Yan Zhao(The Ohio State University) and DeLiang Wang(Ohio State University)

Wed-3-7-2 On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

Zhuohuang Zhang(Indiana University Bloomington), Chengyun Deng(Didi Chuxing), Yi Shen(Indiana University Bloomington), Donald S. Williamson(Indiana University Bloomington), Yongtao Sha(Didi Chuxing), Yi Zhang(Didi Chuxing), Hui Song(Didi Chuxing) and Xiangang Li(Didi Chuxing)

Wed-3-7-3 Self-supervised Adversarial Multi-task Learning for Vocoder-based Monaural Speech Enhancement

Zhihao Du(Harbin Institute of Technology), Ming Lei(Machine Intelligence Technology, Alibaba Group), Jiqing Han(Harbin Institute of Technology) and Shiliang Zhang(Machine Intelligence Technology, Alibaba Group)

Wed-3-7-4 Deep Speech Inpainting of Time-frequency Masks

Mikolaj Kegler(Imperial College London, SW7 2AZ, London), Pierre Beckmann(Swiss Federal Institute of Technology Lausanne) and Milos Cernak(Logitech Europe)

Wed-3-7-5 Real-time single-channel deep neural network-based speech enhancement on edge devices

Nikhil Shankar(The University of Texas at Dallas), Gautam Shreedhar Bhat(The University of Texas at Dallas) and Issa Panahi(The University of Texas at Dallas)

Wed-3-7-6 Improved Speech Enhancement using a Time-Domain GAN with Mask Learning

Ju Lin(Clemson University), Sufeng Niu(LinkedIn), Adriaan J. van Wijngaarden(Nokia Bell Labs), Jerome L. McClendon(Clemson University), Melissa C. Smith(Clemson University) and Kuang-Ching Wang(Clemson University)

Wed-3-7-7 Real Time Speech Enhancement in the Waveform Domain

Alexandre Defossez(Facebook AI Research, INRIA, PSL Research University), Gabriel Synnaeve(Facebook AI Research) and Yossi Adi(Facebook AI Research)

Wed-3-7-8 Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Michał Romaniuk(Samsung R&D Institute Poland), Piotr Masztalski(Samsung R&D Institute Poland), Karol Piaskowski(Samsung R&D Institute Poland) and Mateusz Matuszewski(Samsung R&D Institute Poland)

Multi-Channel Audio and Emotion Recognition   Video

Wed-3-8-1 Multi-stream Attention-based BLSTM with Feature Segmentation for Speech Emotion Recognition

Yuya Chiba(Tohoku University), Takashi Nose(Tohoku University) and Akinori Ito(Tohoku University)

Wed-3-8-2 Microphone Array Post-filter for Target Speech Enhancement Without a Prior Information of Point Interferers

Guanjun Li(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Shan Liang(NLPR, Institute of Automation, Chinese Academy of Sciences), Shuai Nie(NLPR, Institute of Automation, Chinese Academy of Sciences), Wenju Liu(NLPR, Institute of Automation, Chinese Academy of Sciences), Zhanlei Yang(Huawei Technologies) and Longshuai Xiao(Huawei Technologies)

Wed-3-8-4 The Method of Random Directions Optimization for Stereo Audio Source Separation

Oleg Golokolenko(Ilmenau University of Technology) and Gerald Schuller(Ilmenau University of Technology)

Wed-3-8-5 Gated Recurrent Fusion of Spatial and Spectral Features for Multi-channel Speech Separation with Deep Embedding Representations

Cunhang Fan(Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Bin Liu(Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(Institute of Automation, Chinese Academy of Sciences) and Zhengqi Wen(Institute of Automation, Chinese Academy of Sciences)

Wed-3-8-7 A Lightweight Model Based on Separable Convolution for Speech Emotion Recognition

Ying Zhong(Xinjiang University), Ying Hu(Xinjiang University), Hao Huang(Xinjiang University) and Wushour Silamu(Xinjiang University)

Wed-3-8-8 Meta Multi-task Learning for Speech Emotion Recognition

Ruichu Cai(GDUT), Kaibin Guo(Guangdong University of Technology), Boyan Xu(Faculty of Computer, Guangdong University of Technology), Xiaoyan Yang(YITU) and Zhenjie Zhang(Yitu Technology)

Wed-3-8-9 GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

Francois Grondin(Universite de Sherbrooke), Jean-Samuel Lauzon(Universite de Sherbrooke), Jonathan Vincent(Universite de Sherbrooke) and Francois Michaud(Universite de Sherbrooke)

Computational Resource Constrained Speech Recognition   Video

Wed-3-9-1 Accurate Detection of Wake Word Start and End Using a CNN

Christin Jose(Amazon), Yuriy Mishchenko(Amazon.com), Thibaud Senechal(Amazon), Anish Shah(Amazon), Alex Escott(Amazon) and Shiv Vitaladevuni(Amazon)

Wed-3-9-2 Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

Saurabh Adya(Apple), Vineet Garg(Apple), Siddharth Sigtia(Apple), Pramod Simha(Apple) and Chandra Dhir(Apple)

Wed-3-9-4 Iterative Compression of End-to-End ASR Model using AutoML

Abhinav Mehrotra(Samsung AI Center), Lukasz Dudziak(Samsung AI Center), Jinsu Yeo(Samsung), Young-yoon Lee(Samsung), Ravichander Vipperla(Samsung AI Centre), Mohamed Abdelfattah(Samsung AI Center), Sourav Bhattacharya(Samsung AI Center), Samin Ishtiaq(Samsung AI Center), Alberto Gil C. P. Ramos(Samsung AI Center), SangJeong Lee(Samsung), Daehyun Kim(Samsung) and Nic Lane(Samsung AI Center)

Wed-3-9-5 Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition

Hieu Nguyen(Amazon.com), Anastasios Alexandridis(Amazon.com) and Athanasios Mouchtaris(Amazon.com)

Wed-3-9-6 Streaming on-device end-to-end ASR system for privacy-sensitive voice-typing

Abhinav Garg(Samsung Electronics), Gowtham Vadisetti(Samsung Electronics), Sichen Jin(Samsung Electronics), Dhananjaya Gowda(Samsung Research), Aditya Jayasimha(Samsung Electronics), Youngho Han(Samsung Electronics), Jiyeon Kim(Samsung Electronics), Junmo Park(Samsung Electronics), Kwangyoun Kim(Samsung Research), Sooyeon Kim(Samsung Electronics), Youngyoon Lee(Samsung Electronics), Kyungbo Min(Samsung Electronics) and Chanwoo Kim(Samsung Research)

Wed-3-9-7 Scaling Up Online Speech Recognition Using ConvNets

Vineel Pratap(Facebook), Qiantong Xu(Facebook AI Research), Jacob Kahn(Facebook AI), Gilad Avidov(Facebook AI), Tatiana Likhomanenko(Facebook AI Research), Awni Hannun(Facebook AI Research), Vitaliy Liptchinsky(Facebook AI), Gabriel Synnaeve(Facebook AI Research) and Ronan Collobert(Facebook AI Research)

Wed-3-9-8 Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Ye Bai(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengkun Tian(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Shuai Zhang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Wed-3-9-9 Rescore in a Flash: Compact, Cache Efficient Hashing Data Structures for N-gram Language Models

Grant Strimel(Amazon.com), Ariya Rastrow(Amazon.com), Gautam Tiwari(Amazon.com), Adrien Pierard(Amazon.com) and Jon Webb(Amazon.com)

Speech Synthesis: Prosody and Emotion   Video

Wed-3-10-1 Multi-speaker Emotion Conversion via Latent Variable Regularization and A Chained Encoder-Decoder-Predictor Network

Ravi Shankar(Johns Hopkins University), Hsi-Wei Hsieh(Johns Hopkins University), Nicolas Charon(Johns Hopkins University) and Archana Venkataraman(Johns Hopkins University)

Wed-3-10-2 Non-parallel Emotion Conversion using a Deep-Generative Hybrid Network and an Adversarial Pair Discriminator

Ravi Shankar(Johns Hopkins University), Jacob Sager(Johns Hopkins University) and Archana Venkataraman(Johns Hopkins University)

Wed-3-10-3 Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning

Noé Tits(University of Mons), Kevin El Haddad(University of Mons) and Thierry Dutoit(University of Mons)

Wed-3-10-4 Nonparallel Emotional Speech Conversion Using VAE-GAN

Yuexin Cao(Ping An Technology), Zheng-Chen Liu(University of Science and Technology of China), Minchuan Chen(Ping An Technology), Jun Ma(Ping An Technology), Shaojun Wang(Ping An Technology) and Jing Xiao(Ping An Technology)

Wed-3-10-5 Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS

Alexander Sorin(IBM Research - Haifa), Slava Shechtman(IBM Research - Haifa) and Ron Hoory(IBM Research - Haifa)

Wed-3-10-6 Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion

Kun Zhou(Dept of Electrical and Computer Engineering, National University of Singapore), Berrak Sisman(National University of Singapore), Mingyang Zhang(National University of Singapore) and Haizhou Li(National University of Singapore)

Wed-3-10-7 Controlling the Strength of Emotions in Speech-like Emotional Sound Generated by WaveNet

Kento Matsumoto(Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University), Sunao Hara(Okayama University) and Masanobu Abe(Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University)

Wed-3-10-8 Learning Syllable-Level Discrete Prosodic Representation for Expressive Speech Generation

Guangyan Zhang(Dept. of Electronic Engineering, The Chinese University of Hong Kong), Ying Qin(The Chinese University of Hong Kong) and Tan Lee(The Chinese University of Hong Kong)

Wed-3-10-9 Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM

Takuya Kishida(The University of Electro-Communications), Shin Tsukamoto(The University of Electro-Communications) and Toru Nakashika(The University of Electro-Communications)

Wed-3-10-10 Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis

Fengyu Yang(Northwestern Polytechnical University), Shan Yang(Northwestern Polytechnical University), Qinghua Wu(XiaoMi), Yujun Wang(XiaoMi) and Lei Xie(Northwestern Polytechnical University)

Wed-3-10-11 Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis

Yukiya Hono(Nagoya Institute of Technology), Kazuna Tsuboi(Microsoft Development Co., Ltd.), Kei Sawada(Microsoft Development Co., Ltd.), Kei Hashimoto(Nagoya Institute of Technology), Keiichiro Oura(Nagoya Institute of Technology), Yoshihiko Nankaku(Nagoya Institute of Technology) and Keiichi Tokuda(Nagoya Institute of Technology)

Wed-3-10-12 GAN-based Data Generation for Speech Emotion Recognition

Sefik Emre Eskimez(Microsoft), Dimitrios Dimitriadis(Microsoft), Robert Gmyr(Microsoft) and Kenichi Kumatani(Amazon Inc.)

Wed-3-10-13 The phonetic bases of vocal expressed emotion: natural versus acted

Hira Dhamyal(Carnegie Mellon University), Shahan Ali Memon(Language Technologies Institute, Carnegie Mellon University), Bhiksha Raj(Carnegie Mellon University) and Rita Singh(Carnegie Mellon University)

The Interspeech 2020 Far-Field Speaker Verification Challenge (FFSVC)   Video

Wed-SS-3-11-1 The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Xiaoyi Qin(Duke Kunshan University, Kunshan), Ming Li(Duke Kunshan University), Hui Bu(AISHELL Foundation), Wei Rao(National University of Singapore), Rohan Kumar Das(National University of Singapore), Shrikanth Narayanan(University of Southern California) and Haizhou Li(National University of Singapore)

Wed-SS-3-11-2 Deep Embedding Learning for Text-Dependent Speaker Verification

Peng Zhang(Inner Mongolia University), Peng Hu(Elevoc Technology Co., Ltd.) and Xueliang Zhang(Inner Mongolia University)

Wed-SS-3-11-3 STC-innovation Speaker Recognition Systems for Far-Field Speaker Verification Challenge 2020

Aleksei Gusev(STC-innovations/ITMO), Vladimir Volokhov(STC-innovations), Alisa Vinogradova(STC-innovations/ITMO), Tseren Andzhukaev(STC-innovations), Andrey Shulipa(ITMO), Sergey Novoselov(STC-innovations/ITMO), Timur Pekhovsky(STC-innovations) and Alexander Kozlov(STC-innovations)

Wed-SS-3-11-4 NPU Speaker Verification System for INTERSPEECH 2020 Far-Field Speaker Verification Challenge

Li Zhang(Northwestern Polytechnical University), Jian Wu(Northwestern Polytechnical University) and Lei Xie(Northwestern Polytechnical University)

Wed-SS-3-11-5 The JD AI Speaker Verification System for the FFSVC 2020 Challenge

Ying Tong(JD AI Research), Wei Xue(JD AI Research), Shanluo Huang(JD AI Research), Lu Fan(JD AI Research), Chao Zhang(JD AI Research), Guohong Ding(JD AI Research) and Xiaodong He(JD AI Research)

Multimodal Speech Processing   Video

Wed-3-12-1 FaceFilter: Audio-visual speech separation using still images

Soo-Whan Chung(Yonsei University), Soyeon Choe(Naver Corporation), Joon Son Chung(University of Oxford) and Hong-Goo Kang(Yonsei University)

Wed-3-12-2 Seeing voices and hearing voices: Learning discriminative embeddings using cross-modal self-supervision

Soo-Whan Chung(Yonsei University), Hong-Goo Kang(Yonsei University) and Joon Son Chung(University of Oxford)

Wed-3-12-3 Fusion Architectures for Word-based Audiovisual Speech Recognition

Michael Wand(The Swiss AI Lab IDSIA) and Juergen Schmidhuber(The Swiss AI Lab IDSIA)

Wed-3-12-4 Audio-visual Multi-channel Recognition of Overlapped Speech

Jianwei Yu(The Chinese University of Hong Kong), Bo Wu(Tencent AI Lab), Rongzhi Gu(Peking University Shenzhen Graduate School), Shi-Xiong Zhang(Tencent AI Lab), Lianwu Chen(Tencent), Yong Xu(Tencent AI Lab), Meng Yu(Tencent AI Lab), Dan Su(Tencent AI Lab, Shenzhen), Dong Yu(Tencent AI Lab), Xunying Liu(Chinese University of Hong Kong) and Helen Meng(The Chinese University of Hong Kong)

Wed-3-12-5 TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog

Wubo Li(Didi Chuxing), Dongwei Jiang(Didi Chuxing), Wei Zou(Didi Chuxing) and Xiangang Li(Didi Chuxing)

Wed-3-12-6 Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition

George Sterpu(Trinity College Dublin, Ireland), Christian Saam(ADAPT Centre, Trinity College Dublin) and Naomi Harte(Trinity College Dublin)

Wed-3-12-7 Resource-adaptive deep learning for visual speech recognition

Alexandros Koumparoulis(University of Thessaly), Gerasimos Potamianos(University of Thessaly), Samuel Thomas(IBM Research AI) and Edmilson Morais(IBM Research)

Wed-3-12-9 Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion

Hong Liu(Shenzhen Graduate School, Peking University), Zhan Chen(Peking University) and Bing Yang(Shenzhen Graduate School, Peking University)

Wed-3-12-10 Caption Alignment for Low Resource Audio-Visual Data

Vighnesh Reddy Konda(Indian Institute of Technology Bombay), Mayur Warialani(IIT Bombay), Rakesh Prasanth Achari(IIT Bombay), Varad Bhatnagar(IIT Bombay), Jayaprakash Akula(IIT Bombay), Preethi Jyothi(Indian Institute of Technology Bombay), Ganesh Ramakrishnan(Department of Computer Science and Engineering, Indian Institute of Technology Bombay), Gholamreza Haffari(Monash University) and Pankaj Singh(IIT Bombay)

Thursday 18:00-19:00(GMT+8), October 29

Keynote 4  

Thursday 19:15-20:15(GMT+8), October 29

Speech Synthesis: Neural Waveform Generation II   Video

Thu-1-1-1 Vocoder-Based Speech Synthesis from Silent Videos

Daniel Michelsanti(Aalborg University), Olga Slizovskaia(Universitat Pompeu Fabra), Gloria Haro(Universitat Pompeu Fabra), Emilia Gómez(Universitat Pompeu Fabra), Zheng-Hua Tan(Aalborg University) and Jesper Jensen(Aalborg University)

Thu-1-1-2 Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation

Yi-Chiao Wu(Nagoya University), Tomoki Hayashi(Nagoya University), Takuma Okamoto(National Institute of Information and Communications Technology), Hisashi Kawai(National Institute of Information and Communications Technology) and Tomoki Toda(Nagoya University)

Thu-1-1-3 A Cyclical Post-filtering Approach to Mismatch Refinement of Neural Vocoder for Text-to-speech Systems

Yi-Chiao Wu(Nagoya University), Patrick Lumban Tobing(Nagoya University), Kazuki Yasuhara(Nagoya University), Noriyuki Matsunaga(AI Inc.), Yamato Ohtani(AI Inc.) and Tomoki Toda(Nagoya University)

Thu-1-1-4 Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

Hyun-Wook Yoon(Korea University), Sang-Hoon Lee(Korea University), Hyeong-Rae Noh(Korea University) and Seong-Whan Lee(Korea University)

Thu-1-1-5 StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Manish Sharma(Google), Tom Kenter(Google UK) and Robert Clark(Google, UK)

Thu-1-1-6 An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis

Yang Cui(Microsoft AI & Research), Xi Wang(Microsoft), Lei He(Microsoft) and Frank Soong(Microsoft Research Asia)

Thu-1-1-7 Reverberation Modeling for Source-Filter-based Neural Vocoder

Yang Ai(University of Science and Technology of China), Xin Wang(National Institute of Informatics, Sokendai University), Junichi Yamagishi(National Institute of Informatics) and Zhenhua Ling(University of Science and Technology of China)

Thu-1-1-8 Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems

Ravichander Vipperla(Samsung AI Centre), Sangjun Park(Samsung Research), Kihyun Choo(Samsung Research), Samin Ishtiaq(Samsung AI Center), Kyoungbo Min(Samsung Research), Sourav Bhattacharya(Samsung AI Center), Abhinav Mehrotra(Samsung AI Center), Alberto Gil Couto Pimentel Ramos(Samsung AI Center) and Nicholas D. Lane(Samsung AI Center)

Thu-1-1-9 Neural Text-to-Speech with a Modeling-by-Generation Excitation Vocoder

Eunwoo Song(NAVER Corp.), Min-Jae Hwang(Search Solutions Inc.), Ryuichi Yamamoto(LINE Corp.), Jin-Seob Kim(Naver Corp.), Ohsung Kwon(Naver Corp.) and Jae-Min Kim(Naver Corp.)

Thu-1-1-10 SpeedySpeech: Efficient Neural Speech Synthesis

Jan Vainer(Charles University) and Ondřej Dušek(Charles University)

ASR Neural Network Architectures and Training II   Video

Thu-1-2-1 Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution

Ziqiang Zhang(University of Science and Technology of China), Yan Song(University of Science and Technology of China), Jianshu Zhang(University of Science and Technology of China), Ian McLoughlin(University of Science and Technology of China) and Lirong Dai(University of Science and Technology of China)

Thu-1-2-3 Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

Jinyu Li(Microsoft), Rui Zhao(Microsoft), Zhong Meng(Microsoft Corporation), Yanqing Liu(Microsoft), Wenning Wei(Microsoft), Sarangarajan Parthasarathy(Microsoft), Vadim Mazalov(Microsoft), Zhenghao Wang(Microsoft), Lei He(Microsoft), Sheng Zhao(Microsoft) and Yifan Gong(Microsoft Corp)

Thu-1-2-4 End-to-End ASR with Adaptive Span Self-Attention

Xuankai Chang(Johns Hopkins University), Aswin Shanmugam Subramanian(Johns Hopkins University), Pengcheng Guo(Northwestern Polytechnical University), Shinji Watanabe(Johns Hopkins University), Yuya Fujita(Yahoo Japan Corporation) and Motoi Omachi(Yahoo Japan Corporation)

Thu-1-2-5 Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition

Egor Lakomkin(Amazon Inc.), Jahn Heymann(Amazon Inc.), Ilya Sklyar(Amazon Inc.) and Simon Wiesler(Amazon Inc.)

Thu-1-2-6 Early Stage LM Integration Using Local and Global Log-Linear Combination

Wilfried Michel(RWTH Aachen University), Ralf Schlüter(RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Thu-1-2-7 ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Wei Han(Google), Zhengdong Zhang(Google Brain), Yu Zhang(Google Brain), Jiahui Yu(Google), Chung-Cheng Chiu(Google), James Qin(Google), Anmol Gulati(Google Brain), Ruoming Pang(Google Inc.) and Yonghui Wu(Google Brain)

Thu-1-2-8 Emitting Word Timings with End-to-End Models

Tara Sainath(Google), Ruoming Pang(Google Inc.), David Rybach(Google), Basi Garcia(Google Inc.) and Trevor Strohman(Google, Inc.)

Thu-1-2-9 Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

Danni Liu(Maastricht University), Gerasimos Spanakis(Maastricht University) and Jan Niehues(Maastricht University)

Neural Networks for Language Modeling   Video

Thu-1-3-1 Neural Language Modeling With Implicit Cache Pointers

Ke Li(Johns Hopkins University), Daniel Povey(Xiaomi Corp.) and Sanjeev Khudanpur(Johns Hopkins University)

Thu-1-3-2 Finnish ASR with Deep Transformer Models

Abhilash Jain(Aalto University), Aku Rouhe(Aalto University), Stig-Arne Grönroos(Aalto University, Department of Signal Processing and Acoustics) and Mikko Kurimo(Aalto University)

Thu-1-3-3 Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

Hayato Futami(Kyoto University), Hirofumi Inaguma(Kyoto University), Sei Ueno(Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan), Masato Mimura(Kyoto University), Shinsuke Sakai(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Thu-1-3-4 Stochastic Convolutional Recurrent Networks for Language Modeling

Jen-Tzung Chien(National Chiao Tung University) and Yu-Min Huang(National Chiao Tung University)

Thu-1-3-5 Investigation of Large-Margin Softmax in Neural Language Modeling

Jingjing Huo(RWTH Aachen University), Yingbo Gao(RWTH Aachen University), Weiyue Wang(RWTH Aachen University), Ralf Schlüter(Lehrstuhl Informatik 6, RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Thu-1-3-6 Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model

Da-Rong Liu(National Taiwan University), Chunxi Liu(Facebook), Frank Zhang(Facebook), Gabriel Synnaeve(Facebook), Yatharth Saraf(Facebook) and Geoffrey Zweig(Facebook)

Thu-1-3-7 Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

Yosuke Higuchi(Waseda University), Shinji Watanabe(Johns Hopkins University), Nanxin Chen(Johns Hopkins University), Tetsuji Ogawa(Waseda University) and Tetsunori Kobayashi(Waseda University)

Thu-1-3-8 Insertion Based Modelling for End-to-End Automatic Speech Recognition

Yuya Fujita(Yahoo Japan Corporation), Shinji Watanabe(Johns Hopkins University), Motoi Omachi(Yahoo Japan Corporation) and Xuankai Chang(Johns Hopkins University)

Phonetic Event Detection and Segmentation   Video

Thu-1-4-1 Voice activity detection in the wild via weakly supervised sound event detection

Yefei Chen(Shanghai Jiao Tong University), Heinrich Dinkel(Shanghai Jiao Tong University), Mengyue Wu(Shanghai Jiao Tong University) and Kai Yu(Shanghai Jiao Tong University)

Thu-1-4-2 Dual Attention in Time and Frequency Domain for Voice Activity Detection

Joohyung Lee(KAIST), Youngmoon Jung(KAIST) and Hoi Rin Kim(KAIST)

Thu-1-4-3 Polishing the Classical Likelihood Ratio Test by Supervised Learning for Voice Activity Detection

Tianjiao Xu(Inner Mongolia University), Hui Zhang(Inner Mongolia University) and Xueliang Zhang(Computer Science Department, Inner Mongolia University)

Thu-1-4-4 A Noise Robust Technique for Detecting Vowels in Speech Signals

Avinash Kumar(National Institute of Technology Patna), Syed Shahnawazuddin(National Institute of Technology Patna) and Waquar Ahmad(National Institute of Technology Calicut)

Thu-1-4-5 End-to-end Domain-Adversarial Voice Activity Detection

Marvin Lavechin(Ecole Normale Supérieure - Laboratoire de Sciences Cognitives et Psycholinguistique), Marie-Philippe Gill(Ecole de Technologie Supérieure, Université du Québec, Montreal, Canada), Ruben Bousbib(Cognitive Machine Learning team, Ecole Normale Supérieure/INRIA, PSL, Paris, France), Hervé Bredin(CNRS LIMSI) and Leibny Paola Garcia Perera(Johns Hopkins University)

Thu-1-4-6 VOP Detection in Variable Speech Rate Condition

Ayush Agarwal(IIT Dharwad), Jagabandhu Mishra(Indian Institute of Technology Dharwad) and S. R. Mahadeva Prasanna(Indian Institute of Technology Dharwad)

Thu-1-4-7 MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Zhenpeng Zheng(Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang(Ping An Technology (Shenzhen) Co., Ltd.), Ning Cheng(Ping An Technology (Shenzhen) Co., Ltd.), Jian Luo(Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao(Ping An Technology (Shenzhen) Co., Ltd.)

Thu-1-4-8 Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

Felix Kreuk(Bar-Ilan University), Joseph Keshet(Bar-Ilan University) and Yossi Adi(Facebook AI Research)

Thu-1-4-9 That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Piotr Żelasko(Johns Hopkins University), Laureano Moro Velazquez(Johns Hopkins University), Mark Hasegawa-Johnson(University of Illinois), Odette Scharenborg(Multimedia computing, Delft University of Technology) and Najim Dehak(Johns Hopkins University)

Thu-1-4-10 Analyzing read aloud speech by primary school pupils: insights for research and development

Sanne Limonard(Centre for Language and Speech Technology (CLST), Radboud University Nijmegen), Catia Cucchiarini(Centre for Language and Speech Technology (CLST), Radboud University Nijmegen), Roeland van Hout(Centre for Language Studies (CLS), Radboud University Nijmegen) and Helmer Strik(Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen)

Human Speech Production II   Video

Thu-1-5-1 Discovering articulatory speech targets from synthesized random babble

Heikki Rasilo(Vrije Universiteit Brussel) and Yannick Jadoul(Vrije Universiteit Brussel)

Thu-1-5-2 Speaker dependent acoustic-to-articulatory inversion using real-time MRI of the vocal tract

Tamás Gábor Csapó(Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics)

Thu-1-5-3 Acoustic-to-Articulatory Inversion with Deep Autoregressive Articulatory-WaveNet

Narjes Bozorg(University of Kentucky) and Michael Johnson(University of Kentucky)

Thu-1-5-4 Using Silence MR Image to Synthesise Dynamic MRI Vocal Tract Data of CV

Ioannis Douros(Université de Lorraine, CNRS, Inria, LORIA, Inserm, IADI, F-54000 Nancy, France), Ajinkya Kulkarni(Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Crysanthi Dourou(School of ECE, National Technical University of Athens, Athens 15773, Greece), Yu Xie(Department of Neurology, Zhongnan Hospital of Wuhan University, Wuhan 430071), Jacques Felblinger(Université de Lorraine, INSERM 1433, CIC-IT, CHRU de Nancy, F-54000 Nancy, France), Karyna Isaieva(IADI, Université de Lorraine, INSERM U1254), Pierre-André Vuissoz(Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France) and Yves Laprie(LORIA/CNRS)

Thu-1-5-5 Quantification of Transducer Misalignment in Ultrasound Tongue Imaging

Tamás Gábor Csapó(Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics) and Kele Xu(National University of Defense Technology)

Thu-1-5-6 Independent and Automatic Evaluation of Speaker-Independent Acoustic-to-Articulatory Reconstruction

Maud Parrot(CoML team, LSCP, ENS Paris), Juliette Millet(LLF, Université de Paris and CoML team, LSCP, ENS Paris) and Ewan Dunbar(Université Paris Diderot)

Thu-1-5-7 CSL-EMG_Array: An Open Access Corpus for EMG-to-Speech Conversion

Lorenz Diener(University of Bremen), Mehrdad Roustay Vishkasougheh(University of Bremen) and Tanja Schultz(Universität Bremen)

Thu-1-5-8 Links between production and perception of glottalisation in individual Australian English speaker/listeners

Joshua Penney(Macquarie University), Felicity Cox(Macquarie University) and Anita Szakay(Macquarie University)

New Trends in Self-Supervised Speech Processing   Video

Thu-SS-1-6-1 Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition

Shamane Siriwardhana(The University of Auckland), Andrew Reis(The University of Auckland), Rivindu Weerasekara(The University of Auckland) and Suranga Nanayakkara(The University of Auckland)

Thu-SS-1-6-2 Vector-Quantized Autoregressive Predictive Coding

Yu-An Chung(Massachusetts Institute of Technology), Hao Tang(Massachusetts Institute of Technology) and James Glass(Massachusetts Institute of Technology)

Thu-SS-1-6-3 Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

Xingchen Song(Tsinghua University), Guangsen Wang(Salesforce Research Asia), Yiheng Huang(Tencent AI Lab), Zhiyong Wu(Tsinghua University), Dan Su(Tencent AILab Shenzhen) and Helen Meng(The Chinese University of Hong Kong)

Thu-SS-1-6-4 Large scale weakly and semi-supervised learning for low-resource video ASR

Kritika Singh(Facebook), Vimal Manohar(Facebook), Alex Xiao(Facebook), Sergey Edunov(Facebook), Ross Girshick(Facebook), Vitaliy Liptchinsky(Facebook), Christian Fuegen(Facebook), Yatharth Saraf(Facebook), Geoffrey Zweig(Facebook) and Abdelrahman Mohamed(Facebook)

Thu-SS-1-6-5 Sequence-level self-learning with multiple hypotheses

Kenichi Kumatani(Microsoft), Dimitrios Dimitriadis(Microsoft), Robert Gmyr(Microsoft), Yashesh Gaur(Microsoft.com), Sefik Emre Eskimez(Microsoft), Jinyu Li(Microsoft) and Michael Zeng(Microsoft)

Thu-SS-1-6-6 Defense for black-box attacks on anti-spoofing models by self-supervised learning

Haibin Wu(National Taiwan University), Andy T. Liu(College of Electrical Engineering and Computer Science, National Taiwan University) and Hung-yi Lee(National Taiwan University (NTU))

Thu-SS-1-6-7 Understanding Self-Attention of Self-Supervised Audio Transformers

Shu-wen Yang(National Taiwan University), Andy T. Liu(College of Electrical Engineering and Computer Science, National Taiwan University) and Hung-yi Lee(National Taiwan University (NTU))

Thu-SS-1-6-8 A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

Sameer Khurana(MIT), Antoine Laurent(Le Mans), Wei-Ning Hsu(Massachusetts Institute of Technology), Jan Chorowski(University of Wroclaw), Adrian Lancucki(University of Wroclaw), Ricard Marxer(Université de Toulon, LIS CNRS UMR 7020) and James Glass(Massachusetts Institute of Technology)

Thu-SS-1-6-9 Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings covering Aging and Cognitive Decline

Ayimunishagu Abulimiti(University of Bremen), Jochen Weiner(University of Bremen) and Tanja Schultz(Universität Bremen)

Learning Techniques for Speaker Recognition II   Video

Thu-1-7-1 Dynamic Margin Softmax Loss for Speaker Verification

Dao Zhou(Tianjin University), Longbiao Wang(Tianjin University), Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation), Yibo Wu(Tianjin University), Meng Liu(Tianjin University), Jianwu Dang(JAIST) and Jianguo Wei(Tianjin University)

Thu-1-7-2 On Parameter Adaptation in Softmax-based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-based Speaker Recognition

Magdalena Rybicka(AGH University of Science and Technology) and Konrad Kowalczyk(AGH University of Science and Technology)

Thu-1-7-3 Training Speaker Enrollment Models by Network Optimization

Victoria Mingote(University of Zaragoza), Antonio Miguel(ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Alfonso Ortega(University of Zaragoza) and Eduardo Lleida Solano(University of Zaragoza)

Thu-1-7-4 Supervised domain adaptation for text-independent speaker verification using limited data

Seyyed Saeed Sarfjoo(Idiap Research Institute), Srikanth Madikeri(Idiap Research Institute), Petr Motlicek(Idiap Research Institute) and Sébastien Marcel(Idiap Research Institute)

Thu-1-7-5 Angular Margin Centroid Loss for Text-independent Speaker Recognition

Yuheng Wei(School of Computer Science and Technology, Xidian University), Junzhao Du(School of Computer Science and Technology, Xidian University) and Hui Liu(School of Computer Science and Technology, Xidian University)

Thu-1-7-6 Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

Jiawen Kang(Tsinghua University), Ruiqi Liu(China University of Mining & Technology, Beijing), Lantian Li(Tsinghua University), Dong Wang(Tsinghua University) and Thomas Fang Zheng(CSLT, Tsinghua University)

Thu-1-7-7 ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Brecht Desplanques(Ghent University - imec, IDLab, Department of Electronics and Information Systems), Jenthe Thienpondt(IDLab, Department of Electronics and Information Systems, Ghent University - imec, Belgium) and Kris Demuynck(Ghent University)

Thu-1-7-8 Length- and Noise-aware Training Techniques for Short-utterance Speaker Recognition

Wenda Chen(Intel Labs), Jonathan Huang(Intel Corp.) and Tobias Bocklet(Technische Hochschule Nürnberg)

Spoken Language Evaluation

Thu-1-8-1 Spoken Language 'Grammatical Error Correction'

Yiting Lu(University of Cambridge), Mark Gales(Cambridge University) and Yu Wang(University of Cambridge)

Thu-1-8-2 Mixtures of Deep Neural Experts for Automated Speech Scoring

Sara Papi(Università di Siena), Edmondo Trentin(DIISM - Univ. of Siena), Roberto Gretter(FBK), Marco Matassoni(Fondazione Bruno Kessler) and Daniele Falavigna(Fondazione Bruno Kessler)

Thu-1-8-3 Targeted Content Feedback in Spoken Language Learning and Assessment

Xinhao Wang(Educational Testing Service), Klaus Zechner(ETS) and Christopher O Hamill(Educational Testing Service)

Thu-1-8-4 Universal Adversarial Attacks on Spoken Language Assessment Systems

Vyas Raina(University of Cambridge), Mark Gales(Cambridge University) and Kate Knill(University of Cambridge)

Thu-1-8-5 Ensemble Approaches for Uncertainty in Spoken Language Assessment

Xixin Wu(University of Cambridge), Kate Knill(University of Cambridge), Mark Gales(Cambridge University) and Andrey Malinin(University of Cambridge)

Thu-1-8-6 Shadowability Annotation with Fine Granularity on L2 Utterances and Its Improvement with Native Listeners' Script-shadowing

Zhenchao Lin(The University of Tokyo), Ryo Takashima(The University of Tokyo), Daisuke Saito(The University of Tokyo), Nobuaki Minematsu(The University of Tokyo) and Noriko Nakanishi(Kobe Gakuin University)

Thu-1-8-7 ASR-based Evaluation and Feedback for Individualized Reading Practice

Yu Bai(Radboud University), Ferdy Hubers(Radboud University), Catia Cucchiarini(Radboud University Nijmegen) and Helmer Strik(Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen)

Thu-1-8-8 Domain Adversarial Training for Dysarthric Speech Recognition

Dominika Woszczyk(Imperial College London), Stavros Petridis(Imperial College London / Samsung AI Centre) and David Millard(University of Southampton)

Thu-1-8-9 Automatic Estimation of Pathological Voice Quality based on Recurrent Neural Network using Amplitude and Phase Spectrogram

Shunsuke Hidaka(Kyushu University), Yogaku Lee(Kyushu University), Kohei Wakamiya(Kyushu University), Takashi Nakagawa(Kyushu University) and Tokihiko Kaburaki(Kyushu University)

Spoken Dialogue System

Thu-1-9-1 Stochastic Curiosity Exploration for Dialogue Systems

Jen-Tzung Chien(National Chiao Tung University) and Po-Chien Hsu(National Chiao Tung University)

Thu-1-9-2 Conditional Response Augmentation for Dialogue using Knowledge Distillation

Myeongho Jeong(Yonsei University), Seungtaek Choi(Yonsei University), Hojae Han(Yonsei University), Kyungho Kim(Yonsei University) and Seung-won Hwang(Yonsei University)

Thu-1-9-3 Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

Hongyin Luo(MIT), Shang-Wen Li(Amazon AWS AI) and James Glass(Massachusetts Institute of Technology)

Thu-1-9-4 End-to-End Task-oriented Dialog System through Template Slot Value Generation

Teakgyu Hong(Naver), Oh-Woog Kwon(Electronics and Telecommunications Research Institute) and Young-Kil Kim(Electronics and Telecommunications Research Institute)

Thu-1-9-5 Task-Oriented Dialog Generation with Enhanced Entity Representation

Zhenhao He(South China University of Technology), Jiachun Wang(South China University of Technology) and Jian Chen(South China University of Technology)

Thu-1-9-6 End-to-end speech-to-dialog-act recognition

Trung V. Dang(Kyoto University), Tianyu Zhao(Kyoto University), Sei Ueno(Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto, Japan), Hirofumi Inaguma(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Thu-1-9-8 Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task

Xinnuo Xu(Heriot-Watt University), Yizhe Zhang(Microsoft), Lars Liden(Microsoft Research) and Sungjin Lee(Amazon Alexa AI)

Dereverberation and Echo Cancellation

Thu-1-10-1 A Semi-blind Source Separation Approach for Speech Dereverberation

Ziteng Wang(Alibaba Group), Yueyue Na(Alibaba Group), Zhang Liu(Alibaba Group), Yun Li(Alibaba Group), Biao Tian(Alibaba Group) and Qiang Fu(Alibaba Group)

Thu-1-10-3 SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

Vinay Kothapally(The University of Texas at Dallas - Center for Robust Speech Systems (CRSS)), Wei Xia(University of Texas at Dallas), Shahram Ghorbani(Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75080), John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems), Wei Xue(JD AI Research) and Jing Huang(JD AI Research)

Thu-1-10-4 A Robust and Cascaded Acoustic Echo Cancellation Based on Deep Learning

Chenggang Zhang(Computer Science Department, Inner Mongolia University) and Xueliang Zhang(Computer Science Department, Inner Mongolia University)

Thu-1-10-5 Generative Adversarial Network based Acoustic Echo Cancellation

Yi Zhang(Didi Research America), Chengyun Deng(Didi Chuxing), Shiqian Ma(Didi Chuxing), Yongtao Sha(Didi Chuxing), Hui Song(Didi Chuxing) and Xiangang Li(Didi Chuxing)

Thu-1-10-7 Independent Echo Path Modeling for Stereophonic Acoustic Echo Cancellation

Yi Gao(WeChat Work, WXG, Tencent Corp.), Ian Liu(Tencent Corp.), Jimeng Zheng(Tencent Corp.), Cheng Luo(Tencent Corp.) and Bin Li(Tencent Corp.)

Thu-1-10-8 Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Hongsheng Chen(Nanjing University), Teng Xiang(Nanjing University), Kai Chen(Nanjing University) and Jing Lu(Nanjing University)

Thu-1-10-9 Improving Partition-Block-Based Acoustic Echo Canceler in Under-Modeling Scenarios

Wenzhi Fan(Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University, Nanjing) and Jing Lu(Key Laboratory of Modern Acoustics and Institute of Acoustics, Nanjing University, Nanjing)

Thu-1-10-10 Attention Wave-U-Net for Acoustic Echo Cancellation

Jung-Hee Kim(Hanyang University) and Joon-Hyuk Chang(Hanyang University)

Speech Synthesis: Toward End-to-End Synthesis

Thu-1-11-1 From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer by Feedback Constraint

Zexin Cai(Duke University), Chuxiong Zhang(Duke Kunshan University) and Ming Li(Duke Kunshan University)

Thu-1-11-2 Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Erica Cooper(National Institute of Informatics), Cheng-I Lai(Massachusetts Institute of Technology), Yusuke Yasuda(National Institute of Informatics) and Junichi Yamagishi(National Institute of Informatics)

Thu-1-11-3 Non-autoregressive End-to-End TTS with Coarse-to-Fine Decoding

Tao Wang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Ruibo Fu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Thu-1-11-4 Bi-level Speaker Supervision for One-shot Speech Synthesis

Tao Wang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Ruibo Fu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Chunyu Qiang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Thu-1-11-5 Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding

Alex Peiró Lilja(Universitat Pompeu Fabra) and Mireia Farrús(Universitat Pompeu Fabra)

Thu-1-11-6 MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search

Naihan Li(UESTC), Shujie Liu(Microsoft Research Asia, Beijing), Yanqing Liu(Microsoft), Sheng Zhao(Microsoft) and Ming Liu(University of Electronic Science and Technology of China)

Thu-1-11-7 JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

Dan Lim(Kakao Corp.), Won Jang(Kakao Enterprise Corp.), Gyeonghwan O(Kakao Enterprise Corp.), Heayoung Park(Kakao Enterprise Corp.), Bongwan Kim(Kakao Enterprise Corp.) and Jaesam Yoon(Kakao Enterprise Corp.)

Thu-1-11-8 End-to-end text-to-speech synthesis with unaligned multiple language units based on attention

Masashi Aso(University of Tokyo), Shinnosuke Takamichi(University of Tokyo) and Hiroshi Saruwatari(The University of Tokyo)

Thu-1-11-9 Attention Forcing for Speech Synthesis

Qingyun Dou(University of Cambridge), Joshua Efiong(University of Cambridge) and Mark Gales(University of Cambridge)

Thu-1-11-10 Testing the Limits of Representation Mixing for Pronunciation Correction in End-to-End Speech Synthesis

Jason Fong(University of Edinburgh), Jason Taylor(University of Edinburgh) and Simon King(University of Edinburgh)

Thu-1-11-11 MultiSpeech: Multi-Speaker Text to Speech with Transformer

Mingjian Chen(Peking University), Xu Tan(Microsoft Research Asia), Yi Ren(Zhejiang University), Jin Xu(Tsinghua University), Hao Sun(Peking University), Sheng Zhao(Microsoft STC Asia), Tao Qin(Microsoft Research Asia) and Tie-Yan Liu(Microsoft Research Asia)

Thursday 20:30-21:30(GMT+8), October 29

Speech Enhancement, Bandwidth Extension and Hearing Aids

Thu-2-1-1 Exploiting Conic Affinity Measures to Design Speech Enhancement Systems Operating in Unseen Noise Conditions

Pavlos Papadopoulos(University of Southern California) and Shrikanth Narayanan(University of Southern California)

Thu-2-1-2 Adversarial Dictionary Learning for Monaural Speech Enhancement

Yunyun Ji(Agora IO, Inc), Longting Xu(Donghua University) and Wei-Ping Zhu(Concordia University)

Thu-2-1-4 Spatial Covariance Matrix Estimation for Reverberant Speech with Application to Speech Enhancement

Ran Weisman(Ben-Gurion University of the Negev), Vladimir Tourbabin(Facebook Reality Labs), Paul Calamia(Facebook Reality Labs) and Boaz Rafaely(Ben-Gurion University of the Negev)

Thu-2-1-5 A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement

Minh Tri Ho(Department of Electrical and Electronic Engineering, Yonsei University), Jinyoung Lee(Department of Electrical and Electronic Engineering, Yonsei University), Bong-Ki Lee(LG Electronics), Dong Hoon Yi(LG Electronics) and Hong-Goo Kang(Department of Electrical and Electronic Engineering, Yonsei University)

Thu-2-1-6 TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

Igor Fedorov(Arm), Marko Stamenovic(Bose Corp.), Carl Jensen(Bose Corp.), Li-Chia Yang(Bose Corp.), Ari Mandell(Bose Corp.), Yiming Gan(University of Rochester), Matthew Mattina(Arm) and Paul Whatmough(Arm)

Thu-2-1-7 Intelligibility Enhancement Based on Speech Waveform Modification Using Hearing Impairment Simulator

Shu Hikosaka(Nagoya University), Shogo Seki(Nagoya University), Tomoki Hayashi(Nagoya University), Kazuhiro Kobayashi(Nagoya University), Kazuya Takeda(Nagoya University), Hideki Banno(Meijo University) and Tomoki Toda(Nagoya University)

Thu-2-1-8 Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network

Nana Hou(Nanyang Technological University), Chenglin Xu(Nanyang Technological University), Van Tung Pham(Nanyang Technological University), Joey Tianyi Zhou(Institute of High Performance Computing (IHPC), A*STAR), Eng Siong Chng(Nanyang Technological University) and Haizhou Li(National University of Singapore)

Thu-2-1-9 Multi-task Learning for End-to-end Noise-robust Bandwidth Extension

Nana Hou(Nanyang Technological University), Chenglin Xu(Nanyang Technological University), Joey Tianyi Zhou(Institute of High Performance Computing (IHPC), A*STAR), Eng Siong Chng(Nanyang Technological University) and Haizhou Li(National University of Singapore)

Thu-2-1-10 Phase-aware music super-resolution using generative adversarial networks

Shichao Hu(Tencent Music Entertainment), Bin Zhang(Tencent Music Entertainment), Beici Liang(Tencent Music Entertainment), Ethan Zhao(Tencent Music Entertainment) and Simon Lui(Tencent Music Entertainment)

Speech Emotion Recognition III (SER III)

Thu-2-2-1 Learning Utterance-level Representations with Label Smoothing for Speech Emotion Recognition

Jian Huang(Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(Institute of Automation, Chinese Academy of Sciences), Bin Liu(Institute of Automation, Chinese Academy of Sciences) and Zheng Lian(Institute of Automation, Chinese Academy of Sciences)

Thu-2-2-2 Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition

Md Asif Jalal(University of Sheffield), Rosanna Milner(University of Sheffield), Thomas Hain(University of Sheffield) and Roger Moore(University of Sheffield)

Thu-2-2-3 Adaptive Domain-Aware Representation Learning for Speech Emotion Recognition

Weiquan Fan(South China University of Technology), Xiangmin Xu(South China University of Technology), Xiaofen Xing(South China University of Technology) and Dongyan Huang(UBTECH Robotics Corp)

Thu-2-2-4 Speech emotion recognition with discriminative feature learning

Huan Zhou(Huawei Technologies Co. Ltd) and Kai Liu(Huawei Technologies Co. Ltd)

Thu-2-2-5 Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions

Hengshun Zhou(University of Science and Technology of China), Jun Du(University of Science and Technology of China), Yan-Hui Tu(University of Science and Technology of China) and Chin-Hui Lee(Georgia Institute of Technology)

Thu-2-2-6 Comparison of glottal source parameter values in emotional vowels

Yongwei Li(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Bin Liu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Donna Erickson(Haskins Laboratories) and Masato Akagi(Japan Advanced Institute of Science and Technology)

Thu-2-2-7 Learning to Recognize Per-rater's Emotion Perception Using Co-rater Training Strategy with Soft and Hard Labels

Huang-Cheng Chou(Department of Electrical Engineering, National Tsing Hua University) and Chi-Chun Lee(Department of Electrical Engineering, National Tsing Hua University)

Thu-2-2-8 Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition

Md Asif Jalal(University of Sheffield), Rosanna Milner(University of Sheffield) and Thomas Hain(University of Sheffield)

Acoustic Phonetics of L1-L2 and Other Interactions

Thu-2-3-1 Phonetic Accommodation of L2 German Speakers to the Virtual Language Learning Tutor Mirabella

Iona Gessinger(Saarland University), Bernd Möbius(Saarland University), Bistra Andreeva(Saarland University), Eran Raveh(Saarland University) and Ingmar Steiner(audEERING GmbH)

Thu-2-3-3 Rhythmic convergence in French in contact?

Svetlana Kaminskaia(University of Waterloo)

Thu-2-3-4 Malayalam-English code-switched: Grapheme to Phoneme System

Sreeja Manghat(IEEE Graduate Member), Sreeram Manghat(IEEE Graduate Member) and Tanja Schultz(Universität Bremen)

Thu-2-3-5 Ongoing phonologization of word-final voicing alternations in two Romance languages: Romanian and French

Mathilde Hutin(Université Paris-Saclay, CNRS, LIMSI), Adèle Jatteau(Université de Lille), Ioana Vasilescu(LIMSI-CNRS), Lori Lamel(CNRS/LIMSI) and Martine Adda-Decker(LPP (Lab. Phonétique & Phonologie) / LIMSI-CNRS)

Thu-2-3-6 Cues for Perception of Gender in Synthetic Voices and the Role of Identity

Maxwell Hope(University of Delaware) and Jason Lilley(Nemours Biomedical Research)

Thu-2-3-7 Phonetic Entrainment in Cooperative Dialogues: A Case of Russian

Alla Menshikova(Saint Petersburg State University), Daniil Kocharov(Department of Phonetics, Saint Petersburg State University) and Tatiana Kachkovskaia(Saint Petersburg State University)

Thu-2-3-8 Prosodic Characteristics of Genuine and Mock (Im)polite Mandarin Utterances

Chengwei Xu(Nanjing Normal University) and Wentao Gu(Nanjing Normal University)

Thu-2-3-9 Tone variations in regionally accented Mandarin

Yanping Li(Western Sydney University), Catherine Best(Western Sydney University), Michael Tyler(Western Sydney University) and Denis Burnham(Western Sydney University)

Thu-2-3-10 F0 patterns in Mandarin statements of Mandarin and Cantonese speakers

Yike Yang(The Hong Kong Polytechnic University), Si Chen(The Hong Kong Polytechnic University) and Xi Chen(The Hong Kong Polytechnic University)

Conversational Systems

Thu-2-4-1 SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering

Yung-Sung Chuang(National Taiwan University), Chi-Liang Liu(National Taiwan University), Hung-yi Lee(National Taiwan University (NTU)) and Lin-shan Lee(National Taiwan University)

Thu-2-4-2 An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering

Chia-Chih Kuo(National Taiwan University of Science and Technology), Shang-Bao Luo(National Taiwan University of Science and Technology) and Kuan-Yu Chen(NTUST)

Thu-2-4-3 Entity Linking for Short Text Using Structured Knowledge Graph via Multi-grained Text Matching

Binxuan Huang(CMU), Han Wang(Amazon), Tong Wang(Amazon), Yue Liu(Amazon) and Yang Liu(Amazon)

Thu-2-4-4 Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition

Mingxin Zhang(Tokyo Institute of Technology), Tomohiro Tanaka(Tokyo Institute of Technology), Wenxin Hou(Tokyo Institute of Technology), Shengzhou Gao(Tokyo Institute of Technology) and Takahiro Shinozaki(Tokyo Institute of Technology)

Thu-2-4-5 Semi-supervised learning for character expression of spoken dialogue systems

Kenta Yamamoto(Kyoto University), Koji Inoue(Kyoto University) and Tatsuya Kawahara(Kyoto University)

Thu-2-4-6 Dimensional Emotion Prediction based on Interactive Context in Conversation

Xiaohan Shi(Japan Advanced Institute of Science and Technology), Sixia Li(Japan Advanced Institute of Science and Technology) and Jianwu Dang(JAIST)

Thu-2-4-7 HRI-RNN: A User-Robot Dynamics-Oriented RNN for Engagement Decrease Detection

Asma Atamna(Télécom Paris, Institut Polytechnique de Paris, France) and Chloé Clavel(LTCI, Telecom-ParisTech)

Thu-2-4-8 Neural representations of dialogical history for improving upcoming turn acoustic parameters prediction

Simone Fuscone(University of Aix-Marseille), Benoit Favre(Aix-Marseille University LIS/CNRS) and Laurent Prévot(Aix Marseille Université & CNRS)

The Attacker's Perspective on Automatic Speaker Verification

Thu-SS-2-5-1 The Attacker's Perspective on Automatic Speaker Verification: An Overview

Rohan Kumar Das(National University of Singapore), Xiaohai Tian(National University of Singapore), Tomi Kinnunen(University of Eastern Finland) and Haizhou Li(National University of Singapore)

Thu-SS-2-5-2 Extrapolating False Alarm Rates in Automatic Speaker Verification

Alexey Sholokhov(Huawei Technologies Ltd.), Tomi Kinnunen(University of Eastern Finland), Ville Vestman(School of Computing, University of Eastern Finland, Finland) and Kong Aik Lee(Data Science Research Laboratories, NEC Corporation)

Thu-SS-2-5-3 Self-supervised Spoofing Audio Detection Scheme

Ziyue Jiang(Wuhan University), Hongcheng Zhu(Wuhan University), Peng Li(Wuhan University), Wenbing Ding(Wuhan University) and Yanzhen Ren(Wuhan University)

Thu-SS-2-5-4 Inaudible adversarial perturbations for targeted attack in speaker recognition

Qing Wang(Northwestern Polytechnical University), Pengcheng Guo(Northwestern Polytechnical University) and Lei Xie(School of Computer Science, Northwestern Polytechnical University)

Thu-SS-2-5-5 x-Vectors Meet Adversarial Attacks: Benchmarking Adversarial Robustness in Speaker Verification

Jesus Villalba(Johns Hopkins University), Yuekai Zhang(Johns Hopkins University) and Najim Dehak(Johns Hopkins University)

Thu-SS-2-5-6 Black-box Attacks on Spoofing Countermeasures Using Transferability of Adversarial Examples

Yuekai Zhang(Johns Hopkins University), Ziyan Jiang(Johns Hopkins University), Jesus Villalba(Johns Hopkins University) and Najim Dehak(Johns Hopkins University)

Summarization, Semantic Analysis and Classification

Thu-2-6-2 Abstractive Spoken Document Summarization using Hierarchical Model with Multi-stage Attention Diversity Optimization

Potsawee Manakul(University of Cambridge), Mark Gales(Cambridge University) and Linlin Wang(University of Cambridge)

Thu-2-6-3 Improved Learning of Word Embeddings with Word Definitions and Semantic Injection

Yichi Zhang(Tsinghua University), Yinpei Dai(Alibaba Group), Zhijian Ou(Department of Electronic Engineering, Tsinghua University), Huixin Wang(China Mobile) and Junlan Feng(China Mobile)

Thu-2-6-4 Wake Word Detection with Alignment-Free Lattice-Free MMI

Yiming Wang(Johns Hopkins University), Hang Lv(Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi’an), Dan Povey(Xiaomi, Inc.), Lei Xie(Northwestern Polytechnical University) and Sanjeev Khudanpur(Johns Hopkins University)

Thu-2-6-5 Improving Vietnamese Named Entity Recognition from Speech Using Word Capitalization and Punctuation Recovery Models

Thai Binh Nguyen(Vietnam Artificial Intelligence System), Minh Nguyen Quang(Vietnam Artificial Intelligence System), Hien Nguyen Thi Thu, Quoc Truong Do(Vietnam Artificial Intelligence System) and Luong Chi Mai(Institute of Information Technology)

Thu-2-6-6 End-to-end Named Entity Recognition from English Speech

Hemant Yadav(MIDAS, IIITD), Sreyan Ghosh(MIDAS@IIITD), Yi Yu(National Institute of Informatics (NII) Tokyo, Japan) and Rajiv Ratn Shah(IIIT Delhi)

Thu-2-6-7 Semantic Complexity in End-to-End Spoken Language Understanding

Joseph McKenna(Amazon Alexa), Samridhi Choudhary(Amazon Alexa), Michael Saxon(Amazon Alexa), Grant Strimel(Amazon Alexa) and Athanasios Mouchtaris(Amazon Alexa)

Thu-2-6-8 Analysis of Disfluency in Children's Speech

Trang Tran(University of Washington), Morgan Tinkler(University of California Los Angeles), Gary Yeung(University of California, Los Angeles), Abeer Alwan(UCLA) and Mari Ostendorf(University of Washington)

Thu-2-6-9 Representation based meta-learning for few-shot spoken intent recognition

Ashish Mittal(IBM Research AI), Samarth Bharadwaj(IBM Research AI), Shreya Khare(IBM Research), Saneem Chemmengath(IBM Research AI), Karthik Sankaranarayanan(IBM Research) and Brian Kingsbury(IBM Research)

Thu-2-6-10 Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation

Rishika Agarwal(Apple), Xiaochuan Niu(Apple), Pranay Dighe(Apple), Srikanth Vishnubhotla(Apple), Sameer Badaskar(Apple) and Devang Naik(Apple)

Speaker Recognition II

Thu-2-7-1 Speaker-Utterance Dual Attention for Speaker and Utterance Verification

Tianchi Liu(Pensees Pte Ltd), Rohan Kumar Das(National University of Singapore), Maulik Madhavi(National University of Singapore), Shengmei Shen(Pensees Pte Ltd) and Haizhou Li(National University of Singapore)

Thu-2-7-2 Adversarial Separation and Adaptation Network for Far-Field Speaker Verification

Lu Yi(The Hong Kong Polytechnic University), Man-Wai Mak(The Hong Kong Polytechnic University) and Yue Lang

Thu-2-7-3 MIRNet: Learning multiple identities representations in overlapped speech

Hyewon Han(Yonsei University), Soo-Whan Chung(Yonsei University) and Hong-Goo Kang(Yonsei University)

Thu-2-7-4 Strategies for End-to-End Text-Independent Speaker Verification

Weiwei Lin(The Hong Kong Polytechnic University), Man-Wai Mak(The Hong Kong Polytechnic University) and Jen-Tzung Chien(National Chiao Tung University)

Thu-2-7-5 Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data

Rosa González Hautamäki(University of Eastern Finland) and Tomi Kinnunen(University of Eastern Finland)

Thu-2-7-6 Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

Amber Afshan(University of California, Los Angeles), Jinxi Guo(Amazon.com), Soo Jin Park(UCLA), Vijay Ravi(UCLA), Alan McCree(JHU HLTCOE) and Abeer Alwan(UCLA)

Thu-2-7-7 A Machine of Few Words: Interactive Speaker Recognition with Reinforcement Learning

Mathieu Seurin(Université Lille), Florian Strub(DeepMind), Philippe Preux(Université Lille, Inria) and Olivier Pietquin(Google Brain)

Thu-2-7-8 Improving on-device speaker verification using federated learning with privacy

Filip Granqvist(Apple), Matt Seigel(Apple), Rogier van Dalen(Apple), Áine Cahill(Apple), Stephen Shum(Apple) and Matthias Paulik(Apple)

Thu-2-7-9 Neural PLDA Modeling for End-to-End Speaker Verification

Shreyas Ramoji(Indian Institute of Science), Prashant Krishnan(Indian Institute of Science) and Sriram Ganapathy(Indian Institute of Science, Bangalore)

General Topics in Speech Recognition

Thu-2-8-1 State sequence pooling training of acoustic models for keyword spotting

Kuba Lopatka(Intel Corporation) and Tobias Bocklet(Technische Hochschule Nürnberg)

Thu-2-8-2 Training Keyword Spotting Models on Non-IID Data with Federated Learning

Andrew Hard(Google Inc.), Kurt Partridge(Google Inc.), Cameron Nguyen(Google Inc.), Niranjan Subrahmanya(Google Inc.), Aishanee Shah(Google Inc.), Pai Zhu(Google Inc.), Ignacio Moreno(Google Inc.) and Rajiv Mathews(Google Inc.)

Thu-2-8-3 Class LM and Word Mapping for Contextual Biasing in End-to-End ASR

Rongqing Huang(Apple Inc), Ossama Abdel-Hamid(Apple Inc), Xinwei Li(Apple Inc) and Gunnar Evermann(Apple Inc)

Thu-2-8-4 Do End-to-End Speech Recognition Models Care About Context?

Lasse Borgholt(University of Copenhagen), Jakob Drachmann Havtorn(Corti ApS), Željko Agić(Corti), Anders Søgaard(University of Copenhagen), Lars Maaløe(Corti) and Christian Igel(University of Copenhagen)

Thu-2-8-5 Utterance confidence measure for end-to-end speech recognition with applications to on-device-server hybrid ASR

Ankur Kumar(Samsung Research India Bangalore), Dhananjaya Gowda(Samsung Research), Sachin Singh(SRIB), Abhinav Garg(Samsung Research), Shatrughan Singh(SRIB) and Chanwoo Kim(Samsung Research)

Thu-2-8-6 Speaker Code Based Speaker Adaptive Training Using Model Agnostic Meta-learning

Huaxin Wu(iFlytek Research, iFlytek Co., Ltd), Genshun Wan(University of Science and Technology of China) and Jia Pan(University of Science and Technology of China)

Thu-2-8-7 Domain Adaptation Using Class Similarity for Robust Speech Recognition

Han Zhu(University of Chinese Academy of Sciences), Jiangjiang Zhao(China Mobile Online Services Company Limited), Yuling Ren(China Mobile Online Services Company Limited), Li Wang(Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics) and Pengyuan Zhang(University of Chinese Academy of Sciences)

Thu-2-8-8 Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time

Sashi Novitasari(Nara Institute of Science and Technology), Andros Tjandra(Nara Institute of Science and Technology), Tomoya Yanagita(Nara Institute of Science and Technology), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura(Nara Institute of Science and Technology)

Thu-2-8-9 Context-Dependent Acoustic Modeling without Explicit Phone Clustering

Tina Raissi(RWTH Aachen University), Eugen Beck(RWTH Aachen University), Ralf Schlüter(Lehrstuhl Informatik 6, RWTH Aachen University) and Hermann Ney(RWTH Aachen University)

Thu-2-8-10 Voice Conversion Based Data Augmentation to Improve Children's Speech Recognition in Limited Data Scenario

Syed Shahnawazuddin(National Institute of Technology Patna), Nagaraj Adiga(University of Crete), Kunal Kumar(National Institute of Technology Patna), Aayushi Poddar(National Institute of Technology Patna) and Waquar Ahmad(National Institute of Technology Calicut)

Speech Synthesis: Prosody Modeling

Thu-2-9-1 CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

Sri Karlapati(Amazon.com), Alexis Moinet(Amazon), Arnaud Joly(Amazon.com), Viacheslav Klimkov(Amazon.com), Daniel Sáez-Trigueros(Amazon.com) and Thomas Drugman(Amazon)

Thu-2-9-2 Joint detection of sentence stress and phrase boundary for prosody

Binghuai Lin(Tencent Technology Co., Ltd), Liyuan Wang(Tencent Technology Co., Ltd), Xiaoli Feng(Center for Studies of Chinese as a Second Language, Beijing Language and Culture University) and Jinsong Zhang(Beijing Language and Culture University)

Thu-2-9-3 Transfer learning of the expressivity using FLOW metric learning in multispeaker text-to-speech synthesis

Ajinkya Kulkarni(Université de Lorraine, CNRS, Inria, LORIA), Vincent Colotte(University of Lorraine) and Denis Jouvet(LORIA - INRIA)

Thu-2-9-4 Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning

Jaesung Bae(NCSOFT), Hanbin Bae(NCSOFT), Young-Sun Joo(NCSOFT), Junmo Lee(NCSOFT), Gyeong-Hoon Lee(NCSOFT) and Hoon-Young Cho(NCSOFT, AI Center, Speech Lab)

Thu-2-9-5 Dynamic Prosody Generation for Speech Synthesis Using Linguistics-Driven Acoustic Embedding Selection

Shubhi Tyagi(Amazon), Marco Nicolis(Amazon), Jonas Rohnke(Amazon), Thomas Drugman(Amazon) and Jaime Lorenzo-Trueba(Amazon)

Thu-2-9-6 Improving the Prosody of RNN-based English Text-To-Speech Synthesis by Incorporating a BERT model

Tom Kenter(Google UK), Manish Sharma(Google) and Robert Clark(Google, UK)

Thu-2-9-7 Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Yi Zhao(National Institute of Informatics (NII)), Haoyu Li(National Institute of Informatics), Cheng-I Lai(Massachusetts Institute of Technology), Jennifer Williams(University of Edinburgh), Erica Cooper(National Institute of Informatics) and Junichi Yamagishi(National Institute of Informatics)

Thu-2-9-8 Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit

Zhen Zeng(Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang(Ping An Technology (Shenzhen) Co., Ltd.), Ning Cheng(Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao(Ping An Technology (Shenzhen) Co., Ltd.)

Thu-2-9-9 Discriminative Method to Extract Coarse Prosodic Structure and Its Application for Statistical Phrase/Accent Command Estimation

Yuma Shirahata(The University of Tokyo), Daisuke Saito(The University of Tokyo) and Nobuaki Minematsu(The University of Tokyo)

Thu-2-9-10 Controllable neural text-to-speech synthesis using intuitive prosodic features

Tuomo Raitio(Apple), Ramya Rasipuram(Apple) and Dan Castellani(Apple)

Thu-2-9-11 Controllable Neural Prosody Synthesis

Max Morrison(Northwestern University), Zeyu Jin(Adobe Research), Justin Salamon(Adobe Research), Nick Bryan(Adobe Research) and Gautham Mysore(Adobe Research)

Thu-2-9-12 Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

Matt Whitehill(University of Washington), Shuang Ma(University at Buffalo), Daniel McDuff(Microsoft) and Yale Song(Microsoft)

Thu-2-9-13 Interactive Text-to-Speech System via Joint Style Analysis

Yang Gao(Carnegie Mellon University), Weiyi Zheng(Facebook AI), Zhaojun Yang(Facebook Inc), Thilo Kohler(Facebook Inc), Christian Fuegen(Facebook) and Qing He(Facebook Inc)

Language Learning

Thu-2-10-1 Mobile-Assisted Prosody Training for Limited English Proficiency: Learner Background and Speech Learning Pattern

Okim Kang(Northern Arizona University), Kevin Hirschi(Northern Arizona University), Catia Cucchiarini(Radboud University Nijmegen), John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems), Keelan Evanini(Educational Testing Service) and Helmer Strik(Centre for Language and Speech Technology (CLST), Centre for Language Studies (CLS), Radboud University Nijmegen)

Thu-2-10-2 Finding intelligible consonant-vowel sounds using high-quality articulatory synthesis

Daniel Van Niekerk(Department of Speech, Hearing and Phonetic Sciences, University College London), Anqi Xu(Department of Speech, Hearing and Phonetic Sciences, University College London), Branislav Gerazov(Faculty of Electrical Engineering and Information Technologies, University of Ss. Cyril and Methodius – Skopje), Paul Konstantin Krug(Technische Universität Dresden), Peter Birkholz(Institute of Acoustics and Speech Communication, TU Dresden) and Yi Xu(University College London)

Thu-2-10-3 Audiovisual Correspondence Learning in Humans and Machines

Venkat Krishnamohan(Indian Institute of Science), Akshara Soman(Indian Institute of Science, Bangalore), Anshul Gupta(Mercedes Benz Research and Development) and Sriram Ganapathy(Indian Institute of Science, Bangalore)

Thu-2-10-5 Perception of Japanese consonant length by native speakers of Korean differing in Japanese learning experience

Kimiko Tsukada, Joo-Yeon Kim(Konkuk University) and Jeong-Im Han(Konkuk University)

Thu-2-10-6 Automatic Detection of Phonological Errors in Child Speech Using Siamese Recurrent Autoencoder

Si-Ioi Ng(The Chinese University of Hong Kong) and Tan Lee(The Chinese University of Hong Kong)

Thu-2-10-7 A Comparison of English Rhythm Produced by Native American Speakers and Mandarin ESL Primary School Learners

Hongwei Ding(Shanghai Jiao Tong University), Binghuai Lin(Tencent Technology Co., Ltd), Liyuan Wang(Tencent Technology Co., Ltd), Hui Wang(Shanghai Jiao Tong University) and Ruomei Fang(Shanghai Jiao Tong University)

Thu-2-10-9 Cross-Linguistic Perception of Utterances with Willingness and Reluctance in Mandarin by Korean L2 Learners

Wenqian Li(Shanghai Jiao Tong University) and Jung-Yueh Tu(National Chengchi University)

Speech Enhancement

Thu-2-11-1 Speech Enhancement Based on Beamforming and Post-Filtering by Combining Phase Information

Rui Cheng(Beijing University of Technology) and Changchun Bao(Beijing University of Technology)

Thu-2-11-2 A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

Yuxuan Wang(University of Science and Technology of China), Jun Du(University of Science and Technology of China), Li Chai(University of Science and Technology of China), Chin-Hui Lee(Georgia Institute of Technology) and Jia Pan(University of Science and Technology of China)

Thu-2-11-3 HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Jiaqi Su(Princeton University), Zeyu Jin(Adobe Research) and Adam Finkelstein(Princeton University)

Thu-2-11-4 Learning Complex Spectral Mapping for Speech Enhancement with Improved Cross-corpus Generalization

Ashutosh Pandey(Department of Computer Science and Engineering, The Ohio State University) and DeLiang Wang(Ohio State University)

Thu-2-11-5 Speech Enhancement with Stochastic Temporal Convolutional Networks

Julius Richter(Signal Processing (SP), Universität Hamburg, Germany), Guillaume Carbajal(Signal Processing (SP), Universität Hamburg, Germany) and Timo Gerkmann(Universität Hamburg)

Thu-2-11-6 Visual Speech In Real Noisy Environment: Dataset and a Baseline System

Mandar Gogate(Edinburgh Napier University), Kia Dashtipour(Edinburgh Napier University) and Amir Hussain(Edinburgh Napier University)

Thu-2-11-7 Sparse Mixture of Local Experts for Efficient Speech Enhancement

Aswin Sivaraman(Indiana University) and Minje Kim(Indiana University)

Thu-2-11-8 Improved speech enhancement using TCN with multiple encoder-decoder layers

Vinith Kishore(Samsung Research Institute Bangalore), Nitya Tiwari(Samsung Research Institute Bangalore) and Periyasamy Paramasivam(Samsung)

Thu-2-11-9 Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations

Cunhang Fan(Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Bin Liu(CASIA), Jiangyan Yi(Institute of Automation, Chinese Academy of Sciences) and Zhengqi Wen(CASIA)

Thu-2-11-10 Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization

Mathieu Fontaine(Riken AIP), Kouhei Sekiguchi(Riken AIP), Aditya Arie Nugraha(Riken AIP) and Kazuyoshi Yoshii(Kyoto University)

Thursday 21:45-22:45(GMT+8), October 29

Speech in Health II (HEALTH II)

Thu-3-1-1 Squeeze for Sneeze: Compact Neural Networks for Cold and Flu Recognition

Merlin Albes(University of Augsburg), Zhao Ren(University of Augsburg), Björn Schuller(University of Augsburg / Imperial College London) and Nicholas Cummins(University of Augsburg)

Thu-3-1-2 Extended Study on the Use of Vocal Tract Variables to Quantify Neuromotor Coordination in Depression

Nadee Seneviratne(University of Maryland, College Park), Carol Espy-Wilson(University of Maryland, College Park), James R. Williamson(MIT Lincoln Laboratory), Adam C. Lammert(Worcester Polytechnic Institute) and Thomas F. Quatieri(MIT Lincoln Laboratory)

Thu-3-1-3 Affective Conditioning on Hierarchical Attention Networks applied to Depression Detection from Transcribed Clinical Interviews

Danai Xezonaki(National Technical University of Athens), Georgios Paraskevopoulos(National Technical University of Athens), Alexandros Potamianos(National Technical University of Athens) and Shrikanth Narayanan(University of Southern California)

Thu-3-1-4 Domain Adaptation for Enhancing Speech-based Depression Detection in Natural Environmental Conditions Using Dilated CNNs

Zhaocheng Huang(School of Electrical Engineering and Telecommunications, UNSW Australia), Julien Epps(School of Electrical Engineering and Telecommunications, UNSW Australia), Dale Joachim(Sonde Health), Brian Stasak(University of New South Wales), James Williamson(MIT Lincoln Laboratory) and Thomas Quatieri(MIT Lincoln Laboratory)

Thu-3-1-5 Making a Distinction between Schizophrenia and Bipolar Disorder Based on Temporal Parameters in Spontaneous Speech

Gábor Gosztolya(Research Group on Artificial Intelligence), Anita Bagi(University of Szeged, Department of Hungarian Linguistics), Szilvia Szalóki(University of Szeged, Department of Psychiatry), István Szendi(University of Szeged, Department of Psychiatry) and Ildiko Hoffmann(University of Szeged)

Thu-3-1-6 Prediction of Sleepiness Ratings from Voice by Man and Machine

Mark Huckvale(University College London), Andras Beke(University College London) and Mirei Ikushima(University College London)

Thu-3-1-7 Tongue and Lip Motion Patterns in Alaryngeal Speech

Kristin Teplansky(University of Texas at Austin), Alan Wisler(University of Texas at Austin), Beiming Cao(University of Texas at Austin), Wendy Liang(University of Texas at Austin), Chad Whited(Austin Ear, Nose, and Throat Clinic), Ted Mau(UT Southwestern Medical Center) and Jun Wang(University of Texas at Austin)

Thu-3-1-8 Autoencoder bottleneck features with multi-task optimisation for improved continuous dysarthric speech recognition

Zhengjun Yue(SPandH Group, University of Sheffield), Heidi Christensen(University of Sheffield) and Jon Barker(University of Sheffield)

Thu-3-1-9 Raw speech waveform based classification of patients with ALS, Parkinson’s Disease and healthy controls using CNN-BLSTM

Jhansi Mallela(Indian Institute of Science), Aravind Illa(Indian Institute of Science, Bangalore), Yamini Belur(National Institute of Mental Health and Neurosciences), Nalini Atchayaram(National Institute of Mental Health and Neurosciences), Pradeep Reddy(National Institute of Mental Health and Neurosciences), Dipanjan Gope(National Institute of Mental Health and Neurosciences) and Prasanta Kumar Ghosh(Indian Institute of Science)

Thu-3-1-10 Assessment of Parkinson’s Disease Medication State through Automatic Speech Analysis

Anna Pompili(INESC-ID), Rubén Solera-Ureña(INESC-ID), Alberto Abad(INESC-ID, Instituto Superior Técnico, Universidade de Lisboa), Rita Cardoso(Laboratory of Clinical Pharmacology and Therapeutics, Faculdade de Medicina, Universidade de Lisboa, Instituto de Medicina Molecular, CNS - Campus Neurológico Sénior), Isabel Guimarães(Laboratory of Clinical Pharmacology and Therapeutics, Faculdade de Medicina, Universidade de Lisboa, Instituto de Medicina Molecular, Alcoitão School of Health Sciences, Santa Casa da Misericórdia de Lisboa), Margherita Fabbri(Clinical Investigation Center CIC1436, Departments of Clinical Pharmacology and Neurosciences, NS-Park/FCRIN network and NeuroToul Center of Excellence for Neurodegeneration, INSERM, University Hospital of Toulouse and University of Toulouse), Isabel Pavão Martins(Laboratório de Estudos de Linguagem, Faculty of Medicine, University of Lisbon, Instituto de Medicina Molecular) and Joaquim Ferreira(Laboratory of Clinical Pharmacology and Therapeutics, Faculdade de Medicina, Universidade de Lisboa, Instituto de Medicina Molecular, CNS - Campus Neurológico Sénior)

Speech and Audio Quality Assessment

Thu-3-2-1 Improving Replay Detection System with Channel Consistency DenseNeXt for the ASVspoof 2019 Challenge

Chao Zhang(Ping An Technology Co., Ltd), Junjie Cheng(Ping An Technology Co., Ltd), Yanmei Gu(Ping An Technology Co., Ltd), Huacan Wang(Ping An Technology Co., Ltd), Jun Ma(Ping An Technology Co., Ltd), Shaojun Wang(Ping An Technology Co., Ltd) and Jing Xiao(Ping An Technology Co., Ltd)

Thu-3-2-2 Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System

Przemyslaw Falkowski-Gilski(Gdansk University of Technology), Grzegorz Debita(General Tadeusz Kosciuszko Military University of Land Forces), Marcin Habrych(Wroclaw University of Science and Technology), Bogdan Miedzinski(Wroclaw University of Science and Technology), Przemyslaw Jedlikowski(Wroclaw University of Science and Technology), Bartosz Polnik(KOMAG Institute of Mining Technology), Jan Wandzio(KGHM Polska Miedz S.A.) and Xin Wang(China Agriculture University)

Thu-3-2-3 Investigating the Visual Lombard Effect with Gabor Based Features

Waito Chiu(Xi'an Jiaotong-Liverpool University), Yan Xu(Xi'an Jiaotong-Liverpool University), Andrew Abel(Xi'an Jiaotong-Liverpool University), Chun Lin(Anhui University) and Zhengzheng Tu(Anhui University)

Thu-3-2-4 Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models

Qiang Huang(University of Sheffield) and Thomas Hain(University of Sheffield)

Thu-3-2-5 Development of a Speech Quality Database Under Uncontrolled Conditions

Alessandro Ragano(University College Dublin), Emmanouil Benetos(Queen Mary University of London) and Andrew Hines(University College Dublin)

Thu-3-2-6 Evaluating acoustic speech embeddings reliability

Robin Algayres(Ecole Normale Supérieure/PSL/Inria), Mohamed Salah Zaiem(Ecole Normale Supérieure/PSL/Inria), Benoît Sagot(Inria) and Emmanuel Dupoux(Ecole des Hautes Etudes en Sciences Sociales)

Thu-3-2-7 Frame-level Signal-to-Noise Ratio Estimation using Deep Learning

Hao Li(Computer Science Department, Inner Mongolia University), DeLiang Wang(Ohio State University), Xueliang Zhang(Computer Science Department, Inner Mongolia University) and Guanglai Gao(Inner Mongolia University)

Thu-3-2-9 Effect of Spectral Complexity Reduction and Number of Instruments on Musical Enjoyment with Cochlear Implants

Avamarie Brueggeman(The University of Texas at Dallas) and John H.L. Hansen(Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)

Privacy and Security in Speech Communication

Thu-3-3-1 Distributed Summation Privacy for Speech Enhancement

Matthew O'Connor(Victoria University of Wellington) and W. Bastiaan Kleijn(Victoria University of Wellington)

Thu-3-3-2 Perception of Privacy Measured in the Crowd - Paired Comparison on the Effect of Background Noises

Anna Leschanowsky(Aalto University), Sneha Das(Aalto University), Tom Bäckström(Aalto University) and Pablo Pérez Zarazaga(Aalto University)

Thu-3-3-3 Hide and Speak: Towards Deep Neural Networks for Speech Steganography

Felix Kreuk(Bar-Ilan University), Yossi Adi(Facebook AI Research), Bhiksha Raj(Carnegie Mellon University), Rita Singh(Carnegie Mellon University) and Joseph Keshet(Bar-Ilan University)

Thu-3-3-4 Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification

Sina Däubener(Ruhr University Bochum), Lea Schönherr(Ruhr University Bochum), Asja Fischer(Ruhr University Bochum) and Dorothea Kolossa(Ruhr-Universität Bochum)

Thu-3-3-5 Privacy Guarantees for De-identifying Text Transformations

David Adelani(Saarland University), Ali Davody(Saarland University), Thomas Kleinbauer(Saarland University) and Dietrich Klakow(Saarland University)

Thu-3-3-6 Detecting Audio Attacks on ASR Systems with Dropout Uncertainty

Tejas Jayashankar(Massachusetts Institute of Technology), Jonathan Le Roux(Mitsubishi Electric Research Laboratories) and Pierre Moulin(University of Illinois)

Voice Conversion and Adaptation II

Thu-3-4-1 Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

Wen-Chin Huang(Nagoya University), Tomoki Hayashi(Nagoya University), Yi-Chiao Wu(Nagoya University), Hirokazu Kameoka(NTT Communication Science Laboratories) and Tomoki Toda(Nagoya University)

Thu-3-4-2 Nonparallel Training of Exemplar-based Voice Conversion System Using INCA-based Alignment Technique

Hitoshi Suda(The University of Tokyo), Gaku Kotani(The University of Tokyo) and Daisuke Saito(The University of Tokyo)

Thu-3-4-3 Enhancing Intelligibility of Dysarthric Speech Using Gated Convolutional-based Voice Conversion System

Chen-Yu Chen(Department of Biomedical Engineering, National Yang-Ming University), Wei-Zhong Zheng(Department of Biomedical Engineering, National Yang-Ming University), Syu-Siang Wang(Research Center for Information Technology Innovation, Academia Sinica), Yu Tsao(Academia Sinica), Pei-Chun Li(Department of Audiology and Speech Language Pathology, Mackay Medical College) and Ying-Hui Lai(National Yang-Ming University)

Thu-3-4-4 VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net architecture

Da-Yi Wu(National Taiwan University), Yen-Hao Chen(National Taiwan University) and Hung-yi Lee(National Taiwan University (NTU))

Thu-3-4-5 Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data

Seung-won Park(Seoul National University, MINDsLab Inc.), Doo-young Kim(Seoul National University, MINDsLab Inc.) and Myun-chul Joe(MINDsLab Inc.)

Thu-3-4-6 Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis

Ruibo Fu(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Tao Wang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Chunyu Qiang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Thu-3-4-7 ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data

Zheng Lian(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing), Xinyong Zhou(Northwestern Polytechnical University) and Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing)

Thu-3-4-9 Non-parallel Voice Conversion with Fewer Labeled Data by Conditional Generative Adversarial Networks

Minchuan Chen(Ping An Technology), Weijian Hou(Ping An Technology), Jun Ma(Ping An Technology), Shaojun Wang(Ping An Technology) and Jing Xiao(Ping An Technology)

Thu-3-4-10 Transferring Source Style in Non-Parallel Voice Conversion

Songxiang Liu(CUHK), Yuewen Cao(The Chinese University of Hong Kong), Shiyin Kang(Tencent), Na Hu(Tencent AI Lab), Xunying Liu(Chinese University of Hong Kong), Dan Su(Tencent AI Lab Shenzhen), Dong Yu(Tencent AI Lab) and Helen Meng(The Chinese University of Hong Kong)

Thu-3-4-11 Voice Conversion using Speech-to-Speech Neuro-Style Transfer

Ehab AlBadawy and Siwei Lyu

Multilingual and Code-Switched ASR

Thu-3-5-1 Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation

Changhan Wang(Facebook AI Research), Juan Pino(Facebook) and Jiatao Gu(Facebook AI Research)

Thu-3-5-2 Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings

Samuel Thomas(IBM Research AI), Kartik Audhkhasi(IBM Research) and Brian Kingsbury(IBM Research)

Thu-3-5-3 Multilingual Speech Recognition with Self-Attention Structured Parameterization

Yun Zhu(Google), Parisa Haghani(Google), Anshuman Tripathi(Google), Bhuvana Ramabhadran(Google), Brian Farris(Google), Hainan Xu(Google), Han Lu(Google), Hasim Sak(Google), Isabel Leal(Google), Neeraj Gaur(Google), Pedro Moreno(Google Inc.) and Qian Zhang(Google)

Thu-3-5-4 Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems

Srikanth Madikeri(Idiap Research Institute), Banriskhem Kayang Khonglah(Idiap Research Institute), Sibo Tong(Idiap Research Institute), Petr Motlicek(Idiap Research Institute), Herve Bourlard(Idiap Research Institute & EPFL) and Dan Povey(Xiaomi, Inc.)

Thu-3-5-5 Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters

Vineel Pratap(Facebook), Anuroop Sriram(Facebook AI), Paden Tomasello(Facebook AI), Awni Hannun(Facebook AI Research), Vitaliy Liptchinsky(Facebook AI), Gabriel Synnaeve(Facebook AI Research) and Ronan Collobert(Facebook AI Research)

Thu-3-5-6 Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages

Hardik Sailor(Samsung Research Institute, Bangalore, India) and Thomas Hain(University of Sheffield)

Thu-3-5-7 Style Variation as a Vantage Point for Code-Switching

Khyathi Raghavi Chandu(Carnegie Mellon University) and Alan W Black(Carnegie Mellon University)

Thu-3-5-8 Bi-encoder Transformer Network for Mandarin-English Code-switching Speech Recognition using Mixture of Experts

Yizhou Lu(Shanghai Jiao Tong University), Mingkun Huang(Shanghai Jiao Tong University), Hao Li(Shanghai Jiao Tong University), Jiaqi Guo(Shanghai Jiao Tong University) and Yanmin Qian(Shanghai Jiao Tong University)

Thu-3-5-9 Improving Low Resource Code-switched ASR using Augmented Code-switched TTS

Yash Sharma(Indian Institute of Technology Bombay), Basil Abraham(Microsoft), Karan Taneja(IIT Bombay) and Preethi Jyothi(Indian Institute of Technology Bombay)

Thu-3-5-10 Towards Context-Aware End-to-End Code-Switching Speech Recognition

Zimeng Qiu(Amazon Alexa AI), Yiyuan Li(Carnegie Mellon University), Xinjian Li(Carnegie Mellon University), Florian Metze(Carnegie Mellon University) and William Campbell(Amazon)

Speech and Voice Disorders

Thu-3-6-1 Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency

Tuan Dinh(OHSU), Alexander Kain(OHSU), Robin Samlan(University of Arizona), Beiming Cao(University of Texas at Austin) and Jun Wang(University of Texas at Austin)

Thu-3-6-2 Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning

Han Tong(Unitec Institute of Technology), Hamid Sharifzadeh(Unitec Institute of Technology) and Ian McLoughlin(Singapore Institute of Technology)

Thu-3-6-3 Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription

Yuqin Lin(Tianjin University), Longbiao Wang(Tianjin University), Sheng Li(National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory), Jianwu Dang(JAIST) and Chenchen Ding(NICT)

Thu-3-6-4 Dysarthric Speech Recognition Based on Deep Metric Learning

Yuki Takashima(Kobe University), Ryoichi Takashima(Kobe University), Tetsuya Takiguchi(Kobe University) and Yasuo Ariki(Kobe University)

Thu-3-6-5 Automatic Glottis Detection and Segmentation in Stroboscopic videos using Convolutional Networks

Divya Degala(Indian Institute of Science), Achuth Rao M V(Indian Institute of Science), Rahul Krishnamurthy(Manipal Academy of Higher Education), Pebbili Gopikishore(All India Institute of Speech and Hearing), Veeramani Priyadharshini(All India Institute of Speech and Hearing), Prakash T K(All India Institute of Speech and Hearing) and Prasanta Ghosh(EE, IISc)

Thu-3-6-6 Acoustic feature extraction with interpretable deep neural network for neurodegenerative related disorder classification

Yilin Pan(University of Sheffield), Bahman Mirheidari(Department of Computer Science, University of Sheffield), Zehai Tu(Department of Computer Science, University of Sheffield), Ronan O'Malley(Sheffield Institute for Translational Neuroscience (SITraN)), Traci Walker(Department of Human Communication Sciences), Annalena Venneri(Department of Neuroscience, Royal Hallamshire Hospital), Markus Reuber(Academic Neurology Unit, Royal Hallamshire Hospital), Daniel Blackburn(Sheffield Institute for Translational Neuroscience (SITraN)) and Heidi Christensen(University of Sheffield)

Thu-3-6-7 Coswara - A Database of Breathing, Cough, and Voice Sounds for COVID-19 Diagnosis

Neeraj Sharma(Carnegie Mellon University), Prashant Krishnan(Indian Institute of Science), Rohit Kumar(Indian Institute of Science), Shreyas Ramoji(Indian Institute of Science), Srikanth Raj Chetupalli(Indian Institute of Science, Bangalore), Nirmala R(Indian Institute of Science), Prasanta Ghosh(EE, IISc) and Sriram Ganapathy(Indian Institute of Science, Bangalore)

Thu-3-6-8 Acoustic-Based Articulatory Phenotypes of Amyotrophic Lateral Sclerosis and Parkinson’s Disease: Towards an Interpretable, Hypothesis-Driven Framework of Motor Control

Hannah Rowe(Massachusetts General Hospital Institute of Health Professions (MGH IHP)), Sarah Gutz(Harvard University), Marc Maffei(Massachusetts General Hospital Institute of Health Professions (MGH IHP)) and Jordan Green(MGH IHP)

Thu-3-6-9 Recognising Emotions in Dysarthric Speech Using Typical Speech Data

Lubna Alhinti(University of Sheffield), Stuart Cunningham(University of Sheffield) and Heidi Christensen(University of Sheffield)

Thu-3-6-10 Detecting and analysing spontaneous oral cancer speech in the wild

Bence Halpern(Netherlands Cancer Institute, University of Amsterdam), Rob van Son(Netherlands Cancer Institute & Universiteit van Amsterdam), Michiel van den Brekel(Netherlands Cancer Institute) and Odette Scharenborg(Multimedia Computing, Delft University of Technology)

The Zero Resource Speech Challenge 2020

Thu-3-7-1 The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

Ewan Dunbar(Université Paris Diderot), Julien Karadayi(ENS Ulm), Mathieu Bernard(ENS Ulm), Xuan-Nga Cao(LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA), Robin Algayres(Ecole Normale Supérieure/PSL/Inria), Lucas Ondel(Brno University of Technology), Laurent Besacier(LIG), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Emmanuel Dupoux(Ecole des Hautes Etudes en Sciences Sociales)

Thu-3-7-2 Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

Benjamin Van Niekerk(Stellenbosch University), Leanne Nortje(Stellenbosch University) and Herman Kamper(Stellenbosch University)

Thu-3-7-3 Exploration of End-to-end Synthesisers for Zero Resource Speech Challenge 2020

Karthik Pandia D S(Indian Institute of Technology Madras), Anusha Prakash(Indian Institute of Technology Madras), Mano Ranjith Kumar(Indian Institute of Technology Madras) and Hema Murthy(Indian Institute of Technology Madras)

Thu-3-7-4 Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders

Mingjie Chen(University of Sheffield) and Thomas Hain(University of Sheffield)

Thu-3-7-5 Cyclic Spectral Modeling for Unsupervised Unit Discovery into Voice Conversion with Excitation and Waveform Modeling

Patrick Lumban Tobing(Nagoya University), Tomoki Hayashi(Nagoya University), Yi-Chiao Wu(Nagoya University), Kazuhiro Kobayashi(Nagoya University) and Tomoki Toda(Nagoya University)

Thu-3-7-6 Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020)

Takashi Morita(Primate Research Institute, Kyoto University) and Hiroki Koda(Primate Research Institute, Kyoto University)

Thu-3-7-7 Transformer VQ-VAE for Unsupervised Unit Discovery and Speech Synthesis: ZeroSpeech 2020 Challenge

Andros Tjandra(Nara Institute of Science and Technology), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura(Nara Institute of Science and Technology and RIKEN AIP Center)

Thu-3-7-8 Vector Quantized Temporally-Aware Correspondence Sparse Autoencoders for Zero-resource Acoustic Unit Discovery

Batuhan Gundogdu(Bogazici University), Bolaji Yusuf(Bogazici University), Mansur Yesilbursa(Bogazici University) and Murat Saraclar(Bogazici University)

Thu-3-7-9 Unsupervised Discovery of Recurring Speech Patterns using Probabilistic Adaptive Metrics

Okko Räsänen(Tampere University) and Maria Andrea Cruz Blandon(Tampere University)

Thu-3-7-10 Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

Saurabhchand Bhati(The Johns Hopkins University), Jesus Villalba(Johns Hopkins University), Piotr Żelasko(Johns Hopkins University) and Najim Dehak(Johns Hopkins University)

Thu-3-7-11 Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

Juliette Millet(LLF, Université de Paris and CoML team, LSCP, ENS Paris) and Ewan Dunbar(Université Paris Diderot)

Thu-3-7-12 Decoding imagined, heard, and spoken speech: classification and regression of EEG using a 14-channel dry-contact mobile headset

Jonathan Clayton(The University of Edinburgh), Scott Wellington(SpeakUnique Limited), Cassia Valentini-Botinhao(The Centre for Speech Technology Research, University of Edinburgh) and Oliver Watts(SpeakUnique Limited)

Thu-3-7-13 Glottal Closure Instants Detection from EGG Signal by Classification Approach

Gurunath Reddy M(Indian Institute of Technology Kharagpur), K Sreenivasa Rao(Indian Institute of Technology Kharagpur) and Partha Pratim Das(Department of Computer Science & Engineering, IIT Kharagpur)

Thu-3-7-14 Classify Imaginary Mandarin Tones with Cortical EEG Signals

Hua Li(Shenzhen University) and Fei Chen(Southern University of Science and Technology)

LM Adaptation, Lexical Units and Punctuation

Thu-3-8-1 Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework

Johanes Effendi(Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Andros Tjandra(Nara Institute of Science and Technology), Sakriani Sakti(Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura(Nara Institute of Science and Technology)

Thu-3-8-2 Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Lukasz Augustyniak(Wroclaw University of Science and Technology), Piotr Szymański(Avaya Inc. / Wrocław University of Technology), Mikolaj Morzy(Poznan University of Technology), Piotr Żelasko(Johns Hopkins University), Adrian Szymczak(AVAYA), Jan Mizgajski(AVAYA), Yishay Carmiel(AVAYA) and Najim Dehak(Johns Hopkins University)

Thu-3-8-3 Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

Monica Sunkara(Amazon), Srikanth Ronanki(Amazon), Dhanush Bekal(Amazon), Sravan Bodapati(Amazon) and Katrin Kirchhoff(Amazon)

Thu-3-8-4 Efficient MDI Adaptation for N-gram Language Models

Ruizhe Huang(CLSP, Johns Hopkins University), Ashish Arora(Johns Hopkins University), Ke Li(Johns Hopkins University), Dan Povey(Johns Hopkins University) and Sanjeev Khudanpur(Johns Hopkins University)

Thu-3-8-5 Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

Cal Peyser(Google Inc.), Sepand Mavandadi(Google), Tara Sainath(Google), James Apfel(Google), Ruoming Pang(Google) and Shankar Kumar(Google)

Thu-3-8-6 Language Model Data Augmentation Based on Text Domain Transfer

Atsunori Ogawa(NTT Communication Science Laboratories), Naohiro Tawara(NTT Communication Science Laboratories) and Marc Delcroix(NTT Communication Science Laboratories)

Thu-3-8-7 Contemporary Polish Language Model (Version 2) Using Big Data and Sub-Word Approach

Krzysztof Wolk(Polish-Japanese Academy of Information Technology)

Thu-3-8-8 Improving Speech Recognition of Compound-rich Languages

Prabhat Pandey(Amazon), Volker Leutnant(Amazon), Simon Wiesler(Amazon), Jahn Heymann(Amazon) and Daniel Willett(Amazon)

Thu-3-8-9 Language Modeling for Speech Analytics in Under-Resourced Languages

Charl van Heerden(Saigen (Pty) Ltd), Simone Wills(Saigen (Pty) Ltd), Pieter Uys(Saigen (Pty) Ltd) and Etienne Barnard(Saigen (Pty) Ltd)

Speech in Health I (HEALTH I)

Thu-3-9-1 An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

Jing Han(University of Augsburg), Kun Qian(The University of Tokyo), Meishu Song(University of Augsburg), Zijiang Yang(ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany), Zhao Ren(University of Augsburg), Shuo Liu(ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany), Juan Liu(Huazhong University of Science and Technology), Huaiyuan Zheng(Huazhong University of Science and Technology), Wei Ji(Huazhong University of Science and Technology), Tomoya Koike(The University of Tokyo), Xiao Li(Children’s Hospital of Chongqing Medical University), Zixing Zhang(Imperial College London), Yoshiharu Yamamoto(The University of Tokyo) and Björn Schuller(University of Augsburg / Imperial College London)

Thu-3-9-2 An Evaluation of the Effect of Anxiety on Speech - Computational Prediction of Anxiety from Sustained Vowels

Alice Baird(University of Augsburg), Nicholas Cummins(University of Augsburg), Sebastian Schnieder(Institut für experimentelle Psychophysiologie), Jarek Krajewski(University of Wuppertal) and Björn Schuller(University of Augsburg / Imperial College London)

Thu-3-9-3 Hybrid Network Feature Extraction for Depression Assessment from Speech

Ziping Zhao(Tianjin Normal University), Qifei Li(Tianjin Normal University), Nicholas Cummins(University of Augsburg), Bin Liu(National Laboratory of Pattern Recognition, CASIA, Beijing), Haishuai Wang(Tianjin Normal University), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Björn Schuller(University of Augsburg / Imperial College London)

Thu-3-9-4 Improving detection of Alzheimer’s Disease using automatic speech recognition to identify high-quality segments for more robust feature extraction

Yilin Pan(University of Sheffield), Bahman Mirheidari(Department of Computer Science, University of Sheffield), Markus Reuber(Academic Neurology Unit, Royal Hallamshire Hospital), Annalena Venneri(Sheffield Institute for Translational Neuroscience, University of Sheffield), Daniel Blackburn(Sheffield Institute for Translational Neuroscience, University of Sheffield) and Heidi Christensen(University of Sheffield)

Thu-3-9-5 Classification of Manifest Huntington Disease using Vowel Distortion Measures

Amrit Romana(University of Michigan), John Bandon(University of Michigan), Noelle Carlozzi(University of Michigan), Angela Roberts(Northwestern University) and Emily Mower Provost(University of Michigan)

Thu-3-9-6 Parkinson's Disease Detection from Speech using Single Frequency Filtering Cepstral Coefficients

Sudarsana Reddy Kadiri(Aalto University), Rashmi Kethireddy(International Institute of Information Technology) and Paavo Alku(Aalto University)

Thu-3-9-7 Automatic Prediction of Speech Intelligibility based on X-vectors in the context of Head and Neck Cancer

Sebastião Quintas(IRIT, Université de Toulouse, CNRS, Toulouse, France), Julie Mauclair(IRIT), Virginie Woisard(CHU Larrey) and Julien Pinquier(IRIT)

Thu-3-9-8 Spectral Moment and Duration of Burst of Plosives in Speech of Children with Hearing Impairment and Typically Developing Children - A Comparative Study

Ajish Kuriakose Abraham(All India Institute of Speech and Hearing), Pushpavathi M(All India Institute of Speech and Hearing), Sreedevi N(All India Institute of Speech and Hearing), Navya A(All India Institute of Speech and Hearing), Vikram C M(IIT Guwahati) and Mahadeva Prasanna S.R.(IIT Dharwad)

Thu-3-9-9 Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts

Matthew Perez(University of Michigan), Zakaria Aldeneh(University of Michigan) and Emily Mower Provost(University of Michigan)

Thu-3-9-10 Automatic Discrimination of Apraxia of Speech and Dysarthria using a Minimalistic Set of Handcrafted Features

Ina Kodrasi(Idiap Research Institute), Michaela Pernon(University of Geneva), Marina Laganaro(University of Geneva) and Hervé Bourlard(Idiap Research Institute & EPFL)

ASR Neural Network Architectures II - Transformers

Thu-3-10-1 Weak-Attention Suppression For Transformer Based Speech Recognition

Yangyang Shi(Facebook), Yongqiang Wang(Facebook), Chunyang Wu(Facebook), Christian Fuegen(Facebook), Frank Zhang(Facebook), Duc Le(Facebook), Ching-Feng Yeh(Facebook) and Mike Seltzer(Facebook)

Thu-3-10-2 Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

Wenyong Huang(Huawei Noah’s Ark Lab), Wenchao Hu(Huawei Noah’s Ark Lab), Yu Ting Yeung(Huawei Noah’s Ark Lab) and Xiao Chen(Huawei Noah’s Ark Lab)

Thu-3-10-3 Improving Transformer-based Speech Recognition With Unsupervised Pre-training and Multi-task Semantic Knowledge Learning

Song Li(Xiamen University), Lin Li(Xiamen University), Qingyang Hong(Xiamen University) and Lingling Liu(Xiamen University)

Thu-3-10-4 Transformer-based Long-context End-to-end Speech Recognition

Takaaki Hori(Mitsubishi Electric Research Laboratories), Niko Moritz(MERL), Chiori Hori(MERL) and Jonathan Le Roux(Mitsubishi Electric Research Laboratories)

Thu-3-10-5 Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR

Xinyuan Zhou(Shanghai Normal University), Grandee Lee(National University of Singapore), Emre Yilmaz(National University of Singapore), Yanhua Long(Shanghai Normal University), Jiaen Liang(Unisound AI Technology Co., Ltd.) and Haizhou Li(National University of Singapore)

Thu-3-10-6 Universal Speech Transformer

Yingzhu Zhao(Nanyang Technological University), Chongjia Ni(I2R), Cheung-Chi Leung(Alibaba Group), Shafiq Joty(Nanyang Technological University; Salesforce AI Research), Eng Siong Chng(Nanyang Technological University) and Bin Ma(Alibaba Inc.)

Thu-3-10-7 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Zhengkun Tian(Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi(Institute of Automation, Chinese Academy of Sciences), Jianhua Tao(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Ye Bai(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Shuai Zhang(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Zhengqi Wen(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)

Thu-3-10-8 Cross Attention with Monotonic Alignment for Speech Transformer

Yingzhu Zhao(Nanyang Technological University), Chongjia Ni(I2R), Cheung-Chi Leung(Alibaba Group), Shafiq Joty(Nanyang Technological University; Salesforce AI Research), Eng Siong Chng(Nanyang Technological University) and Bin Ma(Alibaba Inc.)

Thu-3-10-9 Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati(Google Brain), James Qin(Google), Chung-Cheng Chiu(Google), Niki Parmar(Google), Yu Zhang(Google Brain), Jiahui Yu(Google), Wei Han(Google), Shibo Wang(Google Research), Zhengdong Zhang(Google Brain), Yonghui Wu(Google Brain) and Ruoming Pang(Google Inc.)

Thu-3-10-10 Exploring Transformers for Large-Scale Speech Recognition

Liang Lu(Microsoft), Changliang Liu(Microsoft), Jinyu Li(Microsoft) and Yifan Gong(Microsoft)

Spatial Audio

Thu-3-11-1 Sparseness-Aware DOA Estimation with Majorization Minimization

Masahito Togami(LINE Corporation) and Robin Scheibler(LINE Corporation)

Thu-3-11-2 Spatial Resolution of Early Reflection for Speech and White Noise

Xiaoli Zhong(South China University of Technology), Hao Song(School of Management, Guangdong University of Technology) and Xuejie Liu(School of Physics and Telecommunication Engineering, South China Normal University)

Thu-3-11-3 Effect of Microphone Position Measurement Error on RIR and its Impact on Speech Intelligibility and Quality

Aditya Raikar(TCS Innovation Labs), Karan Nathwani(IIT Jammu), Ashish Panda(Innovation Labs, Tata Consultancy Services) and Sunil Kumar Kopparapu(TCS Research and Innovation - Mumbai)

Thu-3-11-4 Online Blind Reverberation Time Estimation Using CRNNs

Shuwen Deng(Friedrich-Alexander University Erlangen), Wolfgang Mack(International Audio Laboratories Erlangen) and Emanuël Habets(International Audio Laboratories Erlangen)

Thu-3-11-5 Single-Channel Blind Direct-to-Reverberation Ratio Estimation Using Masking

Wolfgang Mack(International Audio Laboratories Erlangen), Shuwen Deng(Friedrich-Alexander University Erlangen) and Emanuël Habets(International Audio Laboratories Erlangen)

Thu-3-11-6 The importance of time-frequency averaging for binaural speaker localization in reverberant environments

Hanan Beit-On(Ben-Gurion University), Vladimir Tourbabin(Facebook) and Boaz Rafaely(Ben-Gurion University of the Negev)

Thu-3-11-7 Acoustic Signal Enhancement Using Relative Harmonic Coefficients: Spherical Harmonics Domain Approach

Yonggang Hu(Australian National University), Prasanga Samarasinghe(Australian National University) and Thushara Abhayapala(Australian National University)

Thu-3-11-8 Instantaneous Time Delay Estimation of Broadband Signals

Narayana Murthy BHVS(Research Centre Imarat), Satyanarayana J.V.(Research Centre Imarat), Nivedita Chennupati(International Institute of Information Technology Hyderabad) and Bayya Yegnanarayana(International Institute of Information Technology Hyderabad)

Thu-3-11-9 U-net based direct-path dominance test for robust direction-of-arrival estimation

Hao Wang(Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing), Kai Chen(Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing) and Jing Lu(Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing)

Thu-3-11-10 Sound Event Localization and Detection Based on Multiple DOA Beamforming and Multi-task Learning

Wei Xue(JD AI Research), Ying Tong(JD AI Research), Chao Zhang(JD AI Research), Guohong Ding(JD AI Research), Xiaodong He(JD AI Research) and Bowen Zhou(JD AI Research)

Thursday 23:00-24:00(GMT+8), October 29

Closing Session  
