By John Kane, Christer Gobl (auth.), Thomas Drugman, Thierry Dutoit (eds.)

This book constitutes the proceedings of the 6th International Conference on Nonlinear Speech Processing, NOLISP 2013, held in Mons, Belgium, in June 2013. The 27 refereed papers included in this volume were carefully reviewed and selected from 34 submissions. The papers are organized in topical sections on speech and audio analysis; speech synthesis; speech-based biomedical applications; automatic speech recognition; and speech enhancement.

Example text

Fig. 5. a) The beginning of a vowel. b) Its multi-scale product.

Fig. 6. ACMP of a vowel beginning. a) Autocorrelation compression of MP with c=1. b) Autocorrelation compression of MP with c=2. c) Autocorrelation compression of MP with c=3. d) Autocorrelation functions multiplication.

Figure 6 illustrates the efficacy of our approach for fundamental frequency determination during a vowel onset. By contrast, the experimental results show that the other state-of-the-art methods in the literature give an F0 equal to zero at the beginning of the vowel in this voiced region.
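The steps named in the Fig. 6 caption (autocorrelation, compression by c = 1, 2, 3, and multiplication of the compressed functions) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the input frame stands in for the multi-scale product (MP), whose wavelet computation is not shown; compression is taken to mean reading the autocorrelation at lag c·τ; negative autocorrelation values are clipped to zero before multiplying; and the names `autocorrelation`, `estimate_f0`, `f0_min` and `f0_max` are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_f0(x, fs, f0_min=60.0, f0_max=400.0, factors=(1, 2, 3)):
    """Estimate F0 (Hz) from the product of compressed autocorrelations.

    x is assumed to be one frame of the multi-scale product; any
    quasi-periodic frame works for the purpose of this sketch.
    Returns 0.0 when no positive peak is found in the search range.
    """
    lag_min = int(fs / f0_max)   # shortest period searched, in samples
    lag_max = int(fs / f0_min)   # longest period searched, in samples
    c_max = max(factors)
    if c_max * lag_max >= len(x):
        raise ValueError("frame too short for the requested lag range")

    r = autocorrelation(x, c_max * lag_max)
    # Assumption: clip negative values so that two negative terms
    # cannot multiply into a spurious positive peak.
    r = [max(v, 0.0) for v in r]

    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Compressed autocorrelation for factor c is r[c * lag];
        # the product keeps only lags where every compressed copy peaks.
        prod = 1.0
        for c in factors:
            prod *= r[c * lag]
        if prod > best_val:
            best_lag, best_val = lag, prod

    return fs / best_lag if best_lag else 0.0


# Synthetic check: an impulse train with a period of 80 samples at 8 kHz.
fs = 8000
frame = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]
print(estimate_f0(frame, fs))  # 100.0 for this synthetic frame
```

Multiplying the compressed functions reinforces the lag at which the autocorrelation peaks at the period and at its integer multiples, which is one plausible reason such a scheme could remain usable on short onset frames.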

Abstract. Gender detection from running speech is an important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here discards f0 as a valid feature because its estimation is complicated, or even impossible, in unvoiced fragments, and because it is not reliable in emotional speech or in strongly prosodic speech.

