This is a grouped Moodle course. It consists of several separate courses that share learning materials, assignments, tests etc. Below you can see information about the individual courses that make up this Moodle course.
Speech Processing (Main course) B2M31ZRE
Credits 6
Semesters Summer
Completion Assessment + Examination
Language of teaching Czech
Extent of teaching 2P+2C
Annotation
The course covers the basics of speech processing and is intended for students of the master's program. The speech technology discussed is currently applied in many systems across different fields (e.g. information dialogue systems, voice-controlled devices, dictation systems, transcription of audio-video recordings, support for language teaching, etc.). Students will learn basic algorithms for speech analysis (spectral analysis, LPC, cepstral analysis, pitch, formants, etc.), principles of speech recognition (GMM-HMM and ANN-HMM systems, small and large vocabulary recognizers), speaker recognition (based on VQ and GMM), speech synthesis, and speech enhancement. Further information can be found at http://noel.feld.cvut.cz/vyu/ae2m31zre. Detailed information for enrolled students is available on the Moodle FEL teaching portal.
Study targets
The goal of the course is to introduce the speech technology used in the most important multimedia applications. Students should master topics such as the basic characteristics of the speech signal, speech enhancement, speech recognition, speech synthesis, audio-visual speech processing, etc. Students will practice basic speech processing tasks in the MATLAB environment; other publicly available tools for speech analysis will also be used.
Course outlines
1. Introduction - speech production and perception model, basic characteristics (phonetic and articulatory)
2. Spectral characteristics of speech signal (DFT and LPC spectrum)
3. Cepstral representation of speech. Recognition features. Voice Activity Detection.
4. Speech enhancement (additive and convolutional noise, single-channel and multi-channel systems)
5. Basic classification approaches and techniques (GMM, HMM, VQ, ANN, DNN)
6. Speaker verification and identification. Language recognition.
7. Small and large vocabulary speech recognition (DTW, GMM-HMM, LVCSR, HTK and KALDI tools).
8. Modern LVCSR systems (DNN-HMM). Adaptation techniques. Advanced speech features.
9. Speech synthesis - basic principles (concatenative and formant synthesis, PSOLA)
10. Audio-visual speech recognition
11. Speech coding.
12. Hearing aids and cochlear implants (anatomy and hearing model, speech processing)
13. Multimedia systems with voice input (dialogue systems, speech therapy, language teaching)
14. Databases for speech technology systems. Reserve.
Exercises outlines
1. Introduction: speech signal, tools for analysis, sources of speech signals
2. Basic time-domain and spectral characteristics
3. Fundamental frequency (pitch) estimation
4. LPC spectrum and formant estimation
5. Cepstrum and cepstral distance: voice activity detection.
6. Basic classification techniques (GMM, VQ, HMM): vowel classification
7. Speaker verification based on VQ
8. Speaker identification based on GMM
9. DTW based recognition: simple recognizer of particular words
10. HMM based recognition: basic tasks and demonstration of HMM modelling
11. Suppression of additive noise in speech signal
12. Convolutional noise suppression
13. Speech synthesis: implementation of formant synthesis, demonstration of available tools
14. Reserve. Credits
Literature
[1] Huang, X. - Acero, A. - Hon, H.-W.: Spoken Language Processing. Prentice Hall 2001.
Requirements
Basic knowledge of digital signal processing is assumed as a prerequisite.
Speech processing A2M31ZRE
Credits 6
Semesters Winter
Completion Assessment + Examination
Language of teaching Czech
Extent of teaching 2P+2C
Annotation
The course covers the basics of speech processing and is intended for students of the master's program, with a special focus on multimedia applications. The speech technology discussed is currently applied in many systems across different fields (e.g. information dialogue systems, voice-controlled devices, dictation systems, transcription of audio-video recordings, support for language teaching, etc.). Further information can be found at http://noel.feld.cvut.cz/vyu/a2m31zre . Detailed information for registered students can be found on the teaching portal http://moodle.kme.feld.cvut.cz .
Study targets
The goal of the course is to introduce the speech technology used in the most important multimedia applications. Students should master topics such as the basic characteristics of the speech signal, speech enhancement, speech recognition, speech synthesis, audio-visual speech processing, etc. Students will practice basic speech processing tasks in the MATLAB environment; other publicly available tools for speech analysis will also be used. As homework, students will prepare a semester project, which they will present in the exercise sessions according to the planned schedule.
Course outlines
1. Introduction - speech signal (digital form), speech production model
2. Basic characteristics of speech signal, phonetic and articulatory aspects
3. Spectral characteristics of speech signal (DFT and LPC spectrum)
4. Noise suppression in speech signals (additive and convolutional noise, single-channel and multi-channel)
5. Hearing aids and cochlear implants (anatomy and hearing model, speech processing)
6. Principles of speech recognition, basic tasks and applications
7. Feature extraction for speech recognition
8. Small vocabulary speech recognition based on DTW and HMM (HTK)
9. Dictation and transcription systems (large vocabulary speech recognition)
10. Speaker verification and identification.
11. Speech synthesis - basic principles (concatenative and formant synthesis, PSOLA)
12. Audio-visual speech recognition
13. Multimedia systems with voice input (dialogue systems, speech therapy, language teaching)
14. Language recognition. Reserve.
Exercises outlines
1. Introduction: speech signal, tools for analysis, sources of speech signals
2. Basic time-domain characteristics: energy, intensity, zero-crossing, fundamental frequency
3. Spectral characteristics: short-time DFT and LPC spectrum, spectrogram
4. Suppression of additive noise in speech signal
5. Convolutional noise suppression
6. Speech processing for hearing aids and cochlear implants
7. Cepstrum and cepstral distance: voice activity detection, features for recognition
8. DTW based recognition: simple recognizer of particular words
9. HMM based recognition: basic tasks and demonstration of HMM modelling
10. Speaker verification based on GMM
11. Speech synthesis: implementation of formant synthesis, demonstration of available tools
12. Semester work presentations
13. Semester work presentations
14. Reserve. Credits
Literature
[1] Huang, X. - Acero, A. - Hon, H.-W.: Spoken Language Processing. Prentice Hall 2001.
Requirements
Basic knowledge of digital signal processing is assumed as a prerequisite.
Speech technology in telecommunications AD2M31ZRE
Credits 6
Semesters Winter
Completion Assessment + Examination
Language of teaching Czech
Extent of teaching 14KP+6KC
Annotation
The course covers the basics of speech processing and is intended for students of the master's program, with a special focus on multimedia applications. The speech technology discussed is currently applied in many systems across different fields (e.g. information dialogue systems, voice-controlled devices, dictation systems, transcription of audio-video recordings, support for language teaching, etc.). Further information can be found at http://noel.feld.cvut.cz/vyu/ad2m31zre . Detailed information for registered students can be found on the teaching portal http://moodle.kme.feld.cvut.cz .
Study targets
The goal of the course is to introduce the speech technology used in the most important multimedia applications. Students should master topics such as the basic characteristics of the speech signal, speech enhancement, speech recognition, speech synthesis, audio-visual speech processing, etc. Students will practice basic speech processing tasks in the MATLAB environment; other publicly available tools for speech analysis will also be used. As homework, students will prepare a semester project, which they will present in the exercise sessions according to the planned schedule.
Course outlines
1. Introduction - speech signal (digital form), speech production model
2. Basic characteristics of speech signal, phonetic and articulatory aspects
3. Spectral characteristics of speech signal (DFT and LPC spectrum)
4. Noise suppression in speech signals (additive and convolutional noise, single-channel and multi-channel)
5. Hearing aids and cochlear implants (anatomy and hearing model, speech processing)
6. Principles of speech recognition, basic tasks and applications
7. Feature extraction for speech recognition
8. Small vocabulary speech recognition based on DTW and HMM (HTK)
9. Dictation and transcription systems (large vocabulary speech recognition)
10. Speaker verification and identification.
11. Speech synthesis - basic principles (concatenative and formant synthesis, PSOLA)
12. Audio-visual speech recognition
13. Multimedia systems with voice input (dialogue systems, speech therapy, language teaching)
14. Language recognition. Reserve.
Exercises outlines
1. Introduction: speech signal, tools for analysis, sources of speech signals
2. Basic time-domain characteristics: energy, intensity, zero-crossing, fundamental frequency
3. Spectral characteristics: short-time DFT and LPC spectrum, spectrogram
4. Suppression of additive noise in speech signal
5. Convolutional noise suppression
6. Speech processing for hearing aids and cochlear implants
7. Cepstrum and cepstral distance: voice activity detection, features for recognition
8. DTW based recognition: simple recognizer of particular words
9. HMM based recognition: basic tasks and demonstration of HMM modelling
10. Speaker verification based on GMM
11. Speech synthesis: implementation of formant synthesis, demonstration of available tools
12. Semester work presentations
13. Semester work presentations
14. Reserve. Credits
Literature
[1] Huang, X. - Acero, A. - Hon, H.-W.: Spoken Language Processing. Prentice Hall 2001.
Requirements
Basic knowledge of digital signal processing is assumed as a prerequisite.
Responsible for the data validity: Study Information System (KOS)