Useful References
Miscellaneous
General Speech Datasets
LibriSpeech ASR corpus - LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project and has been carefully segmented and aligned.
The LJ Speech Dataset - A public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
Emotional Speech Datasets
Arabic Natural Audio Dataset - 1384 recordings by multiple speakers; 3 emotions: angry, happy, surprised. Each chunk was automatically divided into 1-second speech units, forming a final corpus of 1384 records.
Berlin Database of Emotional Speech - 800 recordings spoken by 10 actors (5 male and 5 female); 7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust.
EmoSynth: The Emotional Synthetic Audio Dataset - EmoSynth is a dataset of 144 audio files that have been labelled by 40 listeners for their perceived emotion with regard to the dimensions of valence and arousal.
Emotional Voices Database - Various emotions (amused, angry, disgusted, neutral, sleepy) recorded by 5 voice actors.
Emov-DB - Recordings of 4 speakers (2 male and 2 female); the emotional styles are neutral, sleepiness, anger, disgust, and amused.
Estonian Emotional Speech Corpus (EEKK) - 26 text passages read by 10 speakers; 4 main emotions: joy, sadness, anger, and neutral.
EmoFilm - A multilingual emotional speech corpus. The emotions are anger, contempt, happiness, fear, and sadness. EmoFilm was presented at Interspeech 2018.
EMOVO - 6 actors who played 14 sentences; 6 emotions: disgust, fear, anger, joy, surprise, sadness.
GEMEP corpus - 10 actors portraying 10 states; 12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness, plus 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness.
Toronto emotional speech set (TESS) - 2800 recordings by 2 actresses; 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.
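The Arabic Natural Audio Dataset entry above mentions that each chunk was divided into 1-second speech units. The following is a minimal sketch of that kind of fixed-length segmentation, assuming the audio is already loaded as a plain sequence of mono samples. The function name `split_into_units`, the 16 kHz sample rate, and the decision to drop a trailing partial unit are illustrative assumptions, not details taken from the dataset's documentation.

```python
from typing import List, Sequence


def split_into_units(samples: Sequence[float], sample_rate: int,
                     unit_seconds: float = 1.0) -> List[Sequence[float]]:
    """Split a mono signal into consecutive fixed-length units.

    A trailing remainder shorter than one unit is dropped (an assumption;
    how the dataset handled remainders is not stated).
    """
    unit_len = int(sample_rate * unit_seconds)
    if unit_len <= 0:
        raise ValueError("unit length must be at least one sample")
    n_units = len(samples) // unit_len
    return [samples[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]


# Hypothetical 3.5-second signal at 16 kHz (silence, for illustration only).
signal = [0.0] * int(3.5 * 16000)
units = split_into_units(signal, sample_rate=16000)
```

Dropping the remainder keeps every unit the same length, which is convenient when the units feed a fixed-size classifier input; padding the last unit would be an equally reasonable choice.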
Speech Conversation Datasets
The CallHome English corpus of telephone speech was collected and transcribed by the Linguistic Data Consortium primarily in support of the project on Large Vocabulary Conversational Speech Recognition (LVCSR), sponsored by the U.S. Department of Defense.
This release of the CallHome English corpus consists of 120 unscripted telephone conversations between native speakers of English. The CD-ROM distribution contains the speech data only, along with essential documentation files and software for handling the compressed speech data. The transcripts and other text data and documentation are distributed separately (typically via electronic transmission from the LDC's ftp/web server), and will be subject to periodic updates.
The transcripts cover a contiguous 5- or 10-minute segment taken from a recorded conversation lasting up to 30 minutes. All speakers were aware that they were being recorded. They were given no guidelines concerning what they should talk about. Once a caller was recruited to participate, he/she was given a free choice of whom to call. Most participants called family members or close friends overseas. All calls originated in North America; 90 of the 120 calls were placed to various locations overseas, while the remaining 30 were placed within North America. The distribution of call destinations can be found in the file "spkrinfo.tbl". The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.
The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. For a gentle introduction to the corpus, see the corpus overview. To access the data, follow the directions given there. Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains. Detailed information can be found in the documentation section.
The HCRC Map Task Corpus is a set of 128 dialogues that has been recorded, transcribed, and annotated for a wide range of behaviours, and has been released for research purposes. It was originally designed to elicit behaviours that answer specific research questions in linguistics. Since the original material was released in 1992, the corpus design has been used not just for linguistics research, but also in teaching and by computational linguists for training machine classifiers.
Since HCRC continues to use the Corpus in our own research, we welcome contact with colleagues engaged in similar projects. For this reason we ask users, as a matter of courtesy, to notify us at maptask@cogsci.ed.ac.uk of the topic of their intended work with these materials.
Because the Map Task is available in a number of forms, we provide a brief history explaining what these are and what they contain. Most people just want the most up-to-date version, which is in the format for the NITE XML Toolkit (see NXT-format XML Annotations (v2.1)). The simplest way to acquire the corpus in that format is from the main download page. To make things easier, the audio and maps are available from the same page.
Privacy & Security
Privacy preservation
Hongwei Li, Yi Yang, Tom H. Luan, Xiaohui Liang, Liang Zhou, Xuemin (Sherman) Shen. "Enabling Fine-Grained Multi-Keyword Search Supporting Classified Sub-Dictionaries over Encrypted Cloud Data."
WiFi sensing
WiFi sensing Surveys
Ma, Yongsen, Gang Zhou, and Shuangquan Wang. "WiFi sensing with channel state information: A survey." ACM Computing Surveys (CSUR) 52.3 (2019): 1-36.
Jiang, Hongbo, et al. "Smart home based on WiFi sensing: A survey." IEEE Access 6 (2018): 13317-13325.
Wang, Zhengjie, et al. "A survey on CSI-based human behavior recognition in through-the-wall scenario." IEEE Access 7 (2019): 78772-78793.
User identification/Human presence checking
Adib, Fadel, and Dina Katabi. "See through walls with WiFi!." Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM. 2013.
Liu, Hongbo, et al. "Practical user authentication leveraging channel state information (CSI)." Proceedings of the 9th ACM symposium on Information, computer and communications security. 2014.
Zhang, Jin, et al. "Wifi-id: Human identification using wifi signal." 2016 International Conference on Distributed Computing in Sensor Systems (DCOSS). IEEE, 2016.
Shi, Cong, et al. "Smart user authentication through actuation of daily activities leveraging WiFi-enabled IoT." Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing. 2017.
Lip movement detection
Meng, Yan, et al. "Wivo: Enhancing the security of voice control system via wireless signal in iot environment." Proceedings of the Eighteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing. 2018.
Meng, Yan, et al. "Liveness Detection for Voice User Interface via Wireless Signals in IoT Environment."