Corpora

The Speech Lab is home to four corpora of recorded speech. If you are interested in access to the corpora, please contact the appropriate Principal Investigator (PI).


Cajun English Corpus

    PI:                         Katie Carmichael (katcarm@vt.edu)

    Date Collected:     2009

    Language:             English

    Location:              Lafourche Parish, Louisiana

    Speaker Info:        17 speakers, all Cajun (white) males, aged 32-83. The group includes monolingual English speakers, Semi Speakers of Cajun French (cf. Dorian 1981), bilingual French-English speakers, and L2 speakers of English (L1 Cajun French). The corpus also includes jokes told by one additional female speaker, whose interview is not transcribed.

    Recording Info:     Participants wore Shure SM 10A headset microphones and were recorded by a ZOOM H4 portable digital recorder; a separate Crown Audio Sound Grabber II microphone was set up to record other speech and interaction.

    Task Info:          Full interviews consisting of casual conversation lasted 30 to 90 minutes; 15-minute segments were selected, transcribed in PRAAT, and FAVE-aligned. Participants were also asked to tell Boudreaux and Thibodeaux jokes, which are told with exaggerated stereotypical Cajun English accents. The corpus includes recordings of 7 speakers telling 33 jokes. Only some are transcribed, but each joke is categorized with a general title and by whether it is a Boudreaux and Thibodeaux joke (some more general jokes were also told).


Louisiana French Corpus

    PI:                          Katie Carmichael (katcarm@vt.edu)

    Date Collected:      2006-2008

    Language:              French

    Location:               Terrebonne and Lafourche Parishes, Louisiana

    Speaker Info:              28 Speakers, all Pointe-Au-Chien Indians, aged 28-73, split evenly according to gender across three speaker groups: 

  • 12 Older Fluent
  • 8 Younger Fluent
  • 8 Semi Speakers, or non-fluent speakers of French, common in situations of language death (cf. Dorian 1981)

    Recording Info:     Participants and interviewer wore lavalier microphones. Interviews were conducted with the PI, a local "insider" interviewer, and participants.

    Task Info:             Full interviews consisting of casual conversation lasted one to two hours; 15-45 minute segments were selected and transcribed in MS Word. Participants were also asked to translate 50 short sentences from English into French.


New Orleans English: Chalmatian Corpus

    PI:                        Katie Carmichael (katcarm@vt.edu)

    Date Collected:    2012

    Language:            English

    Location:             St. Bernard Parish, Louisiana (towns surrounding Chalmette; some participants had moved to St. Tammany Parish on the Northshore of Lake Pontchartrain at the time of the interview, but all were native to St. Bernard)

    Speaker Info:       57 white, working-class speakers, aged 18-85, 32 women and 25 men. Half of the speakers had returned to St. Bernard following Hurricane Katrina; half had relocated.

    Recording Info:    Participants wore Shure SM 10A headset microphones and were recorded by a ZOOM H4 portable digital recorder; a separate Crown Audio Sound Grabber II microphone was set up to record other speech and interaction.

    Task Info:            Full interviews consisting of casual conversation lasted one to four hours; 15-45 minute segments were selected, transcribed in PRAAT, and FAVE-aligned. At the end of the interview, metalinguistic commentary was elicited; some of it is transcribed in MS Word. After the metalinguistic commentary, participants were asked to read a passage and a word list, both of which are transcribed and FAVE-aligned.


Transatlantic Corpus 

    PI:                      Abby Walker (ajwalker@vt.edu)

    Language:          American English and British English

    Location:            Columbus, Ohio & London, England

    Speakers:           97

    Speaker breakdown:     

  • 19 English Expatriates in Columbus (12 male & 9 female, aged 20-71)
  • 21 American Expatriates in London (4 male & 17 female, aged 23-74)
  • 13 English Fans of NFL teams (13 male, aged 23-41)
  • 16 American Fans of EPL teams (15 male & 1 female, aged 21-51)
  • 11 English Controls (7 male & 4 female, aged 18-48)
  • 14 American Controls (3 male & 11 female, aged 18-59)

    Data breakdown:          

  • Wordlist: 280 words/phrases  (blocked by theme) 
  • Listening in Noise: 128 sentences presented in SBE and SAE; participants transcribed what they could hear.
  • Interviews: averaging 45 minutes (range: 12-92 minutes)

    Equipment used: Shure 54 head-worn microphones; ZOOM H4N portable digital recorder (44100 Hz, 16 bit)

    Recording quality: Varied. Some recordings occurred in noisy environments; occasional recording problems.