Pseudo forced alignment using a Praat script

2023-09-21

Phoneticians often ask speakers to read word lists aloud, especially in the field. Finding a single word in a long recording is tedious, however, so it helps to split the long sound file into short files with meaningful file names. I wrote a Praat script for that purpose; to save more time, the script also fills in the transcriptions in the TextGrid files. This post documents the script, which is available on my GitHub.

Prerequisites

  1. A long sound file
  2. A metadata file for the word list, containing at least four columns:
    • word: the words or gloss, for your own reference
    • label: the transcription of the words, to fill in the TextGrid file
    • ID: the file name of each word, to rename the short sound files
    • number: the number indicating the order of the word in the word list
  3. splitLongSound.praat: the Praat script to split the long sound file
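
Before running the script, it can be worth checking that the metadata file really has the four columns listed above. The following is a minimal Python sketch of such a check, not part of the Praat script itself. It assumes a tab-separated file with a header row (the post does not specify the file format), and the function name `load_wordlist` and the three sample rows are made up for illustration.

```python
import csv
import io

# The four column names come from the prerequisites list above.
REQUIRED_COLUMNS = {"word", "label", "ID", "number"}


def load_wordlist(text, delimiter="\t"):
    """Parse metadata text and return {number: row} after checking the columns.

    Assumption: the file is delimited text with a header row.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return {int(row["number"]): row for row in reader}


# Hypothetical three-word list, for illustration only.
sample = (
    "word\tlabel\tID\tnumber\n"
    "mother\tma\tw001\t1\n"
    "hemp\tma\tw002\t2\n"
    "horse\tma\tw003\t3\n"
)
wordlist = load_wordlist(sample)
print(wordlist[2]["ID"])  # w002
```

Keying the rows by the number column mirrors how the list is used later: the number typed into the TextGrid is what links an interval to its file name and transcription.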

Procedures

  1. Open the Praat script (splitLongSound.praat)
  2. Enter the required information and run the script. The required information includes:
    • the directory of the long sound file
    • the file name of the long sound file, without the extension (.wav)
    • the directory in which to store the short sound files
    • the directory of the metadata file
    • ID of the speaker: it will be prefixed to the short file names
    • vowel pattern: a regular expression that matches the vowel part of each word
  3. Listen to the recording and add the number of each word
    • For example, when you hear the first word, add the number 1 (because it's the first word in the list) after “sound” in the TextGrid
    • The number column in the metadata file can be used here
    • Note that “sound” must not be deleted
    • There is no need to modify any other text or intervals in the TextGrid file
    • If the interval before or after the sound is too long and you want to trim it, add a boundary before or after the sound
  4. Continue through all the words until you reach the end of the sound file
  5. Click done; all the short sound files are then saved
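
The link between the numbers typed in step 3 and the saved file names can be sketched in Python as follows. This only illustrates the idea and is not the code of the Praat script. It assumes that a numbered interval label looks like "sound1", that the speaker ID and the word ID are joined with an underscore, and that other intervals carry some other label (here "silence"); all three are assumptions, as are the function name `short_file_name` and the sample IDs.

```python
import re

# Assumption: a numbered label is "sound" followed by digits, e.g. "sound1".
SOUND_LABEL = re.compile(r"^sound\s*(\d+)$")


def short_file_name(label, id_by_number, speaker_id, sep="_"):
    """Return the output file name for one interval label, or None to skip it."""
    match = SOUND_LABEL.match(label.strip())
    if match is None:
        return None  # an unnumbered "sound" or any other label: nothing to save
    word_id = id_by_number[int(match.group(1))]
    # Assumption: the speaker ID is joined to the word ID with `sep`.
    return f"{speaker_id}{sep}{word_id}.wav"


# Hypothetical data: number -> ID, taken from the metadata file.
id_by_number = {1: "w001", 2: "w002"}
labels = ["silence", "sound1", "silence", "sound", "sound2"]
names = [short_file_name(label, id_by_number, "S01") for label in labels]
print([name for name in names if name is not None])
# ['S01_w001.wav', 'S01_w002.wav']
```

The fourth interval keeps its bare "sound" label, so it is skipped, which corresponds to saving only the words that have been given a number.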

You do not have to finish in one session. The TextGrid file is saved and can be reopened and revised later. Click next time to stop without saving any short sound files, or done to save only the words that have a number in the TextGrid.

Each short TextGrid file has two tiers, “syllable” and “segment”. On the “syllable” tier, the transcription of the syllable is filled in automatically. On the “segment” tier, the onset and the vowel are segmented automatically and labeled with their transcriptions, which are identified using the vowelPattern regular expression. This makes the script especially useful for Chinese and other monosyllabic languages with simple phonotactics.
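
To give a feel for what a vowel pattern does, here is a small Python sketch that splits a syllable transcription into an onset and a vowel part at the first match of a regular expression. It is an illustration under assumptions, not the script's implementation: the pattern `[aeiou]+`, the example syllables, and the function name `split_syllable` are hypothetical, and Praat's regular expression syntax may differ from Python's in some details.

```python
import re


def split_syllable(label, vowel_pattern):
    """Split a syllable label into (onset, vowel part) at the first regex match."""
    match = re.search(vowel_pattern, label)
    if match is None:
        return label, ""  # no vowel found: treat the whole label as the onset
    return label[:match.start()], match.group(0)


vowel_pattern = r"[aeiou]+"  # hypothetical pattern; adapt it to your transcriptions
for label in ["ma", "kʰai", "a"]:
    print(label, split_syllable(label, vowel_pattern))
```

A pattern for real data would need to cover every vowel symbol, diacritic, and tone mark used in the label column.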

Also note that if the long sound file has two channels (e.g., audio and EGG recordings), the second channel is extracted separately and saved with the suffix -EGG in its file name. You can change the suffix on line 123 of the script.
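
For readers who want to see what extracting the second channel amounts to, the Python sketch below de-interleaves raw stereo PCM bytes into two mono streams and builds a file name with the -EGG suffix. It is only an illustration of the idea: the script itself does this inside Praat, and the function names, the placement of the suffix before the extension, and the sample file name are assumptions.

```python
def split_stereo(frames, sampwidth):
    """De-interleave stereo PCM bytes into (channel 1, channel 2).

    `sampwidth` is the number of bytes per sample (2 for 16-bit audio).
    """
    frame_size = 2 * sampwidth  # one sample per channel in each frame
    first, second = bytearray(), bytearray()
    for i in range(0, len(frames) - frame_size + 1, frame_size):
        first += frames[i:i + sampwidth]
        second += frames[i + sampwidth:i + frame_size]
    return bytes(first), bytes(second)


def egg_name(audio_name, suffix="-EGG"):
    """Insert the suffix before the extension (assumed placement)."""
    stem, dot, ext = audio_name.rpartition(".")
    return f"{stem}{suffix}{dot}{ext}" if dot else f"{audio_name}{suffix}"


# Two 16-bit stereo frames: channel 1 holds samples 1 and 3, channel 2 holds 2 and 4.
raw = b"\x01\x00\x02\x00\x03\x00\x04\x00"
audio, egg = split_stereo(raw, 2)
print(audio, egg)
print(egg_name("S01_w001.wav"))  # S01_w001-EGG.wav
```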