Advanced control over speech generation
Disabling normalization may make numbers, dates, and URLs read less reliably. For best results, convert these to their spoken form yourself before submitting the text.
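As an illustration of handling such cases manually, the following is a minimal sketch that spells out small integers before the text is submitted. The helper names `spell_number` and `expand_numbers` are hypothetical and not part of any API described here. The sketch covers only standalone English integers from 0 to 999; dates, URLs, decimals, and larger numbers would each need their own rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_number(n: int) -> str:
    """Spell out an integer in the range 0..999 in English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + spell_number(rest) if rest else "")


def expand_numbers(text: str) -> str:
    """Replace each standalone 1-3 digit integer with its spelled-out form.

    Longer digit runs (for example years) are left untouched.
    """
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: spell_number(int(m.group())),
                  text)


print(expand_numbers("I have 42 apples and 7 pears."))
```

Because the pattern matches any isolated run of one to three digits, a decimal such as `3.5` would be expanded on both sides of the dot, so real input likely needs extra rules ahead of this step.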
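Phoneme control allows you to specify exact pronunciations for words or characters. The examples below cover English, written with ARPAbet-style phoneme symbols, and Chinese, written as Pinyin with tone numbers.
To use phoneme control, wrap the desired pronunciation in <|phoneme_start|> and <|phoneme_end|> tags. Each pair of tags should contain the pronunciation of a single word or character.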
Standard: I am an engineer.
With control: I am an <|phoneme_start|>EH N JH AH N IH R<|phoneme_end|>.
Standard (Chinese): 我是一个工程师。
With control (Chinese): 我是一个<|phoneme_start|>gong1<|phoneme_end|><|phoneme_start|>cheng2<|phoneme_end|><|phoneme_start|>shi1<|phoneme_end|>。
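The tagging shown in these examples can be automated with plain string handling. The sketch below is one possible approach, not an official client: the helper names `phoneme`, `phoneme_sequence`, and `replace_word` are hypothetical, and only the two tag strings come from the text above.

```python
import re
from typing import Iterable

PHONEME_START = "<|phoneme_start|>"
PHONEME_END = "<|phoneme_end|>"


def phoneme(pronunciation: str) -> str:
    """Wrap one word's or character's pronunciation in phoneme tags."""
    return f"{PHONEME_START}{pronunciation.strip()}{PHONEME_END}"


def phoneme_sequence(pronunciations: Iterable[str]) -> str:
    """Wrap each pronunciation separately, one tag pair per character."""
    return "".join(phoneme(p) for p in pronunciations)


def replace_word(text: str, word: str, pronunciation: str) -> str:
    """Replace whole-word occurrences of `word` with a tagged pronunciation.

    Uses \\b word boundaries, so it suits space-delimited text such as
    English; for Chinese, build the tagged span with phoneme_sequence.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, lambda m: phoneme(pronunciation), text)


english = replace_word("I am an engineer.", "engineer", "EH N JH AH N IH R")
chinese = "我是一个" + phoneme_sequence(["gong1", "cheng2", "shi1"]) + "。"
print(english)
print(chinese)
```

Keeping one tag pair per Chinese character mirrors the example above, where each syllable is wrapped on its own.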
Paralanguage controls allow you to add natural speech elements and pauses to make the generated speech sound more human-like. There are two main types of controls:
You can use common pause words like "um", "uh", "嗯", "啊" to control the rhythm of the speech.
The following special effects can be added using parentheses:
| Effect | Description | First Available | Stage |
|---|---|---|---|
| (break) | Short pause | V2 | Experimental |
| (long-break) | Extended pause | V2 | Experimental |
| (breath) | Breathing sound | V2 | Experimental |
| (laugh) | Laughter sound | V2 | Experimental |
| (cough) | Coughing sound | V2 | Experimental |
| (lip-smacking) | Lip smacking sound | V2 | Experimental |
| (sigh) | Sighing sound | V2 | Experimental |
The effects (laugh), (cough), (lip-smacking), and (sigh) are still under development. You may need to repeat them several times in the text for better results.
English Example:
Standard: I am an engineer.
With paralanguage: I am, um, an (break) engineer.
Chinese Example:
Standard: 我是一名工程师。
With paralanguage: 我,嗯,是一名(break)工程师。
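When building such text programmatically, a small helper can guard against misspelled effect names and handle the repetition suggested for the effects still under development. This is a minimal sketch: the `effect` function is hypothetical, the set of names is copied from the table above, and joining repeated effects with a space is an assumption, since the text does not say how repeats should be separated.

```python
# Effect names taken from the table of special effects.
EFFECTS = {"break", "long-break", "breath", "laugh",
           "cough", "lip-smacking", "sigh"}


def effect(name: str, repeat: int = 1) -> str:
    """Return the parenthesized marker for `name`, repeated `repeat` times.

    Assumption: repeated markers are separated by a single space.
    """
    if name not in EFFECTS:
        raise ValueError(f"unknown effect: {name!r}")
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return " ".join(f"({name})" for _ in range(repeat))


with_pause = f"I am, um, an {effect('break')} engineer."
with_laugh = f"That is funny {effect('laugh', repeat=2)} indeed."
print(with_pause)
print(with_laugh)
```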