Google GenAI Java SDK TTS example (gemini-3.1-flash-tts-preview)

The official Google documentation is here: https://ai.google.dev/gemini-api/docs/speech-generation

Awkwardly, it has no Java SDK sample, and the Java SDK's own examples directory has no matching example either: https://github.com/googleapis/java-genai/tree/main/examples.
The two audio examples there do not match the documentation: examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudio.java and examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudioWithGenerateContent.java.

In practice, though, the SDK's unified generateContent method is all you need.

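The call is structured the same way as the Python example in the documentation; you only need to know the corresponding Java classes.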

Single-speaker example

import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

var client = Client.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .build();

// Build the config: responseModalities=["AUDIO"] + SpeechConfig + VoiceConfig + PrebuiltVoiceConfig
var config = GenerateContentConfig.builder()
    .responseModalities(List.of("AUDIO"))
    .speechConfig(SpeechConfig.builder()
        .voiceConfig(VoiceConfig.builder()
            .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                .voiceName("Kore") // choose the voice
                .build())
            .build())
        .build())
    .build();

// Single entry point: generateContent
var response = client.models.generateContent(
    "gemini-3.1-flash-tts-preview", // TTS model
    "Say cheerfully: Have a wonderful day!",
    config
);

// Take the first part that carries inline binary data; these bytes are the audio
byte[] bytes = Objects.requireNonNull(response.parts()).stream()
    .map(part -> part.inlineData().orElse(null))
    .filter(blob -> blob != null && blob.data().isPresent())
    .map(blob -> blob.data().orElseThrow())
    .findFirst()
    .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));

Multi-speaker example

import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.MultiSpeakerVoiceConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeakerVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

var client = Client.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .build();

// Step 1: generate the dialogue script with a text model
var transcriptResp = client.models.generateContent(
    "gemini-3-flash-preview",
    """
    Generate a short transcript around 100 words that reads
    like a podcast by excited herpetologists.
    The hosts names are Dr. Anya and Liam. Use 'Dr. Anya: ...' and 'Liam: ...' format.
    """,
    null
);
String transcript = Objects.requireNonNull(transcriptResp.parts()).get(0).text().orElseThrow();

// Step 2: synthesize the multi-speaker dialogue with the TTS model
var config = GenerateContentConfig.builder()
    .responseModalities(List.of("AUDIO"))
    .speechConfig(SpeechConfig.builder()
        .multiSpeakerVoiceConfig(MultiSpeakerVoiceConfig.builder()
            .speakerVoiceConfigs(
                // One SpeakerVoiceConfig per speaker; the name must match the transcript labels
                SpeakerVoiceConfig.builder()
                    .speaker("Dr. Anya")
                    .voiceConfig(VoiceConfig.builder()
                        .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                            .voiceName("Kore").build())
                        .build())
                    .build(),
                SpeakerVoiceConfig.builder()
                    .speaker("Liam")
                    .voiceConfig(VoiceConfig.builder()
                        .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                            .voiceName("Puck").build())
                        .build())
                    .build()
            )
            .build())
        .build())
    .build();

var audioResp = client.models.generateContent(
    "gemini-3.1-flash-tts-preview",
    transcript,
    config
);

// Take the first part that carries inline binary data; these bytes are the audio
byte[] bytes = Objects.requireNonNull(audioResp.parts()).stream()
    .map(part -> part.inlineData().orElse(null))
    .filter(blob -> blob != null && blob.data().isPresent())
    .map(blob -> blob.data().orElseThrow())
    .findFirst()
    .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));

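Step 1 can be ignored: it only generates the dialogue, and you can supply your own text instead.
Step 2 generates the audio. For multiple speakers, set a MultiSpeakerVoiceConfig that contains several SpeakerVoiceConfig entries. Each SpeakerVoiceConfig sets a speaker (which must match the name agreed on in the prompt) and a voiceConfig, and the voiceConfig sets a voiceName.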
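
Since the speaker names have to line up with the labels in the transcript, it can help to check that before calling the TTS model. Below is a minimal sketch in plain Java, not part of the SDK: SpeakerLabelCheck, labelsIn and unconfigured are hypothetical helper names, and the regex simply assumes that every spoken line starts with a 'Name: ' label as requested in the Step 1 prompt.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an SDK class): finds transcript labels without a configured voice.
final class SpeakerLabelCheck {
    // Assumption: each spoken line starts with "Name: ", e.g. "Dr. Anya: Hello".
    private static final Pattern LABEL = Pattern.compile("(?m)^[ \\t]*([^:\\n]{1,40}):\\s");

    // Collects the distinct labels found at the start of lines, in order of first appearance.
    static Set<String> labelsIn(String transcript) {
        Set<String> labels = new LinkedHashSet<>();
        Matcher m = LABEL.matcher(transcript);
        while (m.find()) {
            labels.add(m.group(1).trim());
        }
        return labels;
    }

    // Returns the labels used in the transcript that are not in the configured speaker list.
    static Set<String> unconfigured(String transcript, List<String> configuredSpeakers) {
        Set<String> missing = labelsIn(transcript);
        missing.removeAll(configuredSpeakers);
        return missing;
    }

    public static void main(String[] args) {
        String transcript = "Dr. Anya: Welcome back!\nLiam: Thanks, great to be here.\n";
        Set<String> missing = unconfigured(transcript, List.of("Dr. Anya", "Liam"));
        System.out.println(missing.isEmpty() ? "All speakers configured" : "Unconfigured: " + missing);
    }
}
```

The check is only a heuristic: any line that begins with short text followed by a colon is treated as a speaker label, so a stray "Note: ..." line would be flagged as well.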

Exporting WAV audio

import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// The returned bytes are written as raw PCM: 24 kHz, mono, 16-bit
private static final int SAMPLE_RATE = 24_000;
private static final short CHANNELS = 1;
private static final short BITS_PER_SAMPLE = 16;

private static void writeWaveFile(Path output, byte[] pcmData) throws IOException {
    // getParent() is null for a bare file name such as "out.wav"
    Path parent = output.getParent();
    if (parent != null) {
        Files.createDirectories(parent);
    }

    int byteRate = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8;
    short blockAlign = (short) (CHANNELS * BITS_PER_SAMPLE / 8);
    int dataSize = pcmData.length;

    // 44-byte RIFF/WAVE header, little-endian
    ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
    header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
    header.putInt(36 + dataSize);       // file size minus the 8-byte RIFF prefix
    header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
    header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
    header.putInt(16);                  // size of the fmt chunk for PCM
    header.putShort((short) 1);         // audio format 1 = uncompressed PCM
    header.putShort(CHANNELS);
    header.putInt(SAMPLE_RATE);
    header.putInt(byteRate);
    header.putShort(blockAlign);
    header.putShort(BITS_PER_SAMPLE);
    header.put("data".getBytes(StandardCharsets.US_ASCII));
    header.putInt(dataSize);

    try (OutputStream outputStream = Files.newOutputStream(output)) {
        outputStream.write(header.array());
        outputStream.write(pcmData);
    }
}
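
If you would rather not build the header by hand, the JDK's javax.sound.sampled package can probably do the same job. The sketch below assumes the same format as the hand-written header above (24 kHz, 16-bit, mono, signed little-endian PCM); PcmToWav, toWav and durationSeconds are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper: wraps raw PCM bytes in a WAV container using the JDK audio API.
final class PcmToWav {
    // Assumed format: 24 kHz sample rate, 16 bits per sample, 1 channel, signed, little-endian.
    private static final AudioFormat FORMAT = new AudioFormat(24_000f, 16, 1, true, false);

    // Returns the complete WAV file contents (header plus samples) as a byte array.
    static byte[] toWav(byte[] pcm) {
        long frames = pcm.length / FORMAT.getFrameSize(); // 2 bytes per frame for 16-bit mono
        try (AudioInputStream in =
                 new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Playback length implied by the assumed format: bytes / (24000 samples/s * 2 bytes/sample).
    static double durationSeconds(byte[] pcm) {
        return pcm.length / (24_000.0 * 2);
    }

    public static void main(String[] args) {
        byte[] silence = new byte[48_000]; // one second of silence in the assumed format
        System.out.println("WAV bytes: " + toWav(silence).length
            + ", seconds: " + durationSeconds(silence));
    }
}
```

The manual header keeps the code free of the java.desktop module, while this variant leaves the header layout to the JDK; either should produce a file that ordinary players can open.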

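That's all it takes.
If this feels like too much hassle, calling the REST API directly is the most straightforward option: it doesn't depend on the SDK, and the parameters are a bit more intuitive.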
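
For reference, here is a rough sketch of the REST route using only the JDK's java.net.http client. It assumes the generateContent endpoint and the x-goog-api-key header as described in the official documentation, and that the audio comes back as base64 in an inlineData.data field. RestTtsSketch, buildBody, escape and extractAudio are hypothetical names, and the regex-based extraction is a shortcut for illustration; a real JSON parser would be the safer choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: single-speaker TTS over REST, without the SDK.
final class RestTtsSketch {
    // Assumed endpoint pattern; %s is replaced by the model name.
    static final String ENDPOINT =
        "https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent";

    // Builds the JSON request body that mirrors the SDK config from the single-speaker example.
    static String buildBody(String text, String voiceName) {
        return """
            {"contents":[{"parts":[{"text":"%s"}]}],
             "generationConfig":{"responseModalities":["AUDIO"],
              "speechConfig":{"voiceConfig":{"prebuiltVoiceConfig":{"voiceName":"%s"}}}}}
            """.formatted(escape(text), escape(voiceName));
    }

    // Minimal JSON string escaping: backslashes, double quotes and newlines only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Pulls the first base64 "data" value out of the response and decodes it to PCM bytes.
    static byte[] extractAudio(String json) {
        Matcher m = Pattern.compile("\"data\"\\s*:\\s*\"([A-Za-z0-9+/=]++)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("No inlineData.data field in response");
        }
        return Base64.getDecoder().decode(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        String key = System.getenv("GEMINI_API_KEY");
        if (key == null || key.isBlank()) {
            System.err.println("Set GEMINI_API_KEY to run the live call");
            return;
        }
        HttpRequest request = HttpRequest
            .newBuilder(URI.create(ENDPOINT.formatted("gemini-3.1-flash-tts-preview")))
            .header("x-goog-api-key", key)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                buildBody("Say cheerfully: Have a wonderful day!", "Kore")))
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        byte[] pcm = extractAudio(response.body());
        System.out.println("Received " + pcm.length + " bytes of PCM audio");
    }
}
```

The decoded bytes are the same raw PCM that the SDK returns, so they can be passed to writeWaveFile from the previous section unchanged.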