2026-05-09

Google GenAI Java SDK TTS example (gemini-3.1-flash-tts-preview)

The official Google documentation is here: https://ai.google.dev/gemini-api/docs/speech-generation

Awkwardly, it has no Java SDK sample, and the Java SDK's examples directory has no matching example either: https://github.com/googleapis/java-genai/tree/main/examples
The two audio examples that are there do not match the documentation: examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudio.java and examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudioWithGenerateContent.java.

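In practice, though, the SDK's unified generateContent method is all you need.

The call is essentially the same as the Python sample in the documentation; you only need to know the corresponding Java classes.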

Single-speaker example

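import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

var client = Client.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .build();

// Build the config: responseModalities=["AUDIO"] + SpeechConfig + VoiceConfig + PrebuiltVoiceConfig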
var config = GenerateContentConfig.builder()
    .responseModalities(List.of("AUDIO"))
    .speechConfig(SpeechConfig.builder()
        .voiceConfig(VoiceConfig.builder()
            .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                .voiceName("Kore")     // 👈 pick the voice
                .build())
            .build())
        .build())
    .build();

// Single entry point: generateContent
var response = client.models.generateContent(
    "gemini-3.1-flash-tts-preview",   // 👈 TTS model
    "Say cheerfully: Have a wonderful day!",
    config
);

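// Take the first response part that carries inline data; these bytes are raw PCM, not a finished WAV file
byte[] bytes = Objects.requireNonNull(response.parts()).stream()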
        .map(part -> part.inlineData().orElse(null))
        .filter(blob -> blob != null && blob.data().isPresent())
        .map(blob -> blob.data().orElseThrow())
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));

Multi-speaker example

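import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.MultiSpeakerVoiceConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeakerVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

var client = Client.builder()
    .apiKey(System.getenv("GEMINI_API_KEY"))
    .build();

// Step 1: generate the dialogue transcript with a text model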
var transcriptResp = client.models.generateContent(
        "gemini-3-flash-preview",
        """
        Generate a short transcript around 100 words that reads
        like a podcast by excited herpetologists.
        The hosts names are Dr. Anya and Liam. Use 'Dr. Anya: ...' and 'Liam: ...' format.
        """,
        null
);
String transcript = Objects.requireNonNull(transcriptResp.parts()).get(0).text().orElseThrow();

// Step 2: synthesize the multi-speaker dialogue with the TTS model
var config = GenerateContentConfig.builder()
        .responseModalities(List.of("AUDIO"))
        .speechConfig(SpeechConfig.builder()
                .multiSpeakerVoiceConfig(MultiSpeakerVoiceConfig.builder()
                        .speakerVoiceConfigs(
                                SpeakerVoiceConfig.builder()
                                        .speaker("Dr. Anya")
                                        .voiceConfig(VoiceConfig.builder()
                                                .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                                                        .voiceName("Kore").build())
                                                .build())
                                        .build(),
                                SpeakerVoiceConfig.builder()
                                        .speaker("Liam")
                                        .voiceConfig(VoiceConfig.builder()
                                                .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                                                        .voiceName("Puck").build())
                                                .build())
                                        .build()
                        )
                        .build())
                .build())
        .build();

var audioResp = client.models.generateContent(
        "gemini-3.1-flash-tts-preview",
        transcript,
        config
);

// As in the single-speaker case, take the first part that carries inline audio bytes (raw PCM)
byte[] bytes = Objects.requireNonNull(audioResp.parts()).stream()
        .map(part -> part.inlineData().orElse(null))
        .filter(blob -> blob != null && blob.data().isPresent())
        .map(blob -> blob.data().orElseThrow())
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));

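Step 1 is not essential: it only generates the dialogue, and you can supply your own transcript instead.
Step 2 generates the audio. For multiple speakers, set a MultiSpeakerVoiceConfig that contains one SpeakerVoiceConfig per speaker. Each SpeakerVoiceConfig sets a speaker (the name must match the one used in the prompt) and a voiceConfig, and the voiceConfig in turn sets a voiceName.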
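
Because the speaker names have to line up with the labels in the transcript, it can help to check them before calling the TTS model. The following is only a sketch: SpeakerLabelCheck and its methods are hypothetical helpers, not part of the SDK, and the "Name: text" line format is an assumption taken from the prompt above. A line that happens to contain an early colon would be misread as a label.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpeakerLabelCheck {
    // Matches a leading "Name:" label, e.g. "Dr. Anya: Hello" yields "Dr. Anya"
    private static final Pattern LABEL = Pattern.compile("^\\s*([^:\\n]{1,40}):\\s+\\S");

    /** Collects the distinct speaker labels found at the start of transcript lines. */
    public static Set<String> speakersIn(String transcript) {
        Set<String> found = new LinkedHashSet<>();
        for (String line : transcript.split("\\R")) {
            Matcher m = LABEL.matcher(line);
            if (m.find()) {
                found.add(m.group(1).trim());
            }
        }
        return found;
    }

    /** Returns the transcript labels that have no configured voice. */
    public static Set<String> unconfigured(String transcript, List<String> configuredSpeakers) {
        Set<String> missing = speakersIn(transcript);
        missing.removeAll(configuredSpeakers);
        return missing;
    }

    public static void main(String[] args) {
        String transcript = """
                Dr. Anya: Did you see that gecko?
                Liam: I did! Its toe pads are incredible.
                Dr Anya: Let's talk about them.
                """;
        // "Dr Anya" (missing period) does not match the configured "Dr. Anya"
        System.out.println(unconfigured(transcript, List.of("Dr. Anya", "Liam"))); // prints [Dr Anya]
    }
}
```

An empty result means every label in the transcript has a matching SpeakerVoiceConfig.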

Exporting WAV audio

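import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// PCM format assumed for the TTS output: 24 kHz, mono, 16-bit samples
private static final int SAMPLE_RATE = 24_000;
private static final short CHANNELS = 1;
private static final short BITS_PER_SAMPLE = 16;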

private static void writeWaveFile(Path output, byte[] pcmData) throws IOException {
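    // getParent() returns null for a bare file name such as "out.wav", so guard against it
    Path parent = output.getParent();
    if (parent != null) {
        Files.createDirectories(parent);
    }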

    int byteRate = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8;
    short blockAlign = (short) (CHANNELS * BITS_PER_SAMPLE / 8);
    int dataSize = pcmData.length;

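    // 44-byte canonical RIFF/WAVE header; all multi-byte fields are little-endian
    ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
    header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
    header.putInt(36 + dataSize);       // RIFF chunk size: total file size minus 8 bytes
    header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
    header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
    header.putInt(16);                  // fmt chunk size for PCM
    header.putShort((short) 1);         // audio format 1 = uncompressed PCM
    header.putShort(CHANNELS);
    header.putInt(SAMPLE_RATE);
    header.putInt(byteRate);            // bytes per second
    header.putShort(blockAlign);        // bytes per sample frame
    header.putShort(BITS_PER_SAMPLE);
    header.put("data".getBytes(StandardCharsets.US_ASCII));
    header.putInt(dataSize);            // length of the PCM payload in bytes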

    try (OutputStream outputStream = Files.newOutputStream(output)) {
        outputStream.write(header.array());
        outputStream.write(pcmData);
    }
}
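
As an alternative to building the header by hand, the JDK's javax.sound.sampled package can write the WAV container itself. This is a different technique from the method above, shown only as a sketch: the class name PcmToWav is made up for illustration, and the format (24 kHz, mono, signed 16-bit little-endian) is assumed to be the same one as in the constants above. The main method feeds it a generated sine tone rather than real TTS output, then reads the file back to confirm the header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class PcmToWav {
    // 24 kHz, 16-bit, mono, signed, little-endian (assumed TTS output format)
    public static final AudioFormat FORMAT = new AudioFormat(24_000f, 16, 1, true, false);

    /** Wraps raw PCM bytes in a WAV container and writes them to the given path. */
    public static void write(Path output, byte[] pcm) throws IOException {
        long frames = pcm.length / FORMAT.getFrameSize();
        try (AudioInputStream in =
                new AudioInputStream(new ByteArrayInputStream(pcm), FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, output.toFile());
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the TTS bytes: one second of a 440 Hz sine tone
        byte[] pcm = new byte[24_000 * 2];
        for (int i = 0; i < 24_000; i++) {
            short sample = (short) (Math.sin(2 * Math.PI * 440 * i / 24_000.0) * 12_000);
            pcm[2 * i] = (byte) (sample & 0xFF);            // low byte first (little-endian)
            pcm[2 * i + 1] = (byte) ((sample >> 8) & 0xFF);
        }
        Path out = Files.createTempFile("tts-demo", ".wav");
        write(out, pcm);

        // Read the header back to confirm what was written
        AudioFormat read = AudioSystem.getAudioFileFormat(out.toFile()).getFormat();
        System.out.println(read.getSampleRate() + " Hz, " + read.getChannels()
                + " channel(s), " + read.getSampleSizeInBits() + " bits -> " + out);
    }
}
```

The hand-written header has no dependency on the java.desktop module, which may matter on trimmed-down runtimes; the javax.sound.sampled version is shorter and leaves the header arithmetic to the JDK.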

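That's all there is to it.
If this feels like too much trouble, calling the REST API directly is the most straightforward route: it does not depend on the SDK, and the parameters are easier to read.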
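
For reference, a rough sketch of that REST route using only the JDK's java.net.http client. The endpoint, the x-goog-api-key header and the JSON shape follow the curl sample in the official documentation; the class name RestTtsSketch and its helpers are hypothetical. Pulling the base64 payload out with a regex is a shortcut taken here to avoid a JSON dependency, and a real JSON parser would be the more robust choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestTtsSketch {
    static final String BASE = "https://generativelanguage.googleapis.com/v1beta/models/";

    /** Quotes a value as a JSON string, escaping quotes, backslashes and control characters. */
    public static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Same fields as the SDK config: responseModalities plus a prebuilt voice. */
    public static String requestBody(String text, String voiceName) {
        return """
            {"contents":[{"parts":[{"text":%s}]}],
             "generationConfig":{"responseModalities":["AUDIO"],
               "speechConfig":{"voiceConfig":{"prebuiltVoiceConfig":{"voiceName":%s}}}}}
            """.formatted(jsonString(text), jsonString(voiceName));
    }

    public static HttpRequest buildRequest(String apiKey, String model, String text, String voice) {
        return HttpRequest.newBuilder(URI.create(BASE + model + ":generateContent"))
                .header("x-goog-api-key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(text, voice)))
                .build();
    }

    /** Decodes the first base64 "data" field (inlineData.data) found in the response. */
    public static byte[] extractAudio(String responseJson) {
        Matcher m = Pattern.compile("\"data\"\\s*:\\s*\"([A-Za-z0-9+/=]+)\"").matcher(responseJson);
        if (!m.find()) {
            throw new IllegalStateException("No inlineData.data field in response");
        }
        return Base64.getDecoder().decode(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("GEMINI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            System.out.println("Set GEMINI_API_KEY to send the request");
            return;
        }
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                buildRequest(apiKey, "gemini-3.1-flash-tts-preview",
                        "Say cheerfully: Have a wonderful day!", "Kore"),
                HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + resp.statusCode() + ": " + resp.body());
        }
        // The decoded bytes are raw PCM; wrap them in a WAV header before playing them
        Files.write(Path.of("out.pcm"), extractAudio(resp.body()));
    }
}
```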
