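Google's official documentation: https://ai.google.dev/gemini-api/docs/speech-generation
Awkwardly, it has no Java SDK sample, and the Java SDK's examples directory (https://github.com/googleapis/java-genai/tree/main/examples) has no matching example either. The two closest ones, examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudio.java and examples/src/main/java/com/google/genai/examples/InteractionMultimodalResponseAudioWithGenerateContent.java, do not match the documentation.
In practice, the SDK's unified generateContent method is all you need.
The call works the same way as the Python sample in the documentation; you only need to know the corresponding Java classes.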
Single-speaker example

```java
import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

// The API key is read from the GEMINI_API_KEY environment variable.
var client = Client.builder()
        .apiKey(System.getenv("GEMINI_API_KEY"))
        .build();

// Request audio output and pick a prebuilt voice.
var config = GenerateContentConfig.builder()
        .responseModalities(List.of("AUDIO"))
        .speechConfig(SpeechConfig.builder()
                .voiceConfig(VoiceConfig.builder()
                        .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                                .voiceName("Kore")
                                .build())
                        .build())
                .build())
        .build();

var response = client.models.generateContent(
        "gemini-3.1-flash-tts-preview",
        "Say cheerfully: Have a wonderful day!",
        config
);

// Take the bytes from the first part that carries inline data.
byte[] bytes = Objects.requireNonNull(response.parts()).stream()
        .map(part -> part.inlineData().orElse(null))
        .filter(blob -> blob != null && blob.data().isPresent())
        .map(blob -> blob.data().orElseThrow())
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));
```
Multi-speaker example

```java
import com.google.genai.Client;
import com.google.genai.types.GenerateContentConfig;
import com.google.genai.types.MultiSpeakerVoiceConfig;
import com.google.genai.types.PrebuiltVoiceConfig;
import com.google.genai.types.SpeakerVoiceConfig;
import com.google.genai.types.SpeechConfig;
import com.google.genai.types.VoiceConfig;
import java.util.List;
import java.util.Objects;

var client = Client.builder()
        .apiKey(System.getenv("GEMINI_API_KEY"))
        .build();

// Step 1: generate a transcript (skip this if you already have one).
var transcriptResp = client.models.generateContent(
        "gemini-3-flash-preview",
        """
        Generate a short transcript around 100 words that reads like
        a podcast by excited herpetologists. The hosts names are Dr. Anya and Liam.
        Use 'Dr. Anya: ...' and 'Liam: ...' format.
        """,
        null
);
String transcript = Objects.requireNonNull(transcriptResp.parts()).get(0).text().orElseThrow();

// Step 2: generate audio, mapping each speaker label to a prebuilt voice.
var config = GenerateContentConfig.builder()
        .responseModalities(List.of("AUDIO"))
        .speechConfig(SpeechConfig.builder()
                .multiSpeakerVoiceConfig(MultiSpeakerVoiceConfig.builder()
                        .speakerVoiceConfigs(
                                SpeakerVoiceConfig.builder()
                                        .speaker("Dr. Anya")
                                        .voiceConfig(VoiceConfig.builder()
                                                .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                                                        .voiceName("Kore").build())
                                                .build())
                                        .build(),
                                SpeakerVoiceConfig.builder()
                                        .speaker("Liam")
                                        .voiceConfig(VoiceConfig.builder()
                                                .prebuiltVoiceConfig(PrebuiltVoiceConfig.builder()
                                                        .voiceName("Puck").build())
                                                .build())
                                        .build()
                        )
                        .build())
                .build())
        .build();

var audioResp = client.models.generateContent(
        "gemini-3.1-flash-tts-preview",
        transcript,
        config
);

// Take the bytes from the first part that carries inline data.
byte[] bytes = Objects.requireNonNull(audioResp.parts()).stream()
        .map(part -> part.inlineData().orElse(null))
        .filter(blob -> blob != null && blob.data().isPresent())
        .map(blob -> blob.data().orElseThrow())
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("No audio bytes returned from GenAI TTS response"));
```
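Step 1 (the first generateContent call, which produces the transcript) can be ignored: it only generates the dialogue, and you can supply your own text instead. Step 2 generates the audio. For multiple speakers, set a MultiSpeakerVoiceConfig containing several SpeakerVoiceConfig entries. Each SpeakerVoiceConfig takes a speaker (which must match the name agreed on in the prompt) and a voiceConfig, and the voiceConfig specifies a voiceName.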
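Since the speaker names have to line up with the labels in the transcript, it can help to check them before spending a TTS call. The following is only a rough sketch using the standard library: the class name SpeakerLabelCheck and the "Name: text" line heuristic are my own assumptions, not part of the SDK, and a line that merely contains a colon early on would be misread as a label.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpeakerLabelCheck {
    // Heuristic: a label is whatever precedes the first colon on a line (max 40 chars).
    private static final Pattern LABEL = Pattern.compile("(?m)^\\s*([^:\\n]{1,40}):");

    /** Returns the labels found in the transcript that are not in the configured set. */
    public static Set<String> unknownSpeakers(String transcript, Set<String> configured) {
        Set<String> unknown = new LinkedHashSet<>();
        Matcher m = LABEL.matcher(transcript);
        while (m.find()) {
            String label = m.group(1).trim();
            if (!configured.contains(label)) {
                unknown.add(label);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        String transcript = "Dr. Anya: Look at that gecko!\n"
                + "Liam: Incredible colours.\n"
                + "Lian: This label has a typo.";
        // The configured names mirror the speaker(...) values in the config above.
        System.out.println(unknownSpeakers(transcript, Set.of("Dr. Anya", "Liam"))); // prints [Lian]
    }
}
```

An empty result means every label in the transcript has a matching SpeakerVoiceConfig.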
Exporting WAV audio

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

private static final int SAMPLE_RATE = 24_000;
private static final short CHANNELS = 1;
private static final short BITS_PER_SAMPLE = 16;

private static void writeWaveFile(Path output, byte[] pcmData) throws IOException {
    // getParent() is null for a bare file name such as "out.wav".
    Path parent = output.getParent();
    if (parent != null) {
        Files.createDirectories(parent);
    }

    int byteRate = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8;
    short blockAlign = (short) (CHANNELS * BITS_PER_SAMPLE / 8);
    int dataSize = pcmData.length;

    // Standard 44-byte RIFF/WAVE header, little-endian.
    ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
    header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
    header.putInt(36 + dataSize);          // remaining file size after this field
    header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
    header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
    header.putInt(16);                     // size of the fmt chunk
    header.putShort((short) 1);            // audio format 1 = PCM
    header.putShort(CHANNELS);
    header.putInt(SAMPLE_RATE);
    header.putInt(byteRate);
    header.putShort(blockAlign);
    header.putShort(BITS_PER_SAMPLE);
    header.put("data".getBytes(StandardCharsets.US_ASCII));
    header.putInt(dataSize);

    try (OutputStream outputStream = Files.newOutputStream(output)) {
        outputStream.write(header.array());
        outputStream.write(pcmData);
    }
}
```
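As a quick sanity check on those constants: 24 kHz, mono, 16-bit PCM works out to 48,000 bytes per second, so the clip length follows directly from the size of the returned byte array. A minimal sketch, assuming the response really is raw PCM in that format (the class name PcmDuration is just for illustration):

```java
public class PcmDuration {
    static final int SAMPLE_RATE = 24_000;
    static final int CHANNELS = 1;
    static final int BITS_PER_SAMPLE = 16;

    /** Bytes of PCM data per second of audio. */
    public static int byteRate() {
        return SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 8;
    }

    /** Playback duration in seconds for a PCM payload of the given size. */
    public static double durationSeconds(long pcmBytes) {
        return (double) pcmBytes / byteRate();
    }

    public static void main(String[] args) {
        System.out.println(byteRate());              // prints 48000
        System.out.println(durationSeconds(96_000)); // prints 2.0
    }
}
```

If the computed duration looks far off from what you hear, the sample rate or bit depth in the header is the first thing to double-check.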
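That's all it takes. If this feels like too much trouble, calling the REST API directly is the most straightforward option: it does not depend on any SDK, and the parameters are a bit more intuitive.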
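For the REST route, here is a rough sketch using only java.net.http from the JDK. The endpoint URL, the x-goog-api-key header, and the JSON field names follow my reading of the REST sample in the official documentation, so treat them as assumptions and verify them there. The class name RestTtsSketch, the hand-rolled JSON escaping, and the regex-based extraction are simplifications of mine; a real JSON library would be more robust.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestTtsSketch {
    // Assumed endpoint shape; check it against the official REST sample.
    static final String ENDPOINT =
            "https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent";

    // Finds the base64 payload of the first "data" field in the response.
    private static final Pattern DATA =
            Pattern.compile("\"data\"\\s*:\\s*\"([A-Za-z0-9+/=]+)\"");

    /** Builds the request body. Only backslashes and double quotes in the text are escaped. */
    public static String requestBody(String text, String voiceName) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return """
                {
                  "contents": [{"parts": [{"text": "%s"}]}],
                  "generationConfig": {
                    "responseModalities": ["AUDIO"],
                    "speechConfig": {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "%s"}}}
                  }
                }
                """.formatted(escaped, voiceName);
    }

    /** Decodes the base64 audio from the response JSON into raw PCM bytes. */
    public static byte[] extractPcm(String responseJson) {
        Matcher m = DATA.matcher(responseJson);
        if (!m.find()) {
            throw new IllegalStateException("No inlineData.data field found in response");
        }
        return Base64.getDecoder().decode(m.group(1));
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("GEMINI_API_KEY");
        if (apiKey == null || apiKey.isBlank()) {
            System.out.println("GEMINI_API_KEY is not set; skipping the network call.");
            return;
        }
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create(ENDPOINT.formatted("gemini-3.1-flash-tts-preview")))
                .header("x-goog-api-key", apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        requestBody("Say cheerfully: Have a wonderful day!", "Kore")))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode() + ": " + response.body());
        }
        byte[] pcm = extractPcm(response.body());
        System.out.println("Received " + pcm.length + " bytes of PCM audio");
    }
}
```

The resulting bytes can be handed to the same writeWaveFile method shown earlier, since the WAV header does not care whether the PCM came from the SDK or from REST.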