您现在的位置是：首页 > 人工智能AI > 正文

人工智能AI

如何使用DeepSeek构建中文语音聊天程序

1987web2025-02-23人工智能AI66

要使用本地部署的DeepSeek构建中文语音聊天程序，你需要整合语音识别（ASR）、大语言模型（LLM）和语音合成（TTS）三部分。以下是完整的流程和技术选型：1.技术栈

要使用本地部署的 DeepSeek 构建中文语音聊天程序，你需要整合语音识别（ASR）、大语言模型（LLM）和语音合成（TTS）三部分。以下是完整的流程和技术选型：

1. 技术栈

语音识别（ASR，Automatic Speech Recognition）：使用Whisper（开源）、DeepSeek-Speech 或者腾讯 API大语言模型（LLM）：本地部署DeepSeek LLM（7B/67B 模型, 大了也跑不动）语音合成（TTS，Text-To-Speech）：使用Edge-TTS（微软语音合成）或Bark前端：Gradio / Streamlit / FastAPI + WebSocket后端：Python + PyTorch + Transformers（用于 DeepSeek）

2. 环境准备

（1）安装 DeepSeek LLM

DeepSeek 提供了DeepSeek Chat（适用于对话）和DeepSeek Coder（适用于编程），你可以选择DeepSeek Chat 7B/67B进行部署。

A. 下载 DeepSeek 7B/67B

gitclone https://github.com/deepseek-ai/DeepSeek-LLM.gitcdDeepSeek-LLMpipinstall -r requirements.txt

B. 下载模型（示例 7B FP16）

from transformers import AutoModelForCausalLM, AutoTokenizer model_name ="deepseek-ai/deepseek-llm-7b"tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

（建议使用 GPU 部署，推荐 24GB+ 显存）

（2）安装 Whisper 语音识别

pipinstall openai-whisper

使用 Whisper 进行语音转文本

importwhisper model = whisper.load_model("medium")medium, large 适合中文result = model.transcribe("audio.wav")print(result["text"])

（3）安装 Edge-TTS 语音合成

pipinstall edge-tts

文字转语音

importasyncioimportedge_ttsasyncdeftext_to_speech(text, output_path="output.mp3"):tts = edge_tts.Communicate(text,"zh-CN-XiaoxiaoNeural")awaittts.save(output_path) asyncio.run(text_to_speech("你好，我是你的 AI 语音助手。"))

Edge-TTS 支持多种中文语音角色，你可以选择不同的声音。

3. 构建完整的语音聊天流程

录音：用户说话，录音保存为 audio.wav语音识别（ASR）：Whisper 将 audio.wav 转换为文字自然语言处理（NLP）：DeepSeek 处理用户输入并生成回复语音合成（TTS）：Edge-TTS 读取 DeepSeek 的回复并转换为语音播放音频：播放合成后的语音 output.mp3importwhisperimporttorchimportasyncioimportedge_ttsfromtransformersimportAutoModelForCausalLM, AutoTokenizer初始化 Whisperasr_model = whisper.load_model("medium")初始化 DeepSeekmodel_name ="deepseek-ai/deepseek-llm-7b"tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) llm_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)asyncdeftext_to_speech(text, output_path="output.mp3"):tts = edge_tts.Communicate(text,"zh-CN-XiaoxiaoNeural")awaittts.save(output_path)defchat_with_ai(user_input):input_ids = tokenizer.encode(user_input, return_tensors="pt").cuda() output_ids = llm_model.generate(input_ids, max_length=100) response = tokenizer.decode(output_ids[0], skip_special_tokens=True)returnresponsedefprocess_audio(audio_path):1. 语音识别result = asr_model.transcribe(audio_path) user_text = result["text"] print(f"用户:{user_text}")2. AI 生成回复ai_response = chat_with_ai(user_text) print(f"AI:{ai_response}")3. 语音合成asyncio.run(text_to_speech(ai_response,"response.mp3"))4. 播放音频importos os.system("mpg123 response.mp3")if__name__ =="__main__": process_audio("audio.wav")

4. 可视化界面

你可以使用Gradio搭建一个简单的 Web 界面：

pipinstall gradioimportgradioasgrdefchatbot(audio):audio_path ="user_audio.wav"audio.save(audio_path) process_audio(audio_path)return"response.mp3"iface = gr.Interface( fn=chatbot, inputs=gr.Audio(source="microphone", type="file"), outputs=gr.Audio(type="file"), live=True) iface.launch()

这样你就能在网页上录音，并与 DeepSeek 语音 AI 进行对话。

5. 总结

ASR（语音识别）：WhisperNLP（自然语言处理）：DeepSeek-LLMTTS（语音合成）：Edge-TTS界面：Gradio / Streamlit

这样，你就能用本地 DeepSeek + Whisper + Edge-TTS 构建一个完整的中文语音聊天程序了.