
VibeVoice Community Edition - Complete PyTorch/CUDA Multi-GPU Inference Code

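After Microsoft removed the official TTS inference code, a community fork preserved the complete PyTorch implementation. It supports single-GPU and multi-GPU inference, voice cloning, and a Gradio demo.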

Background

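In September 2025, Microsoft removed the VibeVoice-TTS inference code from the official repository, but the model weights are still available on Hugging Face. This community fork keeps the complete PyTorch implementation and can run TTS-1.5B and 7B directly on NVIDIA GPUs.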

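| Component | Description |
|---|---|
| `modeling_vibevoice_inference.py` | Complete `generate()` and `sample_speech_tokens()` implementations |
| `modular_vibevoice_diffusion_head.py` | Diffusion head implementation |
| `modular_vibevoice_tokenizer.py` | Acoustic and semantic tokenizers |
| `streamer.py` | Streaming audio output |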
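The role of a streaming audio output component like `streamer.py` can be illustrated with a producer/consumer pattern: the generation loop pushes decoded audio chunks into a queue while a consumer (for example, playback or a Gradio callback) reads them as they arrive. The sketch below is a hypothetical, standard-library-only illustration; the class `AudioChunkStreamer`, its `put`/`end` methods, and the `fake_generate` stand-in are assumptions, not the fork's actual API, and chunks are plain lists of float samples rather than tensors.

```python
import queue
import threading
from typing import Iterator, List, Optional


class AudioChunkStreamer:
    """Hypothetical streamer: the generator calls put()/end(), a consumer iterates."""

    def __init__(self, timeout: Optional[float] = None) -> None:
        self._queue: "queue.Queue[Optional[List[float]]]" = queue.Queue()
        self._timeout = timeout  # None blocks forever; a number raises queue.Empty on stall

    def put(self, chunk: List[float]) -> None:
        # Called by the generation loop each time a new audio chunk is decoded.
        self._queue.put(chunk)

    def end(self) -> None:
        # None is used as a sentinel meaning "generation finished".
        self._queue.put(None)

    def __iter__(self) -> Iterator[List[float]]:
        while True:
            chunk = self._queue.get(timeout=self._timeout)
            if chunk is None:
                return
            yield chunk


def fake_generate(streamer: AudioChunkStreamer, num_chunks: int, chunk_len: int) -> None:
    # Hypothetical stand-in for the model's generate(): emits silent chunks, then ends.
    for _ in range(num_chunks):
        streamer.put([0.0] * chunk_len)
    streamer.end()


if __name__ == "__main__":
    streamer = AudioChunkStreamer(timeout=5.0)
    worker = threading.Thread(target=fake_generate, args=(streamer, 3, 4))
    worker.start()
    received = [chunk for chunk in streamer]  # consumer side, e.g. audio playback
    worker.join()
    print(len(received), sum(len(c) for c in received))
```

Running generation on a background thread keeps the consumer free to play or display audio while later chunks are still being produced.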

```bash
git clone https://github.com/peterw-github/VibeVoice.git
cd VibeVoice
pip install -e .

# Strictly pinned dependency versions
pip install transformers==4.51.3 accelerate==1.6.0
```
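Because the install step pins exact versions, a quick check before running the demos can catch a drifted environment. The sketch below uses only the standard library's `importlib.metadata`; the function name `find_pin_mismatches` is hypothetical, and it assumes "strict" means exact string equality (so `4.51.3.post1` would be reported as a mismatch).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List

# Pins copied from the install step above.
PINS: Dict[str, str] = {"transformers": "4.51.3", "accelerate": "1.6.0"}


def find_pin_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = version,
) -> List[str]:
    """Return human-readable problems; an empty list means every pin matches exactly."""
    problems: List[str] = []
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            problems.append(f"{package}: not installed (expected {expected})")
            continue
        if installed != expected:
            problems.append(f"{package}: found {installed}, expected {expected}")
    return problems


if __name__ == "__main__":
    print(find_pin_mismatches(PINS) or "all pins match")
```

Run it in the same environment as the demo scripts; the `get_version` parameter exists only so the check can be exercised without the real packages installed.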

```bash
cd demo

# Single-GPU inference
python inference_from_file.py \
  --model vibevoice/VibeVoice-1.5B \
  --input_file input.txt \
  --output_dir ./output

# Multi-GPU inference
python inference_from_file_multi_gpu.py \
  --model vibevoice/VibeVoice-1.5B \
  --input_file input.txt \
  --output_dir ./output
```
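One common way to spread a Hugging Face model over several GPUs is to pass `device_map="auto"` together with a `max_memory` budget to `from_pretrained()`, letting Accelerate place layers on each card. The post does not say whether `inference_from_file_multi_gpu.py` uses this mechanism, so treat the following as an assumption. The helper `build_max_memory` is hypothetical; it only builds the budget dictionary (integer GPU indices plus an optional `"cpu"` key, values as strings such as `"6GiB"`), and the 6 GiB figure in the example is simply the upper bound of the per-GPU estimate in the table.

```python
from typing import Dict, Optional, Union


def build_max_memory(
    num_gpus: int,
    per_gpu_gib: float,
    cpu_gib: Optional[float] = None,
) -> Dict[Union[int, str], str]:
    """Build a max_memory dict such as {0: "6GiB", 1: "6GiB"} for device_map="auto"."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    budget: Dict[Union[int, str], str] = {
        i: f"{per_gpu_gib:g}GiB" for i in range(num_gpus)
    }
    if cpu_gib is not None:
        # Optional CPU RAM budget for offloading when the GPUs are full (much slower).
        budget["cpu"] = f"{cpu_gib:g}GiB"
    return budget


if __name__ == "__main__":
    print(build_max_memory(2, 6))
    # Hypothetical usage; the model class name depends on the fork:
    # model = SomeVibeVoiceModelClass.from_pretrained(
    #     "vibevoice/VibeVoice-1.5B",
    #     torch_dtype=torch.float16,
    #     device_map="auto",
    #     max_memory=build_max_memory(torch.cuda.device_count(), 6),
    # )
```

Capping each card slightly below its physical VRAM leaves headroom for activations during generation, which the placement step does not account for.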

Key changes in the multi-GPU version:

| Configuration | Estimated VRAM |
|---|---|
| fp16, single GPU | ~8-10 GB |
| fp16, two GPUs | ~5-6 GB per GPU |
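A rough way to sanity-check estimates like these is to compute the memory taken by the weights alone: parameter count × bytes per parameter (2 for fp16), divided across GPUs when the model is split evenly. The helper `weight_memory_gib` is an illustrative sketch; the parameter counts passed to it are the nominal model-size labels, not measured totals for VibeVoice. The rest of the table's figures presumably covers activations, caches, and CUDA context, which this calculation deliberately ignores.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2, num_gpus: int = 1) -> float:
    """Memory for the weights alone, in GiB per GPU, assuming an even split."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


if __name__ == "__main__":
    # Nominal sizes from the model names; real parameter totals may differ.
    for label, params in [("1.5B", 1.5e9), ("7B", 7e9)]:
        single = weight_memory_gib(params)
        dual = weight_memory_gib(params, num_gpus=2)
        print(f"{label}: {single:.2f} GiB on 1 GPU, {dual:.2f} GiB per GPU on 2 GPUs")
```

Only the weights shrink linearly with more GPUs; per-GPU overhead does not, which is consistent with the two-GPU estimate being more than half of the single-GPU one.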


Xiao.Xi • 16 days ago
