FunASR
haystack_integrations.components.audio.funasr.transcriber
FunASRTranscriber
Transcribes audio files to Documents using FunASR.
FunASR is an open-source speech recognition toolkit from Alibaba DAMO Academy. It supports 50+ languages, speaker diarization, and timestamp extraction, and runs entirely locally — no API key required.
Models are downloaded from ModelScope on first use and cached in ~/.cache/modelscope.
Usage Example:
from haystack_integrations.components.audio.funasr import FunASRTranscriber
transcriber = FunASRTranscriber()
result = transcriber.run(sources=["speech.wav", "interview.mp3"])
documents = result["documents"]
Speaker diarization and punctuation:
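from haystack.utils import ComponentDevice
from haystack_integrations.components.audio.funasr import FunASRTranscriber
transcriber = FunASRTranscriber(
    model="paraformer-zh",
    vad_model="fsmn-vad",   # voice activity detection
    punc_model="ct-punc",   # punctuation restoration
    spk_model="cam++",      # speaker diarization
    device=ComponentDevice.from_str("cuda"),
)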
SenseVoice with inverse text normalisation:
from haystack_integrations.components.audio.funasr import FunASRTranscriber
transcriber = FunASRTranscriber(
    model="iic/SenseVoiceSmall",
    generation_kwargs={"use_itn": True, "merge_vad": True, "language": "auto"},
)
__init__
__init__(
    *,
    model: str = "iic/SenseVoiceSmall",
    vad_model: str | None = "fsmn-vad",
    punc_model: str | None = "ct-punc",
    spk_model: str | None = None,
    device: ComponentDevice | None = None,
    batch_size_s: int = 300,
    store_full_path: bool = False,
    generation_kwargs: dict[str, Any] | None = None
) -> None
Create a FunASRTranscriber component.
Parameters:
- model (str) – FunASR model name or local path. Defaults to "iic/SenseVoiceSmall", a multilingual model supporting 50+ languages that is 5-10x faster than Whisper. Alternatives include "paraformer-zh" (Chinese) or "paraformer-en" (English). Browse available models at https://modelscope.github.io/FunASR/model-selection.html.
- vad_model (str | None) – Voice activity detection model used to split long audio into segments. Set to None to process the audio as a single stream. Browse available VAD models at https://www.modelscope.cn/models.
- punc_model (str | None) – Punctuation restoration model. Set to None to disable punctuation. Browse available punctuation models at https://www.modelscope.cn/models.
- spk_model (str | None) – Speaker diarization model (e.g. "cam++"). When set, a "speakers" key is included in the Document metadata. Defaults to None (diarization disabled). Browse available speaker diarization models at https://www.modelscope.cn/models.
- device (ComponentDevice | None) – The device to run inference on. If None, the default device is selected automatically. Use ComponentDevice.from_str("cuda") for GPU inference.
- batch_size_s (int) – Batch size in seconds for VAD-segmented audio. Larger values improve throughput at the cost of memory.
- store_full_path (bool) – If True, store the full audio file path in Document metadata. If False (default), store only the file name.
- generation_kwargs (dict[str, Any] | None) – Extra keyword arguments forwarded to AutoModel.generate(). Use this for model-specific options such as use_itn=True or merge_vad=True for SenseVoice, or hotword="..." for contextual recognition.
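The effect of store_full_path can be pictured with a small standalone sketch. The helper name path_meta and the metadata key "file_path" are assumptions made for illustration only; the key the component actually writes may differ.

```python
from pathlib import PurePosixPath


def path_meta(source: str, store_full_path: bool = False) -> dict:
    """Hypothetical helper: choose what path information goes into metadata."""
    path = PurePosixPath(source)
    # The "file_path" key is an assumption, not confirmed by this reference.
    return {"file_path": str(path) if store_full_path else path.name}


full = path_meta("recordings/2024/interview.mp3", store_full_path=True)
short = path_meta("recordings/2024/interview.mp3")
print(full)   # {'file_path': 'recordings/2024/interview.mp3'}
print(short)  # {'file_path': 'interview.mp3'}
```

Storing only the file name keeps metadata portable across machines, while the full path helps when several sources share the same name.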
warm_up
Load the FunASR model into memory.
Models are downloaded from ModelScope on first call and cached locally. This method is idempotent — calling it multiple times is safe.
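What "idempotent" means here can be sketched with a stand-in class. LazyModelHolder and its _load method are hypothetical and do not reflect how the component is implemented; the sketch only shows the load-once pattern that makes repeated warm_up calls safe.

```python
class LazyModelHolder:
    """Hypothetical stand-in illustrating an idempotent warm-up."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self._model = None
        self.load_count = 0  # counts how often the expensive step ran

    def _load(self) -> dict:
        # Placeholder for the expensive step (download and load into memory).
        self.load_count += 1
        return {"name": self.model_name}

    def warm_up(self) -> None:
        # Load only if nothing has been loaded yet; later calls do nothing.
        if self._model is None:
            self._model = self._load()


holder = LazyModelHolder("iic/SenseVoiceSmall")
holder.warm_up()
holder.warm_up()
print(holder.load_count)  # 1
```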
to_dict
Serialize the component to a dictionary.
Returns:
dict[str, Any] – Dictionary with serialized data.
from_dict
Deserialize the component from a dictionary.
Parameters:
- data (dict[str, Any]) – Dictionary to deserialize from.
Returns:
FunASRTranscriber – Deserialized component.
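A round trip through to_dict and from_dict might look like the sketch below. SketchTranscriber is a hypothetical stand-in, not the real component, and the "type" / "init_parameters" layout is an assumption based on the convention Haystack components commonly follow; the exact dictionary the component produces is not specified in this reference.

```python
from typing import Any, Optional


class SketchTranscriber:
    """Hypothetical stand-in; not the real FunASRTranscriber."""

    def __init__(
        self,
        *,
        model: str = "iic/SenseVoiceSmall",
        spk_model: Optional[str] = None,
        batch_size_s: int = 300,
    ) -> None:
        self.model = model
        self.spk_model = spk_model
        self.batch_size_s = batch_size_s

    def to_dict(self) -> dict[str, Any]:
        # Assumed layout: a type string plus the constructor arguments.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                "model": self.model,
                "spk_model": self.spk_model,
                "batch_size_s": self.batch_size_s,
            },
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchTranscriber":
        # Rebuild the object from the stored constructor arguments.
        return cls(**data["init_parameters"])


original = SketchTranscriber(model="paraformer-zh", spk_model="cam++")
restored = SketchTranscriber.from_dict(original.to_dict())
print(restored.to_dict() == original.to_dict())  # True
```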
run
run(
    sources: list[str | Path | ByteStream],
    meta: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> dict[str, list[Document]]
Transcribe audio sources to Documents.
Parameters:
- sources (list[str | Path | ByteStream]) – Audio file paths (str or Path) or ByteStream objects. Supported formats: WAV, MP3, FLAC, OGG, M4A, AAC, and any format that FunASR's underlying audio backend (soundfile/ffmpeg) can decode.
- meta (dict[str, Any] | list[dict[str, Any]] | None) – Metadata to attach to the produced Documents. Pass a single dict to apply the same metadata to all Documents, or a list aligned with sources.
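The two accepted shapes of meta can be illustrated with a standalone sketch. The function normalize_meta below is hypothetical and is not the component's own code; in particular, raising ValueError on a length mismatch is an assumption about how a misaligned list would be handled.

```python
from typing import Any, Optional, Union


def normalize_meta(
    meta: Optional[Union[dict[str, Any], list[dict[str, Any]]]],
    sources_count: int,
) -> list[dict[str, Any]]:
    """Hypothetical helper: expand meta to one dict per source."""
    if meta is None:
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict is copied once per source.
        return [dict(meta) for _ in range(sources_count)]
    if len(meta) != sources_count:
        # Assumed behaviour for a list that is not aligned with sources.
        raise ValueError("meta list must have one entry per source")
    return [dict(entry) for entry in meta]


print(normalize_meta({"lang": "zh"}, 2))
# [{'lang': 'zh'}, {'lang': 'zh'}]
print(normalize_meta([{"id": 1}, {"id": 2}], 2))
# [{'id': 1}, {'id': 2}]
print(normalize_meta(None, 2))
# [{}, {}]
```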
Returns:
dict[str, list[Document]] – Dictionary with key "documents", containing one Document per source whose content holds the full transcript text.