5.2.1 Voice Control
The voice control interface provides a complete set of voice interaction capabilities, including speech synthesis, speech recognition, audio noise reduction, audio playback, and volume control.
Key Features
Text-to-Speech (TTS)
Text-to-speech: Convert text into natural-sounding speech.
Multi-language support: Supports Chinese, English, and other languages.
Emotional speech: Supports different emotional styles for synthesis.
Priority management: Supports multi-level priority control.
Automatic Speech Recognition (ASR) (coming soon)
Real-time recognition: Supports real-time speech recognition.
Multi-language recognition: Supports Chinese, English, and other languages.
Audio stream processing: Supports real-time processing of audio streams.
Audio Processing
Real-time noise reduction: Supports real-time audio denoising.
Voice activity detection: Supports VAD (Voice Activity Detection).
Streaming: Supports streaming of denoised audio.
Audio Playback
Audio stream playback: Supports playback of audio data streams.
Priority control: Supports playback priority management.
Format support: Supports multiple audio formats.
Volume Control
Volume adjustment: Supports system volume adjustment.
Mute control: Supports mute / unmute.
Volume query: Supports querying the current volume.
Volume Control Services
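| Service Name | Data Type | Description |
|---|---|---|
| /aimdk_5Fmsgs/srv/GetVolume | GetVolume | Query volume |
| /aimdk_5Fmsgs/srv/SetVolume | SetVolume | Set volume |
| /aimdk_5Fmsgs/srv/GetMute | GetMute | Query mute status |
| /aimdk_5Fmsgs/srv/SetMute | SetMute | Set mute |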
GetVolume ros2-srv @ hal/audio/srv/GetVolume.srv
    # Get Volume
    # Service: /aimdk_5Fmsgs/srv/GetVolume
    # Request
    CommonRequest request       # Request header
    ---
    # Response
    CommonResponse reponse      # Response header
    uint32 audio_volume         # Current volume (0–100)
SetVolume ros2-srv @ hal/audio/srv/SetVolume.srv
    # Set Volume
    # Service: /aimdk_5Fmsgs/srv/SetVolume
    # Request
    CommonRequest request       # Request header
    uint32 audio_volume         # Target volume (0–100)
    ---
    # Response
    CommonResponse reponse      # Response header
    uint32 audio_volume         # Current volume (0–100)
GetMute ros2-srv @ hal/audio/srv/GetMute.srv
    # Get Mute Status
    # Service: /aimdk_5Fmsgs/srv/GetMute
    # Request
    CommonRequest request       # Request header
    ---
    # Response
    CommonResponse reponse      # Response header
    bool is_mute                # Current mute state
SetMute ros2-srv @ hal/audio/srv/SetMute.srv
    # Set Mute
    # Service: /aimdk_5Fmsgs/srv/SetMute
    # Request
    CommonRequest request       # Request header
    bool is_mute                # Target mute state
    ---
    # Response
    CommonResponse reponse      # Response header
    bool is_mute                # Current mute state
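Since audio_volume is documented as 0–100 and SetVolume echoes the current volume in its response, a caller can validate the target before sending it and compare the echoed value afterwards. The following is a minimal sketch of that client-side check only; the helper names are hypothetical, and whether the service itself clamps or rejects out-of-range values is not stated here.

```python
def clamp_volume(value: int) -> int:
    """Clamp a requested volume to the documented 0-100 range (client-side assumption)."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("volume must be an integer")
    return max(0, min(100, value))


def volume_applied(requested: int, reported: int) -> bool:
    """Compare the clamped request with the audio_volume echoed in the response."""
    return clamp_volume(requested) == reported


if __name__ == "__main__":
    target = clamp_volume(130)  # out-of-range input is clamped to 100
    # 'reported' stands in for the audio_volume field of a SetVolume response.
    reported = 100
    print(target, volume_applied(130, reported))
```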
Speech Synthesis Services
| Service Name | Data Type | Description |
|---|---|---|
| /aimdk_5Fmsgs/srv/PlayTts | PlayTts | Text-to-speech playback |
PlayTts ros2-srv @ interaction/srv/PlayTts.srv
    # TTS Playback
    # Service: /aimdk_5Fmsgs/srv/PlayTts
    # Request
    CommonRequest header        # Request header
    PlayTtsRequest tts_req      # Embedded request msg
    ---
    # Response
    CommonResponse header       # Response header
    PlayTtsResponse tts_resp    # Embedded response msg
Where:
PlayTtsRequest ros2-msg @ interaction/msg/PlayTtsRequest.msg
    # Embedded request msg
    string text                       # Text content
    TtsPriorityLevel priority_level   # Priority level (see TtsPriorityLevel below)
    uint32 priority_weight            # Priority weight (0–99)
    string domain                     # Caller domain
    string trace_id                   # Request trace ID
    bool is_interrupted               # Whether to interrupt broadcasts of the same priority (otherwise queued)
TtsPriorityLevel ros2-msg @ interaction/msg/TtsPriorityLevel.msg
    # TTS priority level
    uint8 value                       # Priority value
Available TtsPriorityLevel values:
| Level | Value | Description | Usage scenarios |
|---|---|---|---|
| Emergency safety layer (SAFETY_L10) | 10 | Highest priority | Safety alerts, emergency notifications |
| Warning layer (WARNING_L8) | 8 | High priority | Hazard alerts and warning messages |
| System notice layer (SYSTEM_L7) | 7 | Medium-high priority | System-level notices |
| Interaction response layer (INTERACTION_L6) | 6 | Medium priority | User interaction and conversational responses |
| Mission execution layer (MISSION_L4) | 4 | Medium-low priority | Task execution and status broadcasts |
| Service layer (SERVICE_L2) | 2 | Low priority | Proactive services and reminders |
| Background service layer (BACKGROUND_L1) | 1 | Lowest priority | Background services and logging |
Audio playback priority mechanism:
This priority system applies to both TTS playback (PlayTts) and audio file playback (PlayAudioFile).
Higher priority playback interrupts lower priority playback.
For the same priority level, behavior is determined by priority_weight and is_interrupted.
The playback queue is reset when playback is interrupted.
The emergency safety level has the highest priority and cannot be interrupted by any other level.
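The rules above can be sketched as a small decision function. This is an illustrative model, not the robot's actual arbiter: the class and function names are hypothetical, and the same-level behavior (higher priority_weight wins, and on an exact tie is_interrupted decides) is an assumption, since the text only says both fields are involved. The fractional weighting follows the "final priority = priority + priority_weight%" comment in the AudioFile message.

```python
from dataclasses import dataclass

# Level values from the TtsPriorityLevel table.
SAFETY_L10, WARNING_L8, SYSTEM_L7, INTERACTION_L6 = 10, 8, 7, 6
MISSION_L4, SERVICE_L2, BACKGROUND_L1 = 4, 2, 1


@dataclass(frozen=True)
class Playback:
    level: int                    # priority level (1-10)
    weight: int = 0               # priority_weight
    is_interrupted: bool = False  # request flag: interrupt same-priority playback


def effective_priority(p: Playback) -> float:
    """Level plus weight as a percentage, e.g. level 6 with weight 50 gives 6.5."""
    return p.level + p.weight / 100.0


def would_interrupt(current: Playback, incoming: Playback) -> bool:
    """Return True if 'incoming' would preempt 'current' under this model."""
    if incoming.level != current.level:
        # Documented rule: a higher level interrupts a lower one.
        return incoming.level > current.level
    # ASSUMPTION: within one level, a higher weight wins ...
    if incoming.weight != current.weight:
        return incoming.weight > current.weight
    # ... and on an exact tie the is_interrupted flag decides (otherwise queued).
    return incoming.is_interrupted


if __name__ == "__main__":
    speaking = Playback(INTERACTION_L6)
    print(would_interrupt(speaking, Playback(WARNING_L8)))
    print(would_interrupt(Playback(SAFETY_L10), Playback(WARNING_L8)))
```

Because SAFETY_L10 is the highest level, no request from another level can satisfy the first branch against it, which matches the statement that the emergency safety level cannot be interrupted by other levels.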
PlayTtsResponse ros2-msg @ interaction/msg/PlayTtsResponse.msg
    # Embedded response msg
    string text                       # Response text
    TtsPriorityLevel priority_level   # Priority level
    uint32 priority_weight            # Priority weight
    string domain                     # Caller domain
    string trace_id                   # Request trace ID
    bool is_success                   # Whether the request succeeded
    string error_message              # Error message
    uint32 estimated_duration         # Estimated duration (ms)
Audio File Playback Service
Call the PlayAudioFile service with the audio file path (file_path = parent directory, file_name = filename) and priority to trigger playback. A response where reponse.status.value == 1 indicates success. See examples: C++ / Python.
| Service Name | Data Type | Description |
|---|---|---|
| /aimdk_5Fmsgs/srv/PlayAudioFile | PlayAudioFile | Play audio file |
PlayAudioFile ros2-srv @ hal/audio/srv/PlayAudioFile.srv
    # Play audio file
    # Service: /aimdk_5Fmsgs/srv/PlayAudioFile
    # Request
    CommonRequest request                  # Request header
    AudioFile file                         # Audio file info (required)
    builtin_interfaces/Time play_stamps    # Optional; scheduled play time, default: play immediately
    ---
    # Response
    CommonResponse reponse                 # Response header
AudioFile ros2-msg @ hal/audio/msg/AudioFile.msg
    string pkg_name           # Required; identifies the caller
    string file_name          # Required; file name
    string file_path          # Required; parent directory path (uses system default if empty; must not end with the file name)
    AudioInfo info            # Required for PCM, optional for WAV; audio format
    uint32 priority           # Required; priority (1–10, default 6)
    uint32 priority_weight    # Optional (1–100); final priority = priority + priority_weight%
Notes:
Audio files must be PCM-encoded raw files (.pcm) or WAV files wrapping this PCM data (.wav). Other formats such as MP3 are not supported.
Audio must be 16 kHz sample rate, 16-bit, mono.
When using an absolute path, set file_path to the parent directory and file_name to the file name.
Audio files must be stored on the interaction compute unit (PC3, 10.0.1.42), not the development compute unit (PC2).
The audio folder and all its parent directories must be readable by all users (a subdirectory under /var/tmp/ is recommended).
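The format and path rules in these notes can be checked before a file is copied to the robot. The sketch below uses only the Python standard library; the function names are hypothetical, and it covers .wav files only (a raw .pcm file carries no header to inspect). It splits an absolute path into the file_path / file_name pair and reports any mismatch against 16 kHz, 16-bit, mono.

```python
import posixpath
import wave

REQUIRED_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
REQUIRED_CHANNELS = 1            # mono


def split_audio_path(absolute_path: str) -> tuple:
    """Split an absolute path into (file_path, file_name): parent directory and file name."""
    file_path, file_name = posixpath.split(absolute_path)
    return file_path, file_name


def wav_format_problems(local_path: str) -> list:
    """Return a list of mismatches against the required format (empty if none).

    wave.open raises wave.Error for WAV files that are not plain PCM.
    """
    problems = []
    with wave.open(local_path, "rb") as wav:
        if wav.getframerate() != REQUIRED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit, expected 16")
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
    return problems


if __name__ == "__main__":
    # Example path only; the directory under /var/tmp/ is illustrative.
    print(split_audio_path("/var/tmp/audio/hello.wav"))
```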
Audio Stream Playback
Provides playback of raw audio streams.
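| Service Name | Data Type | Description |
|---|---|---|
| /aimdk_5Fmsgs/srv/RequestAudioFocus | RequestAudioFocus | Request audio playback focus |
| /aimdk_5Fmsgs/srv/AbandonAudioFocus | AbandonAudioFocus | Release audio playback focus |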
| Topic Name | Data Type | Description | QoS | Frequency |
|---|---|---|---|---|
| /aima/hal/audio/playback | AudioPlayback | Audio stream playback | - | Published by the user application |
| /aima/hal/audio/focus_response | FocusResponse | Audio focus change events | - | Event-triggered; notifies when audio focus is preempted |
| /aima/hal/audio/play_state | PlayStateChange | Audio playback state events | - | Event-triggered; notifies on audio playback state change |
RequestAudioFocus ros2-srv @ hal/audio/srv/RequestAudioFocus.srv
    # Request audio playback focus
    # Service: /aimdk_5Fmsgs/srv/RequestAudioFocus
    # Request
    CommonRequest request             # Request header
    FocusRequester focus_requester    # Focus request info
    ---
    # Response
    CommonResponse reponse            # Response header
    FocusResponse focus_response      # Request result
FocusRequester ros2-msg @ hal/audio/msg/FocusRequester.msg
    string pkg_name           # Playback source identifier
    uint32 priority           # Priority (1–10, default 6)
    uint32 priority_weight    # Weight (optional); breaks ties within same priority level
FocusResponse ros2-msg @ hal/audio/msg/FocusResponse.msg
    string pkg_name           # Playback source identifier
    bool focus_gain           # Focus grant result
AbandonAudioFocus ros2-srv @ hal/audio/srv/AbandonAudioFocus.srv
    # Release audio playback focus
    # Service: /aimdk_5Fmsgs/srv/AbandonAudioFocus
    # Request
    CommonRequest request             # Request header
    FocusRequester focus_requester    # Focus request info
    ---
    # Response
    CommonResponse reponse            # Response header
    FocusResponse focus_response      # Request result
FocusRequester and FocusResponse are defined as above.
AudioPlayback ros2-msg @ hal/audio/msg/AudioPlayback.msg
    # Audio stream playback
    # Topic: /aima/hal/audio/playback
    builtin_interfaces/Time stamps    # Timestamp
    AudioInfo info                    # Audio format
    AudioData data                    # Audio data
    string pkg_name                   # Playback source identifier
    string token_id                   # (Optional) changing token_id clears the playback buffer (used to interrupt current playback)
AudioInfo ros2-msg @ hal/audio/msg/AudioInfo.msg
    uint8 channels           # Number of channels
    uint32 sample_rate       # Sample rate [Hz], currently only 16000
    uint32 size              # (not used) write size [byte]
    string sample_format     # Audio format, currently only S16LE
    string coding_format     # Audio coding format, currently only pcm
AudioData ros2-msg @ hal/audio/msg/AudioData.msg
    uint8[] data
FocusResponse ros2-msg @ hal/audio/msg/FocusResponse.msg
    # Audio focus change events
    # Topic: /aima/hal/audio/focus_response
    string pkg_name          # Playback source identifier
    bool focus_gain          # Focus grant result
PlayStateChange ros2-msg @ hal/audio/msg/PlayStateChange.msg
    # Audio playback state events
    # Topic: /aima/hal/audio/play_state
    string pkg_name          # Playback source identifier
    PlayStateType state      # Playback state
PlayStateType ros2-msg @ hal/audio/msg/PlayStateType.msg
    uint8 value              # Playback state (0: off, 1: playing, 2: stopped)
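An application that publishes AudioPlayback messages has to cut its PCM buffer into chunks that fill AudioData.data. The sketch below shows only that arithmetic for the one supported format (16000 Hz, S16LE, and mono as used elsewhere in this section). The 20 ms chunk length and the function names are assumptions for illustration; this section does not prescribe a chunk size or publish rate.

```python
SAMPLE_RATE_HZ = 16000  # AudioInfo.sample_rate, currently only 16000
BYTES_PER_SAMPLE = 2    # S16LE: signed 16-bit little-endian
CHANNELS = 1            # mono


def bytes_per_chunk(chunk_ms: int) -> int:
    """Number of PCM bytes covering chunk_ms milliseconds of audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000


def chunk_pcm(pcm: bytes, chunk_ms: int = 20):
    """Yield consecutive chunks of the buffer; the last chunk may be shorter."""
    size = bytes_per_chunk(chunk_ms)
    if size <= 0:
        raise ValueError("chunk_ms must be positive")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
    chunks = list(chunk_pcm(one_second, chunk_ms=20))
    # Each chunk would become the data field of one AudioPlayback message.
    print(len(chunks), len(chunks[0]))
```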
MIC Audio Stream Capture Topic
Supports receiving real-time VAD (Voice Activity Detection) events on denoised audio and the corresponding audio stream, as well as raw audio stream capture.
| Topic Name | Data Type | Description | QoS | Frequency |
|---|---|---|---|---|
| | ProcessedAudioOutput | VAD audio capture | - | Event-triggered; cached data for voice recognition is sent in a burst at the start of a VAD event, then updates arrive at ~25 Hz |
| /aima/hal/audio/capture | AudioCapture | Raw audio capture | - | |
ProcessedAudioOutput ros2-msg @ interaction/msg/ProcessedAudioOutput.msg
    MessageHeader header                 # Message header
    uint32 stream_id                     # Audio stream ID (1: onboard mic, 2: external mic); regardless of which mic
                                         # is active, audio is always published with stream_id=1 and saved under
                                         # the fixed stream_1/ subdirectory
    AudioVadStateType audio_vad_state    # VAD state (0: no speech, 1: speech start, 2: in speech, 3: speech end)
    uint8[] audio_data                   # Audio data (PCM, 16 kHz / 16 bit / 1 ch)
Audio stream format:
Sample rate: 16 kHz
Bit depth: 16 bit
Channels: mono
Encoding: PCM
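A subscriber typically wants whole utterances rather than individual messages. The following sketch groups audio_data by the VAD state values listed in ProcessedAudioOutput (1: speech start, 2: in speech, 3: speech end). The class name is hypothetical, and whether the speech-end message itself carries audio is not stated here, so the sketch simply appends whatever bytes it is given.

```python
VAD_NONE, VAD_START, VAD_ACTIVE, VAD_END = 0, 1, 2, 3
BYTES_PER_SECOND = 16000 * 2 * 1  # 16 kHz, 16-bit, mono PCM


class UtteranceAssembler:
    """Collect audio_data between speech start and speech end."""

    def __init__(self):
        self._buffer = bytearray()
        self._active = False

    def feed(self, vad_state: int, audio_data: bytes):
        """Return the complete utterance at speech end, otherwise None."""
        if vad_state == VAD_START:
            # A new utterance discards any unfinished one.
            self._buffer = bytearray(audio_data)
            self._active = True
        elif vad_state == VAD_ACTIVE and self._active:
            self._buffer.extend(audio_data)
        elif vad_state == VAD_END and self._active:
            self._buffer.extend(audio_data)
            self._active = False
            return bytes(self._buffer)
        return None


def duration_seconds(pcm: bytes) -> float:
    """Duration of a PCM buffer in the documented stream format."""
    return len(pcm) / BYTES_PER_SECOND
```

In a ROS 2 subscriber, feed would be called from the message callback with the VAD state value and audio_data of each message.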
Attention
A wake word is required to activate VAD (since v0.9):
In default mode (built-in interaction ON), always say the wake word before the target speech, because VAD stays active only for a short while.
In only_voice mode (built-in interaction disabled), VAD stays active for a long time once woken by the wake word. No further wake words are needed, and all speech detected afterwards is captured as VAD streams.
AudioCapture ros2-msg @ hal/audio/msg/AudioCapture.msg
    # Raw audio capture
    # Topic: /aima/hal/audio/capture
    builtin_interfaces/Time stamps
    uint8 mic_channels       # Number of microphone channels
    uint8 ref_channels       # Number of reference (echo-cancellation) channels
    AudioInfo info           # Audio format
    AudioData data           # Audio data
    string pkg_name          # Audio source
Microphone Control Services
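| Service Name | Data Type | Description |
|---|---|---|
| /aimdk_5Fmsgs/srv/GetMicSourceRequest | GetMicSourceRequest | Query the current MIC device |
| /aimdk_5Fmsgs/srv/SetMicSourceRequest | SetMicSourceRequest | Switch the MIC device |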
GetMicSourceRequest ros2-srv @ interaction/srv/GetMicSourceRequest.srv
    # Query current MIC device
    # Service: /aimdk_5Fmsgs/srv/GetMicSourceRequest
    # Request
    CommonRequest header
    ---
    # Response
    CommonResponse header
    uint32 mic_source        # 0: built-in mic, 1: external mic
SetMicSourceRequest ros2-srv @ interaction/srv/SetMicSourceRequest.srv
    # Switch MIC device
    # Service: /aimdk_5Fmsgs/srv/SetMicSourceRequest
    # Request
    CommonRequest header
    uint32 mic_source        # 0: built-in mic, 1: external mic
    ---
    # Response
    CommonResponse header
Programming Examples
For detailed programming examples and code descriptions, see:
C++ Examples:
Python Examples:
Safety Notes
Warning
Voice playback limitations
The TTS service uses a priority system; avoid starting multiple speech playbacks at the same time.
Higher-priority speech will interrupt lower-priority speech; configure priorities carefully.
Check the current playback state before starting new speech.
Caution
Because standard ROS does not handle cross-host services (request-response) well, refer to the SDK examples to use the open interfaces robustly, with protection mechanisms such as exception safety and retransmission.
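The retransmission idea in this caution can be illustrated with a generic retry wrapper. This is a sketch of the pattern only, not the SDK's own mechanism: the function name, the attempt count, and the delay are hypothetical, and 'call' stands in for whatever function sends one service request and returns its response (or None on timeout).

```python
import time


def call_with_retry(call, attempts=3, delay_s=0.5, is_ok=lambda result: result is not None):
    """Invoke call() until is_ok(result) holds, retrying on exceptions or bad results."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = call()
            if is_ok(result):
                return result
        except Exception as exc:  # keep the last failure for the final error
            last_error = exc
        if attempt < attempts:
            time.sleep(delay_s)   # short pause before retransmitting
    raise RuntimeError(f"call failed after {attempts} attempts") from last_error
```

Passing an is_ok predicate lets the caller treat an unsuccessful response (for example, one whose status does not indicate success) the same way as a timeout.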
Note
Best Practices
Choose appropriate priority levels to avoid interfering with important announcements.
Implement monitoring and exception handling for speech playback.
Implement a playback queue for speech management.
Pay attention to the required audio format and sample rate.
Make the receive queue (QoS depth) of the VAD subscription large enough to hold the burst of cached data sent at the start of a VAD event.
Always say the wake word before speaking when using VAD.