Version: QTrobot V3

QTrobot Audio processing and Microphone

QTrobot has an integrated high-performance digital microphone array in the head: a ReSpeaker Mic Array v2.0 board with voice activity detection, direction of arrival, beamforming and noise suppression on-chip. It's connected to QTRP via USB and is fully accessible to developers, raw or processed.


Specification:

  • 4 high-performance digital microphones
  • Far-field voice capture
  • Speech algorithm on-chip
  • 12 programmable RGB LED indicators
  • Microphones: ST MP34DT01TR-M
  • Sensitivity: -26 dBFS (omnidirectional)
  • Acoustic overload point: 120 dB SPL
  • SNR: 61 dB
  • Max sample rate: 16 kHz

Software interface

Microphone audio and tuning are handled by the microphone node in qtrobot-service-hub (config microphone.yaml, ZMQ port 50550). Continuous speech recognition is a separate, optional asr plugin layered on top of it.

  • robot.microphone (or /qtrobot/microphone/..., /qtrobot/mic/... over ROS2): raw multichannel audio streams, DSP tuning, voice activity/direction of arrival.
  • robot.asr (Python only, currently): continuous speech recognition, with a choice of cloud or local engines.

See the Microphone tutorial and ASR tutorial for full code walkthroughs, and the Python API reference for the complete call list.

Accessing microphone audio streams

The internal array publishes 5 channels: channel 0 is the processed, beamformed signal meant for speech recognition (with AEC applied, see below); channels 1–4 are the raw per-microphone signals, for anyone who wants to do their own array processing. If an external mic is enabled (see below), it publishes as its own separate stream.

def on_audio(frame):
    # process() stands in for your own handler
    process(frame.data)

robot.microphone.stream.on_int_audio_ch0(on_audio, queue_size=10)

# Or pull directly
reader = robot.microphone.stream.open_int_audio_ch0_reader(queue_size=10)
frame = reader.read(timeout=3.0)

Channels 1–4 follow the exact same pattern: substitute ch1...ch4 for ch0. The external mic has its own stream, on_ext_audio_ch0 (see "Using an external microphone" below). The TypeScript/Node.js SDK doesn't have raw RPC-style stream access like Python or ROS2; live audio is only available there over WebRTC native tracks, the same mechanism used for the camera feed.
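As a rough illustration of what an audio callback might do with a frame, the sketch below computes the RMS level of a chunk of audio in dBFS. It assumes frame.data holds 16-bit signed little-endian mono PCM bytes, which this page does not confirm; check the Python API reference for the actual frame layout. rms_dbfs is a hypothetical helper, not part of the SDK.

```python
import math
import struct


def rms_dbfs(pcm_bytes: bytes) -> float:
    """Return the RMS level of 16-bit little-endian mono PCM in dBFS.

    ASSUMPTION: samples are int16 LE; adjust if the SDK delivers another
    format. Returns -inf for silence or an empty buffer.
    """
    count = len(pcm_bytes) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return float("-inf")
    # Full scale for int16 is 32768.
    return 10.0 * math.log10(mean_square / (32768.0 ** 2))
```

Under that assumption, a callback could call rms_dbfs(frame.data) and skip frames below some threshold before doing heavier processing.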

Voice activity and direction of arrival

A separate, lightweight event stream tells you when someone is speaking and from which direction, without needing to process raw audio yourself:

def on_event(frame):
    evt = frame.value
    if evt.get("activity"):
        print("Voice detected, DOA:", evt.get("direction"))

robot.microphone.stream.on_int_event(on_event, queue_size=2)

The payload carries activity (bool, True while voice is detected) and direction (degrees, 0–359, where 270 is the front of QTrobot due to the array's orientation in the head). This stream only delivers the latest event — events can be dropped if your consumer is too slow to keep up, so don't rely on it for anything that needs every single transition.
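Because the front of the robot is at 270 rather than 0, it can be convenient to convert the reported direction into an offset from the front. The sketch below does that with modular arithmetic. The helper names and the 45-degree default are illustrative, and this page does not say which sign corresponds to QTrobot's left or right, so verify that on your robot.

```python
FRONT_DEG = 270  # front of QTrobot in the array's frame, as stated above


def relative_bearing(direction_deg: float) -> float:
    """Map a 0-359 DOA reading to an offset from the front in [-180, 180)."""
    return (direction_deg - FRONT_DEG + 180) % 360 - 180


def is_in_front(direction_deg: float, half_width_deg: float = 45) -> bool:
    """True if the source lies within +/- half_width_deg of the front.

    ASSUMPTION: the 45-degree default is arbitrary; pick what suits your use.
    """
    return abs(relative_bearing(direction_deg)) <= half_width_deg
```

In an event callback, is_in_front(evt.get("direction")) could be used to react only to speakers facing the robot.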

DSP tuning

The array's on-chip DSP (gain control, noise suppression, etc.) can be read and tuned live through the SDK:

params = robot.microphone.get_int_tuning()
print(params.get("AGCGAIN"))

robot.microphone.set_int_tuning(name="AGCONOFF", value=1.0)

Some parameters worth knowing:

  • AGCONOFF / AGCGAIN: automatic gain control on/off, and the fixed gain used when it's off.
  • GAMMA_NS_SR / MIN_NS_SR: noise suppression strength, pre-tuned at the factory to suppress QTrobot's internal fan noise.
  • AECFREEZEONOFF, ECHOONOFF, NLATTENONOFF, NLAEC_MODE, TRANSIENTONOFF, RT60ONOFF: the array's own on-chip echo/transient handling — typically left at their defaults, since QTrobot's software-side AEC (below) is the recommended way to handle echo.

set_int_tuning() changes apply immediately but only for the current session: they're reset the next time the microphone node restarts. To make a value permanent, set it under tunning: in microphone.yaml instead — that's what's applied at every startup.
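When experimenting with several parameters in one session, it can help to record the values you overwrite so you can put them back without restarting the node. The following is a minimal sketch built only on the two calls shown above; apply_tuning is a hypothetical helper, and it assumes get_int_tuning() returns a dict-like object (as the params.get(...) call suggests) and that values are numeric.

```python
def apply_tuning(microphone, values):
    """Apply several tuning values and return the ones they replaced.

    `microphone` is any object exposing get_int_tuning() and
    set_int_tuning(name=..., value=...), e.g. robot.microphone.
    """
    current = microphone.get_int_tuning()
    previous = {}
    for name, value in values.items():
        previous[name] = current.get(name)  # None if the name is unknown
        microphone.set_int_tuning(name=name, value=float(value))
    return previous
```

For example, previous = apply_tuning(robot.microphone, {"AGCONOFF": 0, "AGCGAIN": 4.0}) would switch to a fixed gain, and apply_tuning(robot.microphone, previous) would restore the earlier values.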

tip

A dedicated tuning GUI tool used to exist for this in earlier QTrobot versions; it's no longer needed since the SDK lets you read and tune every parameter live, as shown above.

Acoustic echo cancellation (AEC)

QTrobot's microphone channel 0 (the ASR-facing channel) runs through a software acoustic echo canceller (WebRTC's AEC3), so QTrobot can hear you even while it's speaking, instead of picking up its own voice. It works by feeding the same audio that's playing on the FG speaker lane (most commonly TTS) into the canceller as a reference signal, so it can be subtracted out of what the microphone hears.

aec:
  enabled: true
  playback_latency_ms: 600
  aec_stream_delay_ms: 100
  debug_record: false
  debug_dir: "/tmp"

  • playback_latency_ms: estimated total latency of the audio playback pipeline; used to align the reference signal in time with the echo the mic actually picks up.
  • aec_stream_delay_ms: a residual delay correction on top of that, passed straight to AEC3.
  • debug_record / debug_dir: when enabled, writes the reference, raw-mic and AEC-cleaned audio to WAV files, useful if you ever need to recalibrate the latency values above for a non-default audio setup (e.g. an external speaker with extra processing latency).

In practice, you shouldn't need to touch this section unless you're using an external speaker or notice degraded recognition quality while QTrobot is talking.
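If you do need to recalibrate, one common way to estimate how far the echo lags the reference is to cross-correlate the two debug recordings and take the lag with the strongest match. The sketch below shows that idea in plain Python on lists of samples. It is a calibration aid only, not what AEC3 does internally; the function names are hypothetical, the 16 kHz default is an assumption about the recordings, and how the measured lag should be split between playback_latency_ms and aec_stream_delay_ms is not specified here.

```python
def estimate_delay_samples(reference, mic, max_lag):
    """Return the lag (in samples) at which `mic` best matches `reference`.

    Brute-force cross-correlation: fine for short excerpts, slow for long ones.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(mic) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def samples_to_ms(samples, sample_rate_hz=16000):
    """Convert a sample count to milliseconds (ASSUMPTION: 16 kHz audio)."""
    return 1000.0 * samples / sample_rate_hz
```

Load a short excerpt of the reference and raw-mic WAV files (for instance with Python's wave module), pass the samples in, and treat the result as a starting point to refine by listening to the AEC-cleaned output.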

Using an external microphone

You can add a USB microphone alongside QTrobot's built-in array. Both can run simultaneously: enabling the external mic doesn't replace or disable the internal array, it just adds another stream. There's no way to disable internal capture alone (only microphone.enabled: false turns off the whole node, internal and external together) — if you only want to use the external mic, simply have your application subscribe to its stream and ignore the internal channels.

1. Plug in the USB microphone

Use the USB port on QTRP, at the back of the QTrobot.

2. List capture devices

ssh qtrp
arecord -l

# Example output (truncated):
**** List of CAPTURE Hardware Devices ****
card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
card 3: Mic [Samson Meteorite Mic], device 0: USB Audio [USB Audio]

3. Configure the external mic

Edit microphone.yaml:

external:
  enabled: true
  alsa_device: "plughw:Mic,0" # ALSA device string from `arecord -l`, e.g. "plughw:3,0"
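The two device-string forms in that comment come straight from the arecord -l listing: the card's short ID ("Mic") or its index (3), followed by the device number. As an illustration, the sketch below parses listing text in the format shown in step 2 and builds both strings. list_capture_devices is a hypothetical helper, and the regular expression only covers lines shaped like the example output.

```python
import re

# Matches lines such as:
# card 3: Mic [Samson Meteorite Mic], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(
    r"^card (?P<index>\d+): (?P<id>\S+) \[(?P<name>[^\]]*)\], "
    r"device (?P<device>\d+):"
)


def list_capture_devices(arecord_output: str):
    """Return one dict per capture device with both plughw string forms."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line.strip())
        if match:
            devices.append({
                "name": match["name"],
                "by_index": "plughw:%s,%s" % (match["index"], match["device"]),
                "by_id": "plughw:%s,%s" % (match["id"], match["device"]),
            })
    return devices
```

The ID form is usually the safer choice for alsa_device, since card indices can change when USB devices are plugged in a different order.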

4. Restart the service hub

sudo systemctl restart qtrobot-service-hub.service

Once enabled, the external mic is published as its own stream (robot.microphone.stream.on_ext_audio_ch0(...) in Python, /qtrobot/mic/ext/audio/ch0/stream over ROS2), separate from the internal array's channels.

Speech recognition (ASR)

For continuous speech recognition, robot.asr is a plugin offering a choice of engines, Python SDK only at the moment. Each follows the same shape: configure_<engine>(...) once, then either a blocking recognize_<engine>() call or continuous streams (<engine>_speech, <engine>_event) you subscribe to.

Parakeet, for example, runs against qtrobot-parakeet-asr-server (e.g. on a Jetson Orin), with no API key and no internet dependency:

robot.enable_plugin_local("asr-parakeet")

robot.asr.configure_parakeet(
    endpoint="tcp://10.231.0.1:50860",
    language="en",
    use_vad=True,
    continuous_mode=True,
)

robot.asr.stream.on_parakeet_speech(lambda s: print("speech:", s.value))
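Printing from the callback is fine for a demo, but an application usually wants to handle results in its own loop. One possible pattern is to hand results over through a bounded queue, sketched below. SpeechInbox is a hypothetical helper, not part of the SDK; it assumes a single callback producer and only relies on the frame exposing .value, as in the lambda above.

```python
import queue


class SpeechInbox:
    """Collects recognition results from a stream callback for a consumer loop."""

    def __init__(self, maxsize=32):
        self._queue = queue.Queue(maxsize=maxsize)

    def on_speech(self, frame):
        """Callback: store frame.value without ever blocking the caller."""
        try:
            self._queue.put_nowait(frame.value)
        except queue.Full:
            # Drop the oldest result to make room for the newest one.
            try:
                self._queue.get_nowait()
            except queue.Empty:
                pass
            self._queue.put_nowait(frame.value)

    def next(self, timeout=None):
        """Return the next result, or None if nothing arrives in time."""
        try:
            return self._queue.get(timeout=timeout)
        except queue.Empty:
            return None
```

Usage would look like inbox = SpeechInbox(); robot.asr.stream.on_parakeet_speech(inbox.on_speech), followed by a loop calling inbox.next(timeout=1.0).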

Pick Parakeet or Riva for fully local/offline recognition (no audio leaves your network), or Azure/Groq for a managed cloud engine with minimal setup. See the ASR tutorial and Python API reference for the full parameter list of each engine, streaming vs. one-shot recognition, and cancellation.

Tips for better speech recognition

  • The microphone array sits on top of QTrobot's head, so speak from above the robot where possible; your voice then reaches the array directly.
  • For local engines (Parakeet, Riva), keep use_vad enabled so recognition only runs while voice is actually detected, rather than continuously processing silence.
  • If the microphone picks up QTrobot's own speech while it talks, check the AEC settings above before reaching for engine-level noise suppression tweaks; that's almost always an AEC latency calibration issue, not a microphone problem.