App: Live Transcription and Translation in Nextcloud Talk (live_transcription)

This app provides live transcription and translation of speech in Nextcloud Talk calls using open source AI models provided by Vosk.
Transcription runs on your own server, preserving your privacy and data sovereignty. Translation is handled by a translation task processing provider such as the translate2 app. Support for the OpenAI and LocalAI integration app and the DeepL integration app as translation providers is planned.
A good set of language models for transcription is downloaded automatically. It covers Arabic, Arabic (Tunisian), Breton, Catalan, Czech, German, English, Esperanto, Spanish, Persian (Farsi), French, Hindi, Italian, Japanese, Kazakh, Korean, Dutch, Polish, Portuguese (Brazilian), Russian, Telugu, Tajik, Turkish, Ukrainian, Uzbek, Vietnamese and Chinese.
The translation capabilities depend on the installed translation task processing provider app. A list of translation-capable apps can be found here in the “Backend apps” section.

Installation

  1. Make sure the Nextcloud Talk app is installed.

  2. Make sure the High-Performance Backend (the latest version, or any version released after September 2025) is installed and configured in Nextcloud Talk settings. See the Nextcloud Talk install manual for more information.

  3. Set up a Deploy Daemon in AppAPI Admin settings.

  4. Install the live_transcription app via the “Apps” page in Nextcloud, or by executing

occ app_api:app:register live_transcription \
  --env LT_HPB_URL=wss://cloud.example.com/standalone-signaling/spreed \
  --env LT_INTERNAL_SECRET=1234 \
  --wait-finish

Important

The environment variables LT_HPB_URL and LT_INTERNAL_SECRET must be set in the Deploy Options during installation, and the High-Performance Backend must be configured and working in Nextcloud Talk settings for the app to work.

To change these environment variables after installation, uninstall the app and then install it again with the new values.

  5. Install a Text-to-text task processing provider app for translation capabilities from the “Backend apps” section here.
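
As noted above, LT_HPB_URL and LT_INTERNAL_SECRET can only be changed by uninstalling the app and installing it again. The following is a minimal sketch of that cycle from the command line, not an official procedure. It assumes AppAPI's app_api:app:unregister command as the uninstall step. The OCC variable and the function name reinstall_live_transcription are illustrative and not part of the app, and the URL and secret are the placeholder values from the example above.

```shell
#!/bin/sh
# Sketch: re-register live_transcription with new environment values.
# OCC is a hypothetical helper variable holding your occ invocation,
# e.g. "sudo -u www-data php /path/to/nextcloud/occ" (path is an assumption).
OCC="${OCC:-occ}"

# Placeholder values taken from the installation example; replace with your own.
LT_HPB_URL="${LT_HPB_URL:-wss://cloud.example.com/standalone-signaling/spreed}"
LT_INTERNAL_SECRET="${LT_INTERNAL_SECRET:-1234}"

reinstall_live_transcription() {
  # Step 1: uninstall the app; stop here if that fails.
  # $OCC is left unquoted on purpose so a multi-word command is split into words.
  $OCC app_api:app:unregister live_transcription || return 1

  # Step 2: register the app again with the new environment values.
  $OCC app_api:app:register live_transcription \
    --env "LT_HPB_URL=$LT_HPB_URL" \
    --env "LT_INTERNAL_SECRET=$LT_INTERNAL_SECRET" \
    --wait-finish
}
```

To use it, assign OCC, LT_HPB_URL and LT_INTERNAL_SECRET for your instance and then call reinstall_live_transcription. Setting OCC=echo first prints the two commands instead of running them, which is a convenient way to review them beforehand.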

Requirements

  • Minimal Nextcloud version: 33

  • Nextcloud AIO is supported

  • We currently support NVIDIA GPUs and x86_64 CPUs. CPU-only transcription is also supported and works well on modern x86 CPUs.

  • CUDA >= v12.4.1 on your host system for GPU-based transcription

  • GPU Sizing

    • An NVIDIA GPU with at least 10 GB VRAM

    • 16 GB of system RAM should be enough for one or two concurrent calls

  • CPU Sizing

    • x86 CPU with 4 threads, plus 2 additional threads per concurrent call

    • 16 GB of RAM should be enough for one or two concurrent calls

  • Space usage
    • ~ 2.8 GB for the Docker container

    • ~ 6.0 GB for the default models

Note

We currently have very little real-world experience running this software on production instances. The above sizing recommendations are estimates, not real-world benchmarks. Actual requirements will vary with factors such as the number of concurrent calls, audio quality, and selected languages. Please test thoroughly to confirm that your hardware meets your needs.
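
The CPU sizing and space figures above can be turned into a quick back-of-the-envelope check. The sketch below assumes the CPU rule means 4 base threads plus 2 per concurrent call, and it treats the approximate sizes as 2800 MB and 6000 MB. The function names are illustrative and not part of the app, and the results inherit the same caveat as the estimates they are derived from.

```shell
#!/bin/sh
# Sketch: rough sizing helpers based on the estimates in this section.

# Threads needed for a number of concurrent calls:
# 4 base threads plus 2 per call (assumed reading of the CPU sizing rule).
required_threads() {
  calls="$1"
  echo $((4 + 2 * calls))
}

# Approximate disk space in MB: ~2.8 GB container plus ~6.0 GB default models.
required_disk_mb() {
  echo $((2800 + 6000))
}

# Compare available threads against the estimate.
# Usage: check_threads <concurrent_calls> <available_threads>
check_threads() {
  calls="$1"
  available="$2"
  needed="$(required_threads "$calls")"
  if [ "$available" -ge "$needed" ]; then
    echo "ok: $available threads available, $needed needed for $calls call(s)"
  else
    echo "insufficient: $available threads available, $needed needed for $calls call(s)"
    return 1
  fi
}
```

On a Linux host with GNU coreutils, `check_threads 2 "$(nproc)"` compares the machine's thread count against the estimate for two concurrent calls.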

App store

You can also find the app in our app store, where you can write a review: https://apps.nextcloud.com/apps/live_transcription

Repository

You can find the app’s code repository on GitHub where you can report bugs and contribute fixes and features: https://github.com/nextcloud/live_transcription

Nextcloud customers should file bugs directly with our Customer Support.

Limitations

  • The generated transcripts may contain errors. Accuracy also depends on the audio quality and the speaker’s accent.

  • The app currently supports only a limited number of languages. More languages may be added in the future.

  • Languages other than English may have lower accuracy, mainly because the shipped models for them are smaller.

  • The app currently does not support punctuation in the transcription.

  • OpenAI and LocalAI integration and DeepL integration apps are not yet supported for translation.