App: Local Text-To-Speech (text2speech_kokoro)

The text2speech_kokoro app is one of the apps that provide Text-To-Speech functionality in Nextcloud and act as a speech generation backend for the Nextcloud Assistant app and other apps making use of the core Text-To-Speech Task type. The text2speech_kokoro app specifically runs only open source models and does so entirely on-premises. Nextcloud can provide customer support upon request; please talk to your account manager about the possibilities.

This app uses Kokoro under the hood.

The model supports the following languages:

  • American English

  • British English

  • Spanish

  • French

  • Italian

  • Hindi

  • Portuguese

  • Japanese

  • Mandarin

Requirements

  • Minimal Nextcloud version: 31

  • This app is built as an External App and thus depends on AppAPI v2.3.0

  • Nextcloud AIO is supported

  • We currently support x86_64 CPUs

  • We do not support GPUs

  • CPU Sizing

    • The more cores you have and the more powerful the CPU, the better; we recommend around 10 cores

    • The app will hog all cores by default, so it is usually better to run it on a separate machine

    • 800MB RAM
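The requirements above can be turned into a quick host check before deploying. The following is a minimal sketch, not part of the app: the function names `check_host` and `total_ram_bytes` are hypothetical, the 800MB figure is interpreted as 800 MiB, and the RAM lookup relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os
import platform

# Thresholds taken from the requirements list; reading "800MB" as MiB is an assumption.
MIN_RAM_BYTES = 800 * 1024 * 1024
RECOMMENDED_CORES = 10
SUPPORTED_ARCHS = {"x86_64", "amd64"}  # "AMD64" is how some platforms report x86_64


def check_host(machine, cores, ram_bytes):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if machine.lower() not in SUPPORTED_ARCHS:
        problems.append(f"unsupported CPU architecture: {machine}")
    if cores is None or cores < RECOMMENDED_CORES:
        problems.append(f"{cores} cores is below the recommended {RECOMMENDED_CORES}")
    # ram_bytes may be None when the amount of RAM could not be determined.
    if ram_bytes is not None and ram_bytes < MIN_RAM_BYTES:
        problems.append(f"{ram_bytes} bytes of RAM is below the required 800MB")
    return problems


def total_ram_bytes():
    """Best-effort physical RAM lookup; returns None where sysconf is unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    for problem in check_host(platform.machine(), os.cpu_count(), total_ram_bytes()):
        print("WARNING:", problem)
```

The core count is only a recommendation, so a warning about it does not mean the app cannot run. The script checks total RAM, not free RAM, and does not account for other services on the same machine.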

Installation

  1. Make sure the Nextcloud Assistant app is installed

  2. Install AppAPI and set up a Deploy Daemon

  3. Install the text2speech_kokoro “Local Text-To-Speech” ExApp via the “Apps” page in the Nextcloud web admin user interface

Scaling

It is currently not possible to scale this app; we are working on this. Based on our calculations, a single instance has a rough capacity of 4h of speech generation throughput per minute (measured with 8 CPU threads on an Intel(R) Xeon(R) Gold 6226R). It is unclear how close this number is to real-world usage, so we appreciate real-world feedback on this.
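To get a feel for what the capacity figure above means, it can be converted into a real-time factor. The following is a back-of-the-envelope sketch with hypothetical helper names; it assumes throughput scales linearly with audio length, which the estimate itself does not guarantee.

```python
def realtime_factor(capacity_hours_per_minute):
    """Convert 'hours of audio per minute of wall-clock time' into a speed-up factor."""
    # 1 hour of audio is 60 minutes, produced in 1 minute of wall-clock time.
    return capacity_hours_per_minute * 60.0


def processing_seconds(audio_seconds, capacity_hours_per_minute=4.0):
    """Estimate wall-clock seconds needed for a given amount of audio.

    The default of 4.0 is the rough estimate quoted in this section;
    linear scaling is an assumption made for illustration.
    """
    return audio_seconds / realtime_factor(capacity_hours_per_minute)


if __name__ == "__main__":
    print(realtime_factor(4.0))        # 240.0, i.e. 240 times faster than real time
    print(processing_seconds(3600.0))  # 15.0 seconds for one hour of audio
```

Under these assumptions, one hour of audio takes about 15 seconds on the reference hardware. Real workloads will likely differ, which is why the figure is described as a rough estimate.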

App store

You can also find this app in our app store, where you can write a review: https://apps.nextcloud.com/apps/text2speech_kokoro

Repository

You can find the app’s code repository on GitHub where you can report bugs and contribute fixes and features: https://github.com/nextcloud/text2speech_kokoro

Nextcloud customers should file bugs directly with our customer support.

Known Limitations

  • We currently only support languages supported by the underlying Kokoro model

  • The Kokoro models perform unevenly across languages and may produce lower-quality output for low-resource languages, where less training data was available.

  • Make sure to test whether the model meets your use case's quality requirements for the languages you intend to use it with

  • Customer support is available upon request. However, we can't solve false or problematic output, most performance issues, or other problems caused by the underlying model. Support is thus limited to bugs directly caused by the implementation of the app (connectors, API, front-end, AppAPI)