studio-voice

NVIDIA Studio Voice NIM Client

This package has a sample client which demonstrates interaction with a Studio Voice NIM.

Getting Started

NVIDIA Maxine NIM Client packages use gRPC APIs. Instructions below demonstrate usage of Studio Voice NIM using Python gRPC client. Additionally, access the Try API feature to experience the NVIDIA Studio Voice NIM API without hosting your own servers, as it leverages the NVIDIA Cloud Functions backend.

Pre-requisites

Ensure you have Python 3.10 or above installed on your system. Please refer to the Python documentation for download and installation instructions.
Access to NVIDIA Studio Voice NIM Container / Service.

Usage guide

1. Clone the repository

git clone https://github.com/nvidia-maxine/nim-clients.git

// Go to the 'studio-voice' folder
cd nim-clients/studio-voice

2. Install Dependencies

sudo apt-get install python3-pip
pip install -r requirements.txt

3. Host the NIM Server

Before running client part of Studio Voice, please set up a server. The simplest way to do that is to follow the quick start guide. This step can be skipped when using Try API.

4. Compile the Protos

Before running the python client, you can choose to compile the protos. The grpcio version needed for compilation can be referred at requirements.txt

To compile protos on Linux, run:

// Go to studio-voice/protos/linux folder
cd studio-voice/protos/linux

chmod +x compile_protos.sh
./compile_protos.sh

To compile protos on Windows, run:

// Go to studio-voice/protos/windows folder
cd studio-voice/protos\windows

compile_protos.bat

5. Run the Python Client

Go to the scripts directory.

cd scripts

Usage for Streaming NIM Request (Recommended)

Streaming mode provides lower latency than transactional mode because it processes audio chunk-by-chunk without file I/O overhead. To run the client in streaming mode, add --streaming. The following example command processes the packaged sample audio file in streaming mode and generates a studio_voice_48k_output.wav file in the current folder.

python studio_voice.py --target 127.0.0.1:8001 --input ../assets/studio_voice_48k_input.wav --output studio_voice_48k_output.wav --streaming --model-type 48k-ll

Note: When using --streaming mode, ensure the selected --model-type (48k-hq, 48k-ll, or 16k-hq) aligns with the NIM_MODEL_PROFILE Model Type configuration to maintain compatibility.

Usage for Transactional NIM Request

To run client in transactional mode. Set --model-type in accordance with the server, default is set to 48k-hq. The following example command processes the packaged sample audio file in transactional mode and generates a studio_voice_48k_output.wav file in the current folder.

python studio_voice.py --target 127.0.0.1:8001 --input ../assets/studio_voice_48k_input.wav --output studio_voice_48k_output.wav --model-type 48k-hq

Note: To use the client in Streaming mode, launch the NIM with STREAMING=true. Similarly, to use the client in Transactional mode, launch the NIM with STREAMING=false. The client mode must match the server mode.

Only WAV files are supported.

Usage for Preview API Request

python studio_voice.py --preview-mode \
    --ssl-mode TLS \
    --target grpc.nvcf.nvidia.com:443 \
    --function-id <function_id> \
    --api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
    --input <input_file_path> \
    --output <output_file_path> \

Command Line Arguments

--preview-mode - Flag to send request to preview NVCF server on https://build.nvidia.com/nvidia/studiovoice/api.
--ssl-mode - Flag to control if SSL MTLS/TLS encryption should be used. When running preview SSL must be set to TLS. Default value is None.
--ssl-key - The path to ssl private key. Default value is None.
--ssl-cert - The path to ssl certificate chain. Default value is None.
--ssl-root-cert - The path to ssl root certificate. Default value is None.
--target - IP:port of gRPC service, when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF.
--api-key - NGC API key required for authentication, utilized when using TRY API ignored otherwise.
--function-id - NVCF function ID for the service, utilized when using TRY API ignored otherwise.
--input - The path to the input audio file. Default value is ../assets/studio_voice_48k_input.wav.
--output - The path for the output audio file. Default is current directory (scripts) with name studio_voice_48k_output.wav.
--streaming - Flag to control if streaming mode should be used. Transactional mode will be used by default.
--model-type - Studio Voice model type hosted on server. It can be set to 48k-hq/48k-ll/16k-hq. Default value is 48k-hq.

Note: For a step-by-step walkthrough of all the configuration options described above, see the provided Jupyter notebook at notebook/studio_voice_notebook.ipynb.

Refer the docs for more information.

Name		Name	Last commit message	Last commit date
parent directory ..
assets		assets
interfaces		interfaces
notebook		notebook
protos		protos
scripts		scripts
LICENSE.md		LICENSE.md
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

NVIDIA Studio Voice NIM Client

Getting Started

Pre-requisites

Usage guide

1. Clone the repository

2. Install Dependencies

3. Host the NIM Server

4. Compile the Protos

5. Run the Python Client

Usage for Streaming NIM Request (Recommended)

Usage for Transactional NIM Request

Usage for Preview API Request

Command Line Arguments

FilesExpand file tree

studio-voice

Directory actions

More options

Directory actions

More options

Latest commit

History

studio-voice

Folders and files

parent directory

README.md

NVIDIA Studio Voice NIM Client

Getting Started

Pre-requisites

Usage guide

1. Clone the repository

2. Install Dependencies

3. Host the NIM Server

4. Compile the Protos

5. Run the Python Client

Usage for Streaming NIM Request (Recommended)

Usage for Transactional NIM Request

Usage for Preview API Request

Command Line Arguments