This package has a sample client which demonstrates interaction with a Studio Voice NIM.
NVIDIA Maxine NIM Client packages use gRPC APIs. Instructions below demonstrate usage of Studio Voice NIM using Python gRPC client. Additionally, access the Try API feature to experience the NVIDIA Studio Voice NIM API without hosting your own servers, as it leverages the NVIDIA Cloud Functions backend.
- Ensure you have Python 3.10 or above installed on your system. Please refer to the Python documentation for download and installation instructions.
- Access to NVIDIA Studio Voice NIM Container / Service.
git clone https://github.com/nvidia-maxine/nim-clients.git
// Go to the 'studio-voice' folder
cd nim-clients/studio-voicesudo apt-get install python3-pip
pip install -r requirements.txtBefore running client part of Studio Voice, please set up a server. The simplest way to do that is to follow the quick start guide. This step can be skipped when using Try API.
Before running the python client, you can choose to compile the protos. The grpcio version needed for compilation can be referred at requirements.txt
To compile protos on Linux, run:
// Go to studio-voice/protos/linux folder
cd studio-voice/protos/linux
chmod +x compile_protos.sh
./compile_protos.shTo compile protos on Windows, run:
// Go to studio-voice/protos/windows folder
cd studio-voice/protos\windows
compile_protos.batGo to the scripts directory.
cd scriptsStreaming mode provides lower latency than transactional mode because it processes audio chunk-by-chunk without file I/O overhead. To run the client in streaming mode, add --streaming. The following example command processes the packaged sample audio file in streaming mode and generates a studio_voice_48k_output.wav file in the current folder.
python studio_voice.py --target 127.0.0.1:8001 --input ../assets/studio_voice_48k_input.wav --output studio_voice_48k_output.wav --streaming --model-type 48k-llNote: When using
--streamingmode, ensure the selected--model-type(48k-hq,48k-ll, or16k-hq) aligns with theNIM_MODEL_PROFILEModel Type configuration to maintain compatibility.
To run client in transactional mode. Set --model-type in accordance with the server, default is set to 48k-hq. The following example command processes the packaged sample audio file in transactional mode and generates a studio_voice_48k_output.wav file in the current folder.
python studio_voice.py --target 127.0.0.1:8001 --input ../assets/studio_voice_48k_input.wav --output studio_voice_48k_output.wav --model-type 48k-hqNote: To use the client in Streaming mode, launch the NIM with
STREAMING=true. Similarly, to use the client in Transactional mode, launch the NIM withSTREAMING=false. The client mode must match the server mode.
Only WAV files are supported.
python studio_voice.py --preview-mode \
--ssl-mode TLS \
--target grpc.nvcf.nvidia.com:443 \
--function-id <function_id> \
--api-key $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC \
--input <input_file_path> \
--output <output_file_path> \--preview-mode- Flag to send request to preview NVCF server on https://build.nvidia.com/nvidia/studiovoice/api.--ssl-mode- Flag to control if SSL MTLS/TLS encryption should be used. When running preview SSL must be set to TLS. Default value isNone.--ssl-key- The path to ssl private key. Default value isNone.--ssl-cert- The path to ssl certificate chain. Default value isNone.--ssl-root-cert- The path to ssl root certificate. Default value isNone.--target- IP:port of gRPC service, when hosted locally. Use grpc.nvcf.nvidia.com:443 when hosted on NVCF.--api-key- NGC API key required for authentication, utilized when usingTRY APIignored otherwise.--function-id- NVCF function ID for the service, utilized when usingTRY APIignored otherwise.--input- The path to the input audio file. Default value is../assets/studio_voice_48k_input.wav.--output- The path for the output audio file. Default is current directory (scripts) with namestudio_voice_48k_output.wav.--streaming- Flag to control if streaming mode should be used. Transactional mode will be used by default.--model-type- Studio Voice model type hosted on server. It can be set to48k-hq/48k-ll/16k-hq. Default value is48k-hq.
Note: For a step-by-step walkthrough of all the configuration options described above, see the provided Jupyter notebook at
notebook/studio_voice_notebook.ipynb.
Refer the docs for more information.