Turning Conversations into Usable Data with Live Transcription

A talk with Adna Halugic Colo, Initiative Leader and Product Owner

Beyond the new features themselves, the message is simple: we’re not building AI to follow a trend. We’re building tools that you, as resellers and service providers, can confidently present to customers and build offerings around. While AI Voice Agents help guide conversations in real time, many businesses are sitting on a massive amount of conversational data and doing nothing with it. So now, we want to share something that sits quietly in the background but has real practical value: Live Transcription Service.

How Live Transcription Works

When a call starts and recording is enabled, the system automatically initiates a transcription session. The Live Transcription Service continuously reads the stereo audio recording and streams it in small chunks to the selected speech-to-text provider.

We currently support providers such as OpenAI, Deepgram, Google, and Amazon Web Services (AWS). The transcription happens in real time, and the resulting text is published to a WebSocket callback defined by you.

From there, the stream goes wherever you want it to go: a CRM, helpdesk, analytics platform, internal dashboard, or a completely custom-built application.
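On the receiving end, your side of the integration is a WebSocket endpoint that consumes the published text. The sketch below shows the general shape in Python; the message fields (`text`, `channel`) are hypothetical, since the actual payload format depends on your PBXware version and the chosen provider.

```python
import json

def handle_transcript_message(raw):
    """Parse one transcription event and format it for a downstream system.

    The payload shape here is an assumption for illustration.
    """
    event = json.loads(raw)
    text = event.get("text", "")
    speaker = event.get("channel", "unknown")  # e.g. caller vs. agent channel
    return f"[{speaker}] {text}"

async def consume(websocket):
    # `websocket` is a connection from PBXware's callback, accepted by
    # whatever WebSocket server you run (for example, the third-party
    # `websockets` package: websockets.serve(consume, "0.0.0.0", 8765)).
    async for raw in websocket:
        line = handle_transcript_message(raw)
        print(line)  # replace with a push to your CRM, helpdesk, or dashboard
```

The handler is deliberately a plain function so the routing logic can be tested without a live call.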

Practical Positioning for Resellers

Let’s look at this from your perspective.

Because transcription can be enabled at the ERG or Queue level, you can activate it exactly where it makes sense (sales teams, support queues, VIP clients, compliance-sensitive departments). It doesn’t have to be system-wide.

Call recording must be enabled in stereo mode, and the selected AI provider's credentials need real-time speech-to-text access.

Once that’s configured, each Live Transcription setup simply defines the AI provider and the WebSocket callback destination. Configuration fields adjust depending on the provider’s requirements.
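In practice, a Live Transcription setup boils down to a small set of values like the sketch below. Every field name here is illustrative; the real fields live in the PBXware GUI and vary by provider.

```python
# Illustrative configuration shape only -- not the actual PBXware schema.
live_transcription_config = {
    "provider": "deepgram",          # one of: openai, deepgram, google, aws
    "api_key": "YOUR_API_KEY",       # credential with real-time STT access
    "callback_url": "wss://example.com/transcripts",  # your WebSocket consumer
    "scope": {"type": "queue", "id": 42},  # enable per ERG or Queue (hypothetical)
}
```

The point of the sketch is the division of responsibility: PBXware owns the provider side, while the `callback_url` endpoint is the part you design and host.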

The opportunity isn’t in the transcription itself; it’s in what you choose to connect it to.

And importantly, you are responsible for defining the WebSocket destination or building the application that consumes the stream. That means you can step in as the solution provider and design that layer.

Why Server-Side First

This first phase is implemented entirely on the server side. That was a deliberate decision.

By handling transcription centrally, we simplify deployment and give resellers a stable foundation to build on. 

It also lays the groundwork for future application-level implementations, but more importantly, it lets customers consume live data in the systems they rely on every day.

For many of your customers, that’s the value. They want usable information while the call is still happening.

To Conclude

In Part 1, we focused on AI interacting with callers through a Voice Agent. In Part 2, we’re focusing on capturing conversations as structured text in real time.

Together, these features form a strong foundation for resellers and service providers who want to build more advanced communication workflows around PBXware.

And we’re just getting started. This is the second step in how we’re approaching AI inside PBXware: practical, deployable, and built with partners in mind.

If you’d like to learn more about how Live Transcription works in PBXware, or discuss how it could fit into your existing deployments, feel free to reach out. We’ll be happy to walk you through the setup and answer any questions you might have.