Vibe Coding the Brazen Head

Since I’m used to coding the traditional way, I haven’t used LLMs for much more than code snippets. But after reading all the hype about how coding is over and LLMs can vibe code anything for you, I wanted to try a vibe-coded project. An abandoned prototype I made at Praxis Labs was an animated interactive talking head – you spoke and the head replied, with lip-sync and facial expressions to make it feel like a real person. Ultimately it was abandoned due to having longer latency than we wanted, but I always wanted to explore the idea further.

When I was a little kid I watched the BBC children’s TV series “The Box of Delights” and one thing that stayed with me was the brazen head, which was a talking disembodied magical bronze head. This is a recurring theme in mythology, and makes me think of the way that modern chatbots like ChatGPT often act as oracular heads created by science, not magic, even down to the slightly sinister air that if you converse with them too long you can go down some dark paths.

The tech I used to get this working was primarily GitHub Copilot, using Claude Opus 4.6. I went in stages rather than asking for a complete, vibe-coded website all at once. I started out by asking it to display and animate a 3D model of a head (which I supplied) using TypeScript, React and Three.js. This is the tech stack I used for the original prototype at Praxis Labs, so I wanted to see how it compared to the hand-written code (which sadly I no longer have access to). Ultimately it ended up pretty similar, maybe a bit more terse, which was impressive!

I used Gemini to generate the background of an alchemist’s workshop but was unable to get an LLM-generated head with blend shapes to form the lip shapes to allow lip sync. Meshy.AI can’t do blend shapes yet. Unreal Metahuman wouldn’t export blend shapes for game-optimized avatars. So I downloaded a plain blue avatar from Sketchfab, cut off the body in Blender, and changed the materials to shiny bronze.

Initially everything went really fast. Claude Opus 4.6 generated the boilerplate needed to display and animate the head with very little help needed from me, except to place the head in the right camera position. There were some issues with getting the blend shapes to work, but that wasn’t the fault of the LLM – I had to replace the original model I used (which had some badly set-up blend shapes) with the model described above.
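For anyone who hasn’t worked with blend shapes: each one is just a named weight between 0 and 1, and animating the face means easing those weights toward new targets every frame. Here is a minimal sketch of that idea. It is not the code from the project; `stepBlendWeights`, `writeInfluences` and the shape names are made up for illustration. The only Three.js convention it assumes is that a mesh exposes a name-to-index `morphTargetDictionary` and a `morphTargetInfluences` array.

```typescript
// Hypothetical helper names; a sketch, not the project's code.
type BlendWeights = Record<string, number>;

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Move every weight a fraction (alpha) of the way toward its target.
// Shapes missing from either side are treated as 0, so unused shapes relax.
function stepBlendWeights(
  current: BlendWeights,
  target: BlendWeights,
  alpha: number
): BlendWeights {
  const a = clamp01(alpha);
  const next: BlendWeights = {};
  const names = Object.keys(current).concat(Object.keys(target));
  for (const name of names) {
    const from = current[name] ?? 0;
    const to = target[name] ?? 0;
    next[name] = clamp01(from + (to - from) * a);
  }
  return next;
}

// Copy named weights into an influences array, using a name -> index
// dictionary shaped like Three.js's mesh.morphTargetDictionary.
function writeInfluences(
  weights: BlendWeights,
  dictionary: Record<string, number>,
  influences: number[]
): void {
  for (const name of Object.keys(weights)) {
    const index = dictionary[name];
    if (index !== undefined) {
      influences[index] = weights[name] ?? 0;
    }
  }
}
```

Calling `stepBlendWeights` once per frame with a small `alpha` gives a soft ease instead of the mouth snapping between shapes.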

Next I used the plan feature to figure out what services I needed to connect the head up to in order to get speech-to-text, a chatbot, text-to-speech, and visemes. For speech-to-text I used the Whisper model hosted on Hugging Face, and Qwen2.5-7B-Instruct, also hosted on Hugging Face, for the chat “personality”. I chose these primarily because they are free, but they worked just fine for this simple use.
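The overall shape of one conversation turn is simple: a recording goes to speech-to-text, the transcript goes to the chat model, and the reply goes to text-to-speech. The sketch below shows that flow with the three services passed in as plain functions. Everything here (`HeadServices`, `runTurn`, the field names) is a hypothetical shape chosen for illustration, not any provider’s real API and not the project’s actual code.

```typescript
// Hypothetical interfaces; real providers will have different shapes.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface SpeechResult {
  audio: Uint8Array;
  visemes: { name: string; timeMs: number }[];
}

interface HeadServices {
  transcribe(recording: Uint8Array): Promise<string>;
  chat(history: ChatMessage[]): Promise<string>;
  synthesize(text: string): Promise<SpeechResult>;
}

// One turn: recording -> transcript -> reply text -> audio plus visemes.
// Returns the extended history so the next turn keeps its context.
async function runTurn(
  services: HeadServices,
  history: ChatMessage[],
  recording: Uint8Array
): Promise<{ history: ChatMessage[]; speech: SpeechResult }> {
  const userText = await services.transcribe(recording);
  const userMessage: ChatMessage = { role: "user", content: userText };
  const withUser = [...history, userMessage];

  const replyText = await services.chat(withUser);
  const reply: ChatMessage = { role: "assistant", content: replyText };

  const speech = await services.synthesize(replyText);
  return { history: [...withUser, reply], speech };
}
```

Keeping the services behind an interface like this makes it easy to swap one provider for another, or to plug in fakes while testing the head’s animation.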

Text-to-speech was what I spent longest on. There is a dizzying array of start-ups providing this right now, but I really wanted it to provide visemes for lip sync. For the Praxis Labs prototypes I tried Amazon Polly and Azure speech services: both worked well but weren’t exactly what was needed for the business case, so we ended up going with non-animated images only. For the Brazen Head I could do what I liked, so I selected Inworld because it was cheap and simple. Just an API key needed. I really wanted a free service but I could not locate any truly free services for this. I considered just getting text-to-speech without the visemes, which opened up more services, and Claude managed to create a pretty good code-based viseme module, but I couldn’t find any truly free text-to-speech services that had nice, expressive voices. So if the credits run out, the demo linked on this page won’t work, sorry. Drop me a line on the contact page and I might add a few more bucks! Or recommend me a free TTS service with custom or fantasy voices.
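To give a flavour of what a code-based viseme module can look like, here is a deliberately crude sketch: it maps letters to a handful of mouth shapes and spreads them evenly across the length of the audio clip. This is an assumption about the general approach, not the module Claude wrote. The viseme labels and the `estimateVisemes` name are invented, and a serious version would work from phonemes and real timings rather than spelling.

```typescript
// Invented viseme labels; real TTS services use their own sets.
type Viseme = "sil" | "aa" | "ee" | "oh" | "pp" | "ff" | "other";

interface VisemeCue {
  viseme: Viseme;
  startMs: number;
  endMs: number;
}

// Very rough letter-to-mouth-shape guess (spelling, not phonemes).
function letterToViseme(ch: string): Viseme {
  if (ch === " ") return "sil";
  if (ch === "a") return "aa";
  if ("eiy".indexOf(ch) >= 0) return "ee";
  if ("ouw".indexOf(ch) >= 0) return "oh";
  if ("bmp".indexOf(ch) >= 0) return "pp";
  if ("fv".indexOf(ch) >= 0) return "ff";
  return "other";
}

// Give every letter an equal slice of the clip and merge neighbouring
// slices that share a mouth shape.
function estimateVisemes(text: string, durationMs: number): VisemeCue[] {
  const chars = text.toLowerCase().replace(/[^a-z ]/g, "").split("");
  if (chars.length === 0 || durationMs <= 0) return [];
  const slotMs = durationMs / chars.length;
  const cues: VisemeCue[] = [];
  chars.forEach((ch, i) => {
    const viseme = letterToViseme(ch);
    const last = cues[cues.length - 1];
    if (last !== undefined && last.viseme === viseme) {
      last.endMs = (i + 1) * slotMs;
    } else {
      cues.push({ viseme, startMs: i * slotMs, endMs: (i + 1) * slotMs });
    }
  });
  return cues;
}
```

During playback you would look up the cue covering the current audio time and use its viseme as the target mouth shape.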

With that working, I needed a way of getting the demo app up on my WordPress portfolio site, and Claude was ready to help plan that too. It was pretty simple: host on a free Vercel site, embed in an iframe. Claude provided all the boilerplate code needed and I got it up and running pretty fast. One problem was that I pasted the API keys wrongly into Vercel, which led to errors on the deployed app: Claude was plain useless for that, hallucinating all sorts of bizarre possible fixes, until it finally suggested I check the environment variables. The “sycophancy” of LLMs can prevent them from considering “human error, you dumbass.”
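In hindsight, a tiny start-up check would have pointed at the real culprit straight away. The sketch below is one way to do it, assuming the usual kinds of paste mistakes (a missing value, stray whitespace, or quotes copied along with the key). `findEnvProblems` and the variable names in the usage comment are hypothetical, not the names the project uses.

```typescript
// Hypothetical helper: report required variables that look mis-pasted.
function findEnvProblems(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  const problems: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(name + " is missing or empty");
    } else if (value !== value.trim() || /^["'].*["']$/.test(value)) {
      problems.push(name + " has stray whitespace or quotes");
    }
  }
  return problems;
}

// Example use on the server side (variable names are made up):
//   const problems = findEnvProblems(process.env, ["INWORLD_API_KEY", "HF_TOKEN"]);
//   if (problems.length > 0) console.error(problems.join("\n"));
```

Logging the result at boot means a bad deployment complains about its own configuration instead of failing later with a confusing API error.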

Future improvement ideas: the prototype I did for Praxis Labs didn’t require pressing a button to start or stop speaking. You and the AI took turns, with the AI replying after you stopped speaking. It also streamed both speech-to-text and text-to-speech, which gave much lower latency as it didn’t have to wait for an upload or download to finish before it began playing. I suspect that these features could be vibe coded quite easily and you can fork my project if you want to try it.
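One simple way to get button-free turn taking is to watch the microphone level and treat a long enough stretch of quiet after speech as the end of the turn. The sketch below shows that idea on raw audio frames. It is an assumption about how this could be done, not how the Praxis Labs prototype worked; the `TurnDetector` name and the threshold values are placeholders that would need tuning.

```typescript
// Root-mean-square loudness of one frame of samples in the range [-1, 1].
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const sample of frame) {
    sum += sample * sample;
  }
  return Math.sqrt(sum / frame.length);
}

// Hypothetical detector: signals end of turn after enough quiet follows speech.
class TurnDetector {
  private threshold: number;
  private silenceMs: number;
  private quietMs = 0;
  private heardSpeech = false;

  constructor(threshold: number = 0.02, silenceMs: number = 800) {
    this.threshold = threshold;
    this.silenceMs = silenceMs;
  }

  // Feed one frame; returns true once the speaker seems to have finished.
  push(frame: number[], frameMs: number): boolean {
    if (rms(frame) >= this.threshold) {
      this.heardSpeech = true;
      this.quietMs = 0;
      return false;
    }
    if (!this.heardSpeech) return false; // still waiting for speech to start
    this.quietMs += frameMs;
    if (this.quietMs >= this.silenceMs) {
      this.heardSpeech = false; // reset for the next turn
      this.quietMs = 0;
      return true;
    }
    return false;
  }
}
```

A plain loudness threshold is easily fooled by background noise, so a proper voice activity detector would be more robust, but this is enough to experiment with the interaction.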

Ultimately this worked great. It took me about 3 days to get perfect. It did need a lot of guidance and setup, so it’s not quite the fully agentic workflow that the tech-bros boast about, but I don’t feel “lucky” enough to give all my codes and permissions to an agent just yet. Plus, I didn’t want to spend too much on tokens. I ended up using all my free tokens for the month on Copilot and spending another dollar.

Try it! and check out the GitHub.
