I immediately ‘got’ Relevance AI when I was introduced to it. The flexibility to build LLM-powered tools for automation appealed to the marketing automation nerd in me. The dynamic reasoning capabilities meant that I could now automate things that had always been in the too-hard basket, partly because it was low-code and partly because of how easy the iteration cycle could be.
In my previous workplace we’d been working on an LLM-powered email generator proof of concept. We had A LOT of required customer communications, in multiple languages, due to the nature of the product and customer base. This work was necessary, but it was stopping the team from doing as much strategic work as we would have liked. We had thousands of previous emails we could use to train the LLMs, and we already knew how good the translation was because we’d used LLMs to translate the website into a couple of languages. We’d just started when I left, but I could immediately see that the iteration cycle to produce a consistent output was going to be a lot slower than I initially thought.
Building an AI Team
So yeah I was sold!
When I joined, we were about to release the Multi-Agent System functionality, so you could build an interconnected team of agents that worked together. Immediately I was thinking about building a content team based on the structure of a very successful team that I’d been lucky to be a part of.
The following is the rough structure of the roles in this very real team.
Team Manager, who worked on things like topic research as well as, of course, keeping the team happy and accountable.
New Content Development - who did detailed research to complete a brief, wrote the content based on the brief, and created supporting assets (in this case templates).
Existing Content Optimization - analyzing data to determine if a post could be improved, and researching whether there’s anything new to add to the topic.
Editor - helping the team to produce compelling content and publishing the content.
Translation - doing research to determine what should be translated, getting it translated.
I created the New Content Development agent first: an agent that would create a very detailed content brief based on a given topic. Using Google, LinkedIn, and other external sources, it would do desk research and determine an angle, the intent of the audience, and the structure of the content. It would ask me to approve the brief and, if I approved, it would write the article. The dream was to have this running on autopilot.
The output was okay but often not consistent. It usually required a bit of prodding by me to get both the brief and the article into a decent shape. It was fine, created some decent content, but ultimately a little underwhelming.
It gave me pause and made me question whether replicating a human team was actually going to work. Or would I just end up spending my days editing prompts and responding to my agents?
Agent Building: Thinking in Tasks
I’ve been lucky enough to learn from all the amazing people around me every day. Through osmosis, it’s become very clear how I should have thought about this to get a better quality output: think in tasks, not roles.
AI agents are very good at doing one thing well. Give them a clear goal, clear instructions, and a limited set of specific abilities (or AI tools, as we call them), and they will complete repetitive tasks to a very high standard.
So instead of having a New Content Development Agent, I should have broken the role into discrete tasks and created a team of agents that worked together.
New Content Development Manager - receives and parses the request, passes it on to the relevant agent, keeps the quality bar high, and publishes the content.
Researcher - given a standard process to follow, examples of what good looks like, and some tools to access research resources (e.g. Google, LinkedIn, website scraping).
Content Writer - trained in good writing and given some examples of what good looks like for us. Equipped with a writer tool so that I could test out the different LLMs. Claude, for example, has a reputation for being better at writing than ChatGPT, so I could have tested this out.
Template Creator - Trained in what makes a good template and what format it should be in.
This structure would give each agent a specific goal and therefore a more reliable output. Debugging and optimizing the process would also have been easier, as I could have focused on the component tasks. As I’ve learnt, agents also have limited context memory, so a broad remit will often result in poor and inconsistent choices by the AI, ultimately degrading the quality and making it hard to scale. A modular approach is much, much better.
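To make the task-based structure a bit more concrete, here is a minimal Python sketch of the team described above. It is only an illustration under my own assumptions: the `Agent` class, the tool names, and the `manager` function are hypothetical and are not Relevance AI's API, and the three task functions are placeholders where real research, LLM, and template steps would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass
class Agent:
    """One agent = one task, one goal, a small set of tools (hypothetical structure)."""
    name: str
    goal: str
    tools: List[str]
    run: Callable[[State], State]


def research(state: State) -> State:
    # Placeholder for a real research step (search, scraping, and so on).
    return {**state, "brief": f"Brief for '{state['topic']}'"}


def write(state: State) -> State:
    # Placeholder for an LLM call; swapping the model would happen here.
    return {**state, "article": f"Article based on: {state['brief']}"}


def make_template(state: State) -> State:
    # Placeholder for generating a supporting asset.
    return {**state, "template": f"Template supporting: {state['topic']}"}


# Tool names below are illustrative labels only.
TEAM = [
    Agent("Researcher", "Produce a content brief",
          ["google_search", "website_scraper"], research),
    Agent("Content Writer", "Write the article from the brief",
          ["writer"], write),
    Agent("Template Creator", "Create a supporting template",
          ["template_builder"], make_template),
]


def manager(topic: str, team: List[Agent] = TEAM) -> State:
    """Manager: parse the request, hand it to each agent in turn, check the result."""
    state: State = {"topic": topic.strip()}
    for agent in team:
        before = set(state)
        state = agent.run(state)
        # Simple quality gate: every agent must contribute something new.
        if not set(state) - before:
            raise ValueError(f"{agent.name} produced no new output")
    return state


if __name__ == "__main__":
    result = manager("multi-agent systems")
    print(result["article"])
```

Because each agent is a small function with one job, it can be run and checked on its own (for example, calling `research({"topic": "x"})` directly), which is the debugging benefit of the modular approach.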
Improving Quality with Multi-Agent Debate
So far I’ve focused mostly on the practicalities of using a Multi-Agent System: getting around context memory challenges, reducing complexity, and so on. But recent research has shown that there is something really cool that happens when AI agents work together and specifically debate amongst themselves.
A study by researchers at MIT and Google Brain ran a bunch of experiments to answer the following questions:
(1) To what extent does multiagent debate improve reasoning?
(2) To what extent does multiagent debate improve factual validity?
They found that reasoning substantially improved. Even if an agent started with an incorrect response, the critique by other agents often resulted in reaching the correct answer via debate.
Factual validity also improved as agents worked together to reach a consensus, which was more accurate.
So, theoretically, charging the New Content Development Manager with fact-checking and debating with the other agents should improve the quality of the content. It would be more likely to be factually accurate, and the narrative should improve. This would allow the delivery of great content at scale and give me the confidence to let the team run on autopilot.
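The debate idea can be sketched as a simple loop: every agent answers, then each agent sees the other agents' answers and may revise its own, for a fixed number of rounds. The Python below is a toy illustration, not the study's implementation. The stub agents stand in for LLM calls, their "switch only if everyone else agrees" rule is a made-up placeholder for a real critique step, and taking a majority vote at the end is my assumption about how to pick a final answer.

```python
from collections import Counter
from typing import Callable, List, Optional

# An agent takes the question plus the other agents' latest answers
# (None in the opening round) and returns its own answer.
AgentFn = Callable[[str, Optional[List[str]]], str]


def make_stub_agent(initial_answer: str) -> AgentFn:
    """Deterministic stand-in for an LLM-backed agent (hypothetical)."""
    def agent(question: str, others: Optional[List[str]]) -> str:
        if not others:
            return initial_answer
        # Toy critique step: switch only if all other agents agree with each other.
        top, count = Counter(others).most_common(1)[0]
        return top if count == len(others) else initial_answer
    return agent


def debate(question: str, agents: List[AgentFn], rounds: int = 2) -> str:
    """Run a fixed number of debate rounds and return the most common answer."""
    # Opening round: every agent answers independently.
    answers = [agent(question, None) for agent in agents]
    for _ in range(rounds):
        # Each agent sees everyone else's previous-round answer and may revise.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Assumption: the final answer is chosen by majority vote.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    team = [make_stub_agent("12"), make_stub_agent("12"), make_stub_agent("15")]
    print(debate("What is 7 + 5?", team))
```

In this toy run, the agent that starts with "15" sees two peers agreeing on "12" and revises its answer, which mirrors the behaviour described above where an initially incorrect agent is corrected through critique. In a real system each stub would be replaced by an LLM call that receives the other agents' responses in its prompt.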
A Multi-Agent Mindset
It’s human nature to personify, and it’s even more appealing when it comes to AI agents. It’s such an easy thing to say “think about tasks rather than job roles”, but it’s surprisingly hard to do. It’s even harder when you’ll still get a decent output when you don’t. For co-pilot agents it’s probably fine, as you can always correct them, but you’ll never be able to go autopilot unless you can ensure a consistent output.
The key is to break down problems into discrete tasks with discrete goals and abilities. Then build an agent for each task that can work collaboratively with other agents to improve their output. This ensures that the task can be performed consistently to a high standard, while also making it really easy to debug and optimize.
I hope this has been useful. It entirely changed my mental model of how I can build my AI workforce.