You've built an ontology. Your objects are clean, your properties are typed, your data pipelines hum along. Now you want an AI voice agent (say, an ElevenLabs conversational agent) to actually use that data. You wire up a tool, test it once, ship it.
Then the agent starts hallucinating ticket statuses. It calls the wrong tool half the time. It chokes on a parameter type mismatch and goes silent mid-conversation.
The problem is rarely the data. It's how you expose it. After building dozens of these integrations, here are six rules that separate tools that work from tools that quietly fail.
1. Default to GET endpoints — always
This is the single highest-impact decision you'll make, and it's the one most teams get wrong first.
When you give an LLM a GET endpoint, it fills in query parameters — key-value pairs in a URL. Flat, predictable, hard to mess up. When you give it a POST endpoint, it has to generate a valid JSON body with correct nesting, correct types, correct field names. That's where agents break.
| | GET | POST |
|---|---|---|
| What the LLM builds | Query string | JSON body |
| Reliability | High | Lower — malformed JSON is common |
| Debugging | Inspect the URL | Parse the request body |
| ElevenLabs config | Straightforward | Complex parameter definitions |
When POST is unavoidable — complex filtering, aggregations, multi-field searches — don't hand that complexity to the LLM. Wrap it in a Foundry Function that accepts simple input parameters and handles the POST logic server-side. The agent calls a clean interface; the messy work happens where it belongs.
If you truly can't avoid a raw POST, minimize the body to the fewest possible parameters and include example payloads in the tool description. But treat this as a last resort, not a starting point.
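The gap in what the LLM must generate is easy to see side by side. A minimal sketch (the endpoint path and field names are hypothetical, not a real Foundry API):

```python
from urllib.parse import urlencode
import json

# GET: the LLM only fills flat key-value pairs. The result is a URL you
# can read, log, and replay by hand.
params = {"email": "scout@example.com", "status": "resolved"}
get_url = "/api/tickets?" + urlencode(params)
# → /api/tickets?email=scout%40email.com... is wrong; actual value below:
# → /api/tickets?email=scout%40example.com&status=resolved

# POST: the LLM must emit a nested, correctly typed JSON body.
# One wrong bracket, field name, or type and the call fails.
post_body = json.dumps({
    "filter": {
        "email": {"eq": "scout@example.com"},
        "status": {"in": ["resolved"]},
    },
    "page": {"size": 50},
})
```

The query string has one failure mode (a bad value); the JSON body has many (structure, nesting, types, field names), which is exactly the surface area you want to keep away from the model.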
2. Name your tools like function signatures
Tool names are the first thing an LLM evaluates when deciding which tool to call. Before it reads the description, before it looks at parameters — it reads the name. A vague or inconsistently formatted name forces the model to lean heavily on descriptions, which slows selection and increases error rates.
Format: Lowercase, underscores, verb-first.
get_resolved_tickets_by_email
fetch_single_merit_badge
search_programs_by_name
Not this:
getResolvedTickets → camelCase — inconsistent with platform conventions
Get-Tickets → PascalCase with hyphens — ambiguous parsing
tickets → too vague — fetch? search? delete?
Here's the litmus test: if a developer couldn't guess the behavior from the name alone, the LLM can't either. A good tool name is a contract — it promises what the tool does, what it acts on, and how it filters.
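The convention is mechanical enough to lint. A rough sketch (the verb list is an assumption; extend it to match your own catalog):

```python
import re

# Illustrative check: verb-first, lowercase, snake_case tool names.
VERBS = ("get", "fetch", "search", "list", "create", "update")
NAME_RE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")

def is_valid_tool_name(name: str) -> bool:
    # Must be snake_case with at least two segments, starting with a verb.
    return bool(NAME_RE.match(name)) and name.split("_")[0] in VERBS

print(is_valid_tool_name("get_resolved_tickets_by_email"))  # True
print(is_valid_tool_name("getResolvedTickets"))             # False: camelCase
print(is_valid_tool_name("tickets"))                        # False: no verb, no qualifier
```

Running a check like this over your tool registry at deploy time is cheap insurance against the naming drift that creeps in as teams add tools.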
3. Describe what the tool does — never when to use it
This is where most teams go wrong, and it's the most counterintuitive rule in the list.
The instinct is to write helpful descriptions:
"Use this tool when the user asks about tickets"
That reads like good documentation. But it's exactly the wrong information at the wrong time. The LLM doesn't need guidance on intent during tool selection — it needs to know what the tool returns. Intent belongs elsewhere (see rule 6).
The formula:
[Action verb] + [what it returns] + [key parameters or filters]
Good:
- "Fetches all resolved tickets for a given person by email address"
- "Retrieves a single merit badge by its unique identifier"
- "Searches programs by name and returns matching results"
Bad:
- "Use this tool when the user asks about tickets" — describes when, not what
- "Gets stuff from the database" — useless to an LLM trying to match capabilities
- "This tool is for fetching tickets. It can filter by email and status and returns JSON." — buries the point in noise
Include: what data comes back, key parameters if not obvious from the name, optionally an example response format.
Exclude: when to use it, conversational guidance, example dialogues. All of that belongs in the system prompt.
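Put together, a well-formed definition looks something like this. The structure below is a generic JSON-schema-style sketch, not the exact ElevenLabs config format; adapt the field names to your platform:

```python
# Illustrative tool definition. The description states what comes back
# and how it filters -- no "use this when..." guidance anywhere.
tool = {
    "name": "get_resolved_tickets_by_email",
    "description": "Fetches all resolved tickets for a given person by email address.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "The person's email address",
            },
        },
        "required": ["email"],
    },
}

# A quick self-check worth automating: intent phrasing never belongs here.
assert "use this" not in tool["description"].lower()
```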
4. One tool, one job — eliminate overlaps
When two tools have overlapping purposes, the LLM will pick the wrong one roughly half the time. This isn't a model limitation — it's an ambiguous interface. You're asking the agent to make a distinction you haven't clearly defined.
The problem:
get_single_merit_badge → fetches one badge by ID
get_multiple_merit_badges → fetches many badges
The distinction is "one vs. many," which is context-dependent and subtle. Does "show me the wilderness survival badge" want one result or a search? The agent has to guess — and it will guess wrong often enough to break trust.
The fix:
get_merit_badge_by_id → exact lookup by identifier
search_merit_badges_by_criteria → search with filters
Now the tools have functionally distinct purposes. One is a direct lookup, one is a search. The LLM doesn't have to infer intent; it matches the user's request to the right capability based on the operation type.
Before adding a new tool, ask: Could an existing tool handle this with a different parameter? If yes, extend the existing tool. Every additional tool you add is another choice the LLM has to make — and every choice is a chance to get it wrong.
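The contract difference between the two fixed tools is concrete: one returns exactly one result or fails loudly, the other returns zero or more. A sketch with toy in-memory data (the badge records are invented for illustration):

```python
# Toy data standing in for an ontology-backed lookup.
BADGES = {
    "mb-101": {"id": "mb-101", "name": "Wilderness Survival"},
    "mb-102": {"id": "mb-102", "name": "First Aid"},
}

def get_merit_badge_by_id(badge_id: str) -> dict:
    # Exact lookup: one result or a hard KeyError -- never a guess.
    return BADGES[badge_id]

def search_merit_badges_by_criteria(name: str) -> list[dict]:
    # Search: zero or more results -- ambiguity is allowed here by design.
    return [b for b in BADGES.values() if name.lower() in b["name"].lower()]
```

"Show me the wilderness survival badge" now routes cleanly: no ID in hand means it's a search, and the agent picks `search_merit_badges_by_criteria` without having to infer "one vs. many."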
5. Match parameter types exactly — or fail silently
This rule is less about design philosophy and more about a specific trap that will cost you hours of debugging if you don't know about it.
ElevenLabs enforces strict type checking on tool parameters. If your Foundry API expects an integer and your tool definition says string, the call won't throw a visible error. It will be silently rejected — and the agent will have no idea why. From the user's perspective, the agent just... stops responding.
| Parameter Type | Foundry API Expects | ElevenLabs Setting | What Happens If Wrong |
|---|---|---|---|
| String | "value" | type: "string" | Works fine |
| Integer | 123 | type: "integer" | API silently rejects "123" |
| Boolean | true / false | type: "boolean" | API silently rejects "true" |
| Array | ["a", "b"] | type: "array" | Avoid entirely if possible |
We've spent entire debugging sessions on what turned out to be a single parameter typed as string instead of integer. The agent appeared to work — it selected the right tool, built the right request shape — but the API rejected every call. No error surfaced anywhere in the ElevenLabs logs.
The fix: Open the OSDK Developer Console. Every property shows its type. Match your ElevenLabs parameter definitions exactly. No assumptions, no shortcuts.
For complex types like arrays or nested objects, the same principle from rule 1 applies: wrap the complexity in a Foundry Function. Simple typed inputs in, complex API calls out.
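Because the platform won't surface the mismatch, it pays to catch it yourself before the call leaves your code. A minimal pre-flight check (a hypothetical helper, not an ElevenLabs or Foundry API):

```python
PY_TYPES = {"string": str, "integer": int, "boolean": bool}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of type-mismatch errors; empty list means clean."""
    errors = []
    for name, spec in schema.items():
        expected = PY_TYPES[spec["type"]]
        value = args.get(name)
        # bool is a subclass of int in Python, so check it explicitly.
        if spec["type"] == "integer" and isinstance(value, bool):
            errors.append(f"{name}: expected integer, got boolean")
        elif not isinstance(value, expected):
            errors.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return errors

schema = {"ticket_id": {"type": "integer"}}
print(validate_args(schema, {"ticket_id": "123"}))
# ['ticket_id: expected integer, got str']
print(validate_args(schema, {"ticket_id": 123}))
# []
```

A check like this turns an invisible rejection into an explicit error you can log — which is the difference between a five-minute fix and an afternoon of guessing.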
6. Separate tool descriptions from system prompts
This is the architectural rule that ties the other five together — and the one that's hardest to get right because it requires you to think about how LLMs reason in stages.
Tool descriptions tell the LLM what a tool can do. System prompts tell the LLM when and how to use its tools. They serve different functions at different points in the reasoning chain, and mixing them degrades both.
Tool description (the what):
"Fetches all resolved tickets for a given person by email address."
System prompt (the when):
"When the user asks about resolved tickets or completed work for a specific person, use the
get_resolved_tickets_by_emailtool with their email address."
Why does this separation matter? Because the LLM processes these at different stages of reasoning:
- Tool descriptions are evaluated during tool selection — the model scans them to find a capability match. They need to be concise and factual.
- System prompts provide orchestration context — when to prefer one tool over another, what to do with ambiguous requests, how to handle edge cases, what to say while waiting for results.
When you put "use this when the user asks about..." inside a tool description, you're injecting orchestration logic into the selection step. The model gets noisier signal and makes worse choices. Keep them clean. Keep them separate.
Summary: The 6 rules
Your Palantir ontology is a powerful, well-structured representation of your business. But an AI voice agent doesn't see that structure — it sees the tools you give it. Every design decision you make is either signal or noise.
| # | Rule | Principle |
|---|---|---|
| 1 | Default to GET | Simpler API shape = more reliable agent |
| 2 | Verb_noun_qualifier naming | The name is the first — and sometimes only — thing the LLM reads |
| 3 | Describe what, not when | Tool descriptions are capability declarations, not usage guides |
| 4 | No overlapping tools | Every ambiguous choice is a coin flip the agent will eventually lose |
| 5 | Exact parameter types | A single type mismatch creates silent, invisible failures |
| 6 | Descriptions for what, prompts for when | Different reasoning stages need different information |
These rules aren't theoretical. They come from production deployments where the agent is the product — where a dropped call or a wrong answer isn't a log entry, it's a lost customer.
Get them right, and your agent won't just have access to your ontology. It'll actually know how to use it.
We build production-grade AI applications on Palantir Foundry — in weeks, not quarters. Zero compromise on quality.
Follow 10x Partners for more Foundry deep-dives, or DM us to talk about your next build.