Why?
I recently started using OpenCode. It’s been a great experience for the most
part, except that when I start a new chat and send the first message, 7-8% of
my context is already used up. This led me down the Context Engineering rabbit
hole: I experimented with the caveman skill that’s gotten popular recently, and
while it did help somewhat, it feels like something that should be built into
the system (or something similar to it).
That got me thinking about building my own agent harness, where I could modify the system prompt, add only tools that I require, and thus Apollo was born.
What is an Agent Harness?
An agent harness is simply an environment that provides an LLM with tools, a system prompt and a way for the user to interact with it. The LLM can then use those tools on the user’s system to accomplish tasks.
While an agent harness may seem complex with all the tool calls and stuff it does, most of that complexity falls on the LLM and not the harness. The harness simply gives the LLM the tools and the system prompt, and the LLM figures out how to use them.
Building Apollo
Armed with my OpenCode Go subscription, I set out to build a simple agent harness. My main goals were:
- A simple system prompt, hopefully no more than 1,000 tokens (for context, Claude Code’s system prompt is ~10,000 tokens, while OpenCode’s is ~7,000)
- Only a few tools; the most important ones I could think of were:
  - bash: to run bash commands on the user’s system
  - ls: to list files in a directory
  - read: to read the contents of a file
  - edit: to edit the contents of a file
An agent harness is essentially a loop with the following steps:
1. Initialize the message history with the system prompt
2. Get user input
3. Append it to the message history
4. Send the history to the LLM
5. The LLM responds with tool calls and/or a final response
6. If there are tool calls, execute them, append the results to the message history, and go back to step 4
7. If there is a final response, output it to the user
8. Go back to step 2
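This loop can be sketched in Go as follows. Everything here is an illustrative stand-in, not Apollo’s actual code: Message, ToolCall, callLLM, and executeTool are hypothetical names, and callLLM fakes the model so the sketch runs on its own.

```go
package main

import "fmt"

// Message, ToolCall, callLLM, and executeTool are hypothetical
// stand-ins for the harness's real types and functions.
type Message struct{ Role, Content string }
type ToolCall struct{ Name, Args string }

// callLLM fakes the model: it asks for one tool call, then,
// once a tool result is in the history, gives a final answer.
func callLLM(messages []Message) (string, []ToolCall) {
	if messages[len(messages)-1].Role != "tool" {
		return "", []ToolCall{{Name: "ls", Args: "."}}
	}
	return "done", nil
}

// executeTool fakes running a tool on the user's system.
func executeTool(c ToolCall) string {
	return "output of " + c.Name + " " + c.Args
}

// runTurn handles one user turn: append the input, then loop,
// executing tool calls until the model gives a final response.
func runTurn(messages []Message, input string) ([]Message, string) {
	messages = append(messages, Message{Role: "user", Content: input})
	for {
		reply, calls := callLLM(messages)
		if len(calls) == 0 {
			return messages, reply
		}
		for _, c := range calls {
			messages = append(messages, Message{Role: "tool", Content: executeTool(c)})
		}
	}
}

func main() {
	// Seed the history with the system prompt; the outer loop
	// would read user input repeatedly.
	messages := []Message{{Role: "system", Content: "You are a helpful assistant."}}
	_, reply := runTurn(messages, "what's in this directory?")
	fmt.Println(reply) // prints "done"
}
```

The inner loop is the key design point: the model may need several rounds of tool calls before it produces a final response, so the harness keeps calling it until no tool calls come back.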
The System Prompt
I used Pi’s system prompt as a reference and came up with this:
var SYSTEM_PROMPT = `You are an AI assistant that helps the user understand and navigate the codebase in the current working directory. You have access to the following tools:
- ls [path]: Lists the contents of a directory. Use this to explore the project structure, find files, or see what is in a folder. If no path is provided, it lists the current directory.
- read <path>: Reads the full contents of a file. Use this to examine source code, configuration files, or documentation. The path argument is required.
- bash [cmd]: Executes a shell command and returns the output. Use this for tasks that require running commands, such as checking git status, running tests, or using command-line tools.
- edit path=<path> new_content=<content> [old_text=<text>]: Edit a file by providing new content. ALWAYS read the file first, then provide the COMPLETE new content. The user will see a diff preview and can confirm before changes are applied. Use old_text to specify what you're replacing if you want verification.
Guidelines for using tools:
- Do not output raw file contents you read directly unless the user explicitly asks for them. Instead, summarize, quote, or explain the relevant parts.
- Do not use markdown tables in your responses.
- Keep your responses concise and relevant to the user's request.
- When referencing files, include line numbers where possible, e.g. "src/index.ts:10-20" for lines 10 to 20 in src/index.ts.
Current working directory: ` + cwd + "\n\n"
Using tokencounter.org, this prompt comes out to ~300 tokens, far less than OpenCode’s ~7,000.
Styling the UI
This was honestly one of the most fun parts. I learnt about ANSI escape codes and how libraries like Rich (in Python) and Chalk (in JavaScript) color text in the terminal.
You basically prepend a string with an ANSI code, and append a reset code at the end. For example, to make text red, you can do:
echo -e "\033[31mThis text is red\033[0m"
Similar codes exist for other colors as well as styles (bold, italics etc.).
I set up variables for various colors and styles.
const (
Reset = "\033[0m"
Bold = "\033[1m"
Dim = "\033[2m"
Black = "\033[30m"
Red = "\033[31m"
Green = "\033[32m"
Yellow = "\033[33m"
Blue = "\033[34m"
Magenta = "\033[35m"
Cyan = "\033[36m"
White = "\033[97m"
Gray = "\033[90m"
)
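With constants like these, coloring output is just string concatenation, mirroring the echo example above. The colorize helper here is my own small addition, not part of Apollo’s code:

```go
package main

import "fmt"

const (
	Reset = "\033[0m"
	Bold  = "\033[1m"
	Red   = "\033[31m"
)

// colorize wraps s in the given ANSI code and appends a reset
// so that later output isn't affected.
func colorize(code, s string) string {
	return code + s + Reset
}

func main() {
	// Prints "error:" in bold red, followed by plain text.
	fmt.Println(colorize(Bold+Red, "error:") + " something went wrong")
}
```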
I also later used ANSI codes to move the cursor up so I could overwrite unrendered markdown responses with rendered ones.
The Agent Loop
I then set up the agent loop, which was pretty simple.
reqBody := Request{
Model: AppConfig.ModelName,
Messages: messages,
}
jsonBody, err := json.Marshal(reqBody)
if err != nil {
return "", "", nil, nil, err
}
if debugMode {
fmt.Println(Dim + "Request: " + string(jsonBody) + Reset)
}
req, err := http.NewRequestWithContext(context.Background(), "POST",
AppConfig.BaseURL, bytes.NewBuffer(jsonBody))
if err != nil {
return "", "", nil, nil, err
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+apiKey)
return handleResponse(client, req)
This allows for a simple chat interface where the user can ask questions and the model can respond. However, without tools, the model is pretty limited in its capabilities. We’ll now add the tools one by one and see how they improve the model’s performance.
Adding Tools
This was honestly the part where I struggled the most; OpenAI’s documentation turned out to be pretty helpful.
We append the following to the request body to add tools to the model’s capabilities:
"tools": [
{
"type": "function",
"function": {
"name": "ls",
"description": "List directory contents",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string"}
},
"required": []
}
}
}
]
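In Go, rather than hand-writing the JSON, the same definition can be built from structs and marshalled. The struct names Tool and Function here are my own; they just need to produce the shape above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Tool and Function are illustrative names; they only need to
// marshal into the OpenAI-style "tools" entry shown above.
type Tool struct {
	Type     string   `json:"type"`
	Function Function `json:"function"`
}

type Function struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

// toolsJSON builds the tools array for the request body.
func toolsJSON() ([]byte, error) {
	tools := []Tool{{
		Type: "function",
		Function: Function{
			Name:        "ls",
			Description: "List directory contents",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"path": map[string]any{"type": "string"},
				},
				"required": []string{},
			},
		},
	}}
	return json.Marshal(tools)
}

func main() {
	b, err := toolsJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```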
If the model decides to call a tool, the response is of the form below. Note that in the OpenAI-style API, arguments arrives as a JSON-encoded string, not a nested object, so it needs a second round of parsing:
{
  "choices": [
    {
      "message": {
        "tool_calls": [
          {
            "type": "function",
            "function": {
              "name": "ls",
              "arguments": "{\"path\": \"src/\"}"
            }
          }
        ]
      }
    }
  ]
}
I then created a function to parse tool calls and one function to execute each tool call, each with their own hardening and edge case handling.
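A sketch of what that parsing might look like, assuming the OpenAI-style response shape where arguments is a JSON-encoded string (the Call type and parseToolCalls name are my own, not Apollo’s exact code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Call is an illustrative type for a decoded tool call.
type Call struct {
	Name string
	Args map[string]string
}

// response declares only the fields we need; real responses
// carry more (ids, finish reasons, usage, etc.).
type response struct {
	Choices []struct {
		Message struct {
			ToolCalls []struct {
				Function struct {
					Name      string `json:"name"`
					Arguments string `json:"arguments"` // JSON-encoded string
				} `json:"function"`
			} `json:"tool_calls"`
		} `json:"message"`
	} `json:"choices"`
}

// parseToolCalls decodes a raw response body into a list of calls.
func parseToolCalls(body []byte) ([]Call, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if len(r.Choices) == 0 {
		return nil, fmt.Errorf("no choices in response")
	}
	var calls []Call
	for _, tc := range r.Choices[0].Message.ToolCalls {
		args := map[string]string{}
		if tc.Function.Arguments != "" {
			// Second unmarshal: arguments is itself a JSON string.
			if err := json.Unmarshal([]byte(tc.Function.Arguments), &args); err != nil {
				return nil, err
			}
		}
		calls = append(calls, Call{Name: tc.Function.Name, Args: args})
	}
	return calls, nil
}

func main() {
	body := []byte(`{"choices":[{"message":{"tool_calls":[{"type":"function","function":{"name":"ls","arguments":"{\"path\":\"src/\"}"}}]}}]}`)
	calls, err := parseToolCalls(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(calls[0].Name, calls[0].Args["path"]) // prints "ls src/"
}
```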
For ls, I made sure to not let the model access directories outside the current working directory.
func executeLS(args []string) (string, error) {
path := "."
if len(args) > 0 {
path = strings.Join(args, " ")
}
if strings.Contains(path, "..") {
return "", fmt.Errorf("path traversal not allowed")
}
allowedPaths := []string{".", "assets"} // ".." is already rejected above
allowed := false
for _, allowedPath := range allowedPaths {
if path == allowedPath || strings.HasPrefix(path, allowedPath+"/") {
allowed = true
break
}
}
if !allowed {
return "", fmt.Errorf("access to this path is not allowed")
}
cmd := exec.Command("ls", "-la", path)
out, err := cmd.Output()
if err != nil {
if e, ok := err.(*exec.ExitError); ok {
return "", fmt.Errorf("%s", e.Stderr)
}
return "", err
}
return string(out), nil
}
I similarly added the read, bash and edit tools, each with their own edge case handling and hardening.
Rendering Markdown
The agent could now reasonably respond and execute tasks on my machine, but the responses often contained raw markdown that “polluted” my screen with floating asterisks and backticks.
I looked around and found glamour, a Go library by the
Charm team that can render markdown reasonably beautifully
in the terminal.
The documentation was straightforward, and the library was easy to integrate into the project. I simply rendered the markdown response from the model before printing it to the user.
However, the issue came when I implemented streaming responses: the output from the model arrived in chunks, which often had incomplete markdown formatting (for example, *hello in one chunk and world* in another). Glamour could not render incomplete markdown.
The approach I came up with was to keep track of the total lines (say n) in the
response and once the entire message stream was complete, move the cursor up by
n lines (ANSI codes to the rescue) and re-render the entire message,
overwriting the unrendered text.
fmt.Print("\r") // Go to start of current line
for i := 0; i < lineCount; i++ {
fmt.Print("\033[A") // Move up one line
}
fmt.Print("\033[J") // Clear from cursor to end of screen
rendered, renderErr := renderer.Render(outputBuffer.String())
if renderErr != nil {
rendered = outputBuffer.String()
}
fmt.Printf(Red+"Apollo: "+Reset+"%s"+"\n\n", rendered)
This mostly solves the issue, except that the \033[A code that moves the
cursor up one line can only go as far as the top of the current terminal
viewport. So if the terminal has 30 lines and the response has 50, the cursor
can only move up 30 lines, leaving the remaining 20 lines of unrendered
markdown in the scrollback buffer, where the user will only see them if they
scroll up.
Quality of Life Improvements
One of the best things I added was GNU Readline support, which allowed for command history and basic line editing shortcuts without me having to explicitly implement them. I could now press the up arrow to see my previous command, or use Ctrl + A to go to the beginning of the line, Ctrl + E to go to the end of the line, and so on.
It was relatively easy; all I had to do was replace bufio.Scanner with rl,
err := readline.NewEx(). It also came with an option to implement my own
completers (which trigger when the user presses Tab). I implemented a simple
filename completer that suggests files in the current working directory when
the user presses Tab.
func (c *filenameAutoCompleter) Do(line []rune, pos int) (newLine [][]rune, length int) {
lineStr := string(line)
start := pos
for start > 0 && line[start-1] != ' ' {
start--
}
prefix := lineStr[start:pos]
dir := filepath.Dir(prefix)
if dir == "" {
dir = "."
}
base := filepath.Base(prefix)
// Read directory entries
entries, err := os.ReadDir(dir)
if err != nil {
return nil, 0
}
// Find matching entries
for _, entry := range entries {
name := entry.Name()
if !strings.HasPrefix(name, base) {
continue
}
// Return the suffix that completes the word
suffix := name[len(base):]
if entry.IsDir() {
suffix += "/"
}
newLine = append(newLine, []rune(suffix))
}
// length=0 means we're inserting at cursor, not replacing
return newLine, 0
}
PS: bash also uses GNU Readline under the hood.
Possible Future Improvements
- Saving/Exporting Sessions: This is something I’m currently working on, allowing the user to save their chat sessions and export them as a Markdown or HTML file.
- Model Picker: Allow the user to switch models mid-chat.