Teaching Machines to Reason

Recently I attended the AI Tinkerer event in Singapore, and I realized a lot of people were building their own agents. Naturally, this gave me FOMO. So I spent a lot of time making an AI coding agent from scratch on my own. I named it Heph[AI]stus after the Greek god of the forge.

No LangChain, no LangGraph, no Hugging Face, only me and some Python.

This blog covers how I went about building it and teaches you how to make your own AI agent.

Do check out my code on GitHub.

ReAct: Reasoning and Acting Framework

ReAct is a technique first introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models". It is a paradigm that integrates the reasoning and action-taking capabilities of LLMs into an AI agent. The technique uses an LLM as a "brain" that plans the steps needed to fulfill the user's goal and then executes them.

The ReAct framework is quite similar to how humans solve a hard problem: it simulates our inner monologue as a step-by-step sequence of tasks so that a complex problem can be solved systematically.

An Analogy

Suppose your room is a mess right now, and your mom has given you the hard and tedious task of cleaning it out and organizing it. In this scenario, your thinking might go something like this:

Okay, bed first; make sure it looks neat.
Now I need to scoop up all the trash and toss it in the garbage bin.
Next, I fold my clothes and separate out anything dirty into the laundry basket.
After this, I should wipe down all the surfaces using wet wipes.
Arrange my closet to ensure all clothes are sorted properly.
Going back to the floor, I need to vacuum it entirely.
Wipe the mirrors, switches, and door handles.
Grab the trash, throw it all out, and start the laundry.

In a similar way, the ReAct method uses prompt engineering to structure an LLM's output into an alternating pattern of thought, action, and observation.

Three main ingredients

  • We have an LLM that we prompt to decompose a large task into a series of steps (subtasks).
  • Along with that, we provide the LLM a set of tools it can use. These are usually written in the form of functions and can do anything (scrape the web, write a file, perform git commits, etc.).
  • Each tool returns a value, called an observation. Using the observation, the LLM re-evaluates its progress and decides whether to finish or to tell the user what the next thought will be.

This is the core of a ReAct agent: we link these three steps inside what we call a ReAct loop.

First, we craft a prompt that instructs the LLM to output the very next step that needs to be taken to fulfill the task. For example, suppose we want the LLM to write some documentation for a file called hello.py; the immediate first step could be to open that file and read its contents. Generally, we craft the prompt so that the LLM is forced to output JSON, which lets us parse each part of the output easily.
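To make this concrete, here is a minimal sketch of what such a system prompt might look like (illustrative wording, not the exact prompt in my repo):

# Illustrative sketch of a ReAct system prompt; not the exact one in my repo.
# {tools} is filled in (via .format) with the name and description of every
# registered tool; the JSON braces are doubled so .format leaves them alone.
SYSTEM_PROMPT = """You are a coding agent. Work towards the user's goal one step at a time.
You have access to the following tools:
{tools}

Respond with ONLY a JSON object, in one of two forms:
1. To take the next step:
   {{"thought": "...", "action": {{"tool": "<name>", "args": {{...}}, "reason": "..."}}}}
2. To finish:
   {{"thought": "...", "final": {{"message": "<answer for the user>"}}}}
"""

Forcing JSON like this is what lets the loop cheaply check for an "action" or "final" key on every turn.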

Next, the LLM outputs an action, which is a tool call to our function "read_file", so the JSON object looks something like this:

{
  "thought": "Target file: docs.md (to be created); need to read HephAIstos.py first to extract its docstrings and structure for the documentation.",
  "action": {
    "tool": "read_file",
    "args": {
      "path": "HephAIstos.py"
    },
    "reason": "Read the source file to gather content for the documentation."
  }
}

Within our tool set we have a mapping that lets us match the tool name to its function and execute that function with the provided arguments.

Within my code, I execute the read_file function on the HephAIstos.py file and then provide its output (the contents of HephAIstos.py) to the model again.

Using the contents, it produces the next step, which could be "Generate the docs", followed by "Write a file with the following docs".
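For illustration, that follow-up write action could look like this (the content string here is abbreviated and hypothetical):

{
  "thought": "I have the source; now write the generated documentation to docs.md.",
  "action": {
    "tool": "write_file",
    "args": {
      "path": "docs.md",
      "content": "# HephAIstos\n..."
    },
    "reason": "Persist the generated documentation to disk."
  }
}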

Now that you understand how a ReAct loop works, let's do a deep dive into my code to see how I applied it to create Heph[AI]stus.

Creating an abstraction over tools

The first question I asked myself was how to define a set of tools for the agent. Not only that, I also had to fix a standard format for each tool's input, output, and description, because this information needs to be injected into the system prompt itself so that the LLM knows which tools are available.

import os
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolContext:
	# Metadata shared by all tools, e.g. which directory the agent works in.
	workspace_path: str = field(default_factory=lambda: os.getcwd())

@dataclass
class ToolResult:
	ok: bool
	output: str

# Every tool takes the parsed args dict and the shared context,
# and returns a ToolResult.
ToolFn = Callable[[dict, ToolContext], ToolResult]

@dataclass
class Tool:
	name: str
	description: str
	fn: ToolFn
	context: ToolContext = field(default_factory=ToolContext)

These are the three main classes that help me standardize the format of each tool and ensure uniformity. First, ToolContext just contains some metadata, like the workspace path. The ToolResult class defines what the result of every tool call should look like: it has an attribute ok that signifies whether the tool call was successful, and an output in the form of a string.

I have also defined ToolFn, which is just a Callable that takes two arguments, a dict and a ToolContext, and returns a ToolResult.

Finally, a Tool has a name, a description, the tool function, and a context.
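One piece you'll see referenced in the ReAct loop later is a ToolRegistry with get_tool and get_context methods. Mine is essentially a thin wrapper over a dict; a minimal sketch, assuming exactly that interface, looks like this:

# Minimal sketch of a ToolRegistry (assumed interface: the ReAct loop below
# calls get_tool(name) and get_context()). Essentially a dict keyed by name.
class ToolRegistry:
	def __init__(self, context=None):
		self._tools = {}  # maps tool name -> Tool
		self._context = context or ToolContext()

	def register(self, tool: Tool):
		self._tools[tool.name] = tool

	def get_tool(self, name: str) -> Tool:
		return self._tools[name]

	def get_context(self) -> ToolContext:
		return self._context

Next, let's look at an actual tool.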

def _tool_write_file(args: dict, context: ToolContext) -> ToolResult:
	file_path = args.get("path")
	content = args.get("content")
	if not file_path or content is None:
		return ToolResult(ok=False, output="File path or content not provided.")
	# safe_path (a helper defined elsewhere) resolves file_path inside the workspace.
	full_path = safe_path(context.workspace_path, file_path)
	try:
		with open(full_path, 'w') as file:
			file.write(content)
		return ToolResult(ok=True, output=f"File written successfully to {full_path}")
	except Exception as e:
		return ToolResult(ok=False, output=f"Error writing file: {e}")

Here is the basic layout of a tool that writes a file, given a file path and the contents inside the args dictionary. You can see that this is pretty much a normal write-file function, the only difference being that it returns a ToolResult, so the input/output contract is standardized across different tools.
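With the registry sketched above, exposing this tool to the agent is just a matter of registering it (the description string here is my own illustration):

registry = ToolRegistry()
registry.register(Tool(
	name="write_file",
	description="Write the given content to a file at the given path.",
	fn=_tool_write_file,
))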

Context Matters

For any agent or LLM-based application, context matters a lot. The LLM can only come up with a good response if it can grasp, or be conditioned on, the right context. Handling context well is therefore a necessary step towards building a good AI agent.
from typing import Dict, List, Optional

@dataclass
class AgentState:
    last_modified_file: Optional[str] = None
    recently_created_files: List[str] = field(default_factory=list)
    current_files: Dict[str, str] = field(default_factory=dict)
    session_context: str = ""
    # analyze_workspace (a helper defined elsewhere) summarizes the files and
    # file structure; default_factory defers it to instantiation time.
    workspace_context: str = field(default_factory=lambda: analyze_workspace(workspace_path="./"))

    last_topic: Optional[str] = None
    last_answer: Optional[str] = None


    def update_from_tool_result(self, tool_name: str, args: dict, result: ToolResult):
        """Update state based on tool execution"""
        if tool_name in ["write_file", "patch_file", "append_file"] and result.ok:
            file_path = args.get("path")
            if file_path:
                # A write_file to a path we have not seen before counts as a new file.
                if tool_name == "write_file" and file_path not in self.current_files:
                    self.recently_created_files.append(file_path)
                self.last_modified_file = file_path
                self.current_files[file_path] = tool_name

    def get_context_string(self) -> str:
        context_parts = []

        if self.workspace_context:
            context_parts.append(f"WORKSPACE CONTEXT:\n{_clip(self.workspace_context, 1000)}")

        if self.last_modified_file:
            context_parts.append(f"LAST MODIFIED FILE: {self.last_modified_file}")

        if self.recently_created_files:
            context_parts.append(f"RECENTLY CREATED: {', '.join(self.recently_created_files[-3:])}")

        if self.current_files:
            active_files = list(self.current_files.keys())[-3:]
            context_parts.append(f"ACTIVE FILES: {', '.join(active_files)}")
        if self.last_topic:
            context_parts.append(f"LAST TOPIC: {self.last_topic}")

        return "\n".join(context_parts) if context_parts else "No recent file operations"

I made a separate class that updates the context on the go as the agent runs and feeds the agent the context it requires. The key pieces of context I maintain are last_modified_file, recently_created_files, current_files, and workspace_context, which is a summary of all the files and the file structure. These are mainly used for under-specified queries. Suppose a user first asks the model to write a fizz-buzz and the output is a bit buggy; the user then types "it is incorrect". How will the model know what "it" is? Simply by using this context and the chat history!
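One way to wire this in (a sketch; the actual prompt assembly in my code is a bit more involved) is to prepend the context string to every request:

# Sketch: prepend the agent's current context to the user's request so the
# LLM can resolve references like the "it" in "it is incorrect".
def build_prompt(user_input: str, agent_state: AgentState) -> str:
    return f"{agent_state.get_context_string()}\n\nUSER REQUEST:\n{user_input}"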

ReAct Loop

def react_loop(goal, agent: Agent, tool_registry: ToolRegistry, agent_state: AgentState, max_steps: int = 10):
    observation = None
    for step in range(max_steps):
        # First turn sends the goal; later turns feed back the last observation.
        prompt = goal if step == 0 else f"Observation: {observation.output if observation else ''}"
        response = agent(prompt)
        if "action" in response:
            print("Taking some action...")
            # Look up the tool by name and pull out its arguments and the
            # agent's stated reason for calling it.
            tool_name = response["action"].get("tool")
            tool = tool_registry.get_tool(tool_name).fn
            args = response["action"].get("args", {})
            reason = response["action"].get("reason", "")
            print(response)
            print(f"Using the tool {tool_name} with args {args} so that I can {reason}")
            # Execute the tool, record the result in the agent state, and keep
            # it as the observation for the next iteration.
            tool_result = tool(args, tool_registry.get_context())
            agent_state.update_from_tool_result(tool_name, args, tool_result)
            observation = tool_result

        elif "final" in response:
            print("Final answer reached.")
            return response["final"]["message"]
    # If we hit max_steps without a final answer, fall back to the last message.
    return agent.messages[-1]["content"]

This loop is the heart of our AI agent. It’s what gives the agent its intelligence and independence.

Let’s go through what happens at each iteration:

Every time we make a request to the agent, it responds in JSON format. At this point in the flow, the agent has a choice: it can either continue the loop with another step or end with a final answer.

If the agent continues, its response contains an "action" field describing the tool call it wants to make.

Upon the agent's decision to use a tool, we extract the tool name and args, and fetch the corresponding function object tool.fn from the registry. For the sake of interpretability, the agent also provides a short reason explaining why it wants to use that tool.

Then, we execute the tool with the given arguments and the shared context. The result of this tool execution is called an observation. Using this observation, we update the internal state of the agent and stash the latest result so that it can be fed back in on the next iteration.

Finally, if the agent's response contains a "final" field, that's our signal that we've reached the final answer: the loop ends and we return the agent's final message.

In other words, this loop gives the agent its reasoning-action-reflection cycle; it is how the agent thinks, acts, observes, and decides when to stop.
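Putting it all together, kicking the agent off is a single call. Here agent is assumed to wrap the LLM, prepend the system prompt, and parse each reply into a dict:

agent_state = AgentState()
answer = react_loop(
    goal="Write documentation for HephAIstos.py into docs.md",
    agent=agent,                # assumed: wraps the LLM and parses its JSON replies
    tool_registry=registry,
    agent_state=agent_state,
)
print(answer)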

Final Thoughts

Building Heph[AI]stus from scratch was honestly one of the most satisfying rabbit holes I’ve gone down. It taught me that behind all the fancy frameworks, an AI agent is really just a loop of thinking, doing, and reflecting. If you’re thinking of building your own agent, start simple. Write a few tools, craft a prompt, and let your model figure things out step by step. You’ll learn way more than you expect.