- An agent is a application code that is capable of planning and reasoning, and has the ability to interact with the environment
- Planning and Reasoning - Given a task, using the available AI model (Which is an LLM in most of the cases), agent thinks through and plans out a set of Actions that needs to be taken
- Ability to interact with the environment - After planning out a set of actions, the agent now tries to complete these actions using the tools available
An Agent has two main components:
- Mind
- It has the ability to think through an AI model and plans out a set of actions
- Body
- Given a set of actions, the tools that can be used to achieve these actions form the body of an AI Agent
What are LLMs ( Large Language Model)?
LLM is a deep learning model used to understand and generate human understandable data. LLMs are predominantly built using transformers and Transformers can be classified into the following three:
- Encoders - A kind of transformer that can take in text and convert it into context heavy embeddings
- Decoders - A kind of transformer that can generate the next text based on the previous text sequence data
- Seq2Seq - A kind of transformer that takes in text and converts it into embedding. And generates the next word in the sequence
Attention mechanism - Used to understand the important part of the text data
Special Tokens
- To help generate precise and contextual data
- For example, EOS - denotes End of Sentence. And is used as an indicator for the transformer to stop generating text
Chat templates
- Used to convert conversations to LLM understandable format - used to structure conversations b/w language models and users
- It helps maintain context by preserving conversation history and hence leads to more coherent multi-turn conversations
Tools A tool is a piece of function given to the LLM to achieve a specific action. A tool should contain the following:
- A textual description of what the function does.
- A Callable (something to perform an action).
- Arguments with typings.
- (Optional) Outputs with typings.
How do tools work? The LLM will generate text in form of code to invoke that tool → Agent parses the output given out by the LLM → Recognize that there’s a tool call → Invoke the tool Tools are invoked by LLMs by generating a specific kind of text. It might appear like it is the agent that has invocated the tool. But it was actually an LLM.