Models
Set up and run function calling models offline
Ollama
ollama is one of the easiest ways to run open-source large language models offline on your machine. It provides a CLI and a REST API to manage and run models.
Setup
To get started, follow the official installation instructions to download it onto your machine.
Next, run the server by executing the following command in a terminal:
ollama serve
This starts a server that listens on port 11434 by default; the REST API can be accessed on this port.
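As a sketch of what a request to this API might look like, the following Python snippet sends a prompt to the `/api/generate` endpoint of a locally running server (the model name is just an example; use any model you have pulled):

```python
import json
import urllib.request
import urllib.error

# Request payload for Ollama's /api/generate endpoint.
# "stream": False asks for a single JSON response instead of a stream.
payload = {
    "model": "gemma3:27b",  # example; substitute any pulled model
    "prompt": "Why is the sky blue?",
    "stream": False,
}

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
        print(body["response"])
except urllib.error.URLError:
    # The server is not reachable; start it with `ollama serve` first.
    print("Could not reach the Ollama server on port 11434.")
```

If the server is not running, the snippet prints a short note instead of raising, so it degrades gracefully.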
Then, open another terminal and pull the model you wish to run. A list of available models can be found in the library. For example, the following command pulls the Gemma 3 (27B parameter) model:
ollama pull gemma3:27b
Ollama does not yet officially support function calling with Gemma 3. The tutorials take this into account and provide code snippets to parse the function calls manually. To use function calling via the Offline Function Calling CLI or the Ollama API, pull the function-calling-enabled version of the model from here instead:
ollama pull gamemaker1/gemma3:27b-fc # or 12b-fc
The files used to create these function calling enabled models can be found here.
Note that the 27B parameter model is recommended only if you have at least 20-24 GB of RAM.
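The exact output format a model uses for function calls depends on its template, so the tutorials' parsing snippets vary per model. As an illustration only, here is a minimal sketch of parsing a tool call from raw model output, assuming the model wraps the call in `<tool_call>` tags (the tag name, the `get_weather` call, and the sample text are all assumptions, not an official format):

```python
import json
import re

# Sample model output containing a function call wrapped in <tool_call>
# tags. This format is assumed for illustration; check the template of
# the model you are actually using.
output = (
    "Sure, let me look that up.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)

def parse_tool_calls(text):
    """Extract JSON objects wrapped in <tool_call>...</tool_call> tags."""
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    return [json.loads(match) for match in re.findall(pattern, text, re.DOTALL)]

calls = parse_tool_calls(output)
print(calls)  # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```

Output that contains no tagged call simply yields an empty list, so the same function can run on every response.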
Usage
To run the model, use the ollama run command. For example, the following command runs the function calling enabled Gemma 3 (27B parameter) model:
ollama run gamemaker1/gemma3:27b-fc
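When using the function calling enabled model through the REST API, tool definitions go in the request body's `tools` field of the `/api/chat` endpoint. A sketch of what such a payload might look like, assuming a hypothetical `get_weather` tool:

```python
import json

# Chat request for Ollama's /api/chat endpoint with a tool definition.
# The get_weather tool is a hypothetical example for illustration.
payload = {
    "model": "gamemaker1/gemma3:27b-fc",
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "stream": False,
}

# POST this JSON to http://localhost:11434/api/chat; if the model decides
# to call a tool, the response message carries a "tool_calls" list.
print(json.dumps(payload, indent=2))
```

The snippet only constructs and prints the payload; sending it works the same way as the earlier generate request.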
The first response might take some time while the model is loaded into memory. The model is unloaded automatically after it has been idle for a while.
See the official documentation for more commands and info.