Models
Set up and run models offline on your machine
Ollama
Ollama is one of the easiest ways to run open-source large language models offline on your machine. It provides a CLI and a REST API for managing and running models.
Setup
To get started, follow the official installation instructions to download and install it on your machine.
Next, run the server by executing the following command in a terminal:
ollama serve
This starts a server that listens on port 11434 by default; the REST API is served on this port.
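To verify that the server is reachable, you can query the REST API from another terminal. A minimal sanity check, assuming the default port, is to list the models available locally:
# Lists locally available models; the "models" array is empty before any pull
curl http://localhost:11434/api/tags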
Then, open another terminal and pull the model you wish to run. A list of available models can be found in the library. For example, the following command pulls the Gemma 3 (27B parameter) model:
ollama pull gemma3:27b
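Once the download completes, you can confirm the model is available locally:
# Shows all pulled models along with their size and when they were last modified
ollama list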
Usage
To run the model, use the ollama run command. For example, the following command runs the Gemma 3 (27B parameter) model in an interactive session:
ollama run gemma3:27b
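Instead of the interactive session, you can also pass a prompt directly on the command line for a one-shot response:
# Runs the model once with the given prompt and prints the response
ollama run gemma3:27b "Why is the sky blue?"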
The first response may take some time while the model is loaded into memory. The model is unloaded again after a period of inactivity.
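The same model can also be queried through the REST API. The following is a minimal sketch, assuming the server is running on the default port; the optional keep_alive field controls how long the model stays loaded in memory after the request:
# Requests a single, non-streamed completion from the running server
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:27b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "keep_alive": "10m"
}'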
See the official documentation for more commands and info.