DevOps in the Age of AI (Part Deux)

DevOps in the Age of AI (Part Deux)

DevOps in the Age of AI

Model Lifecycle

graph LR
    A[Model Training] --> B[Model Saving]
    B --> C[Model + API Packaging - Docker Container]
    C --> D[Serve via API]

Model file format convertions (Optional)

graph LR
    A(Tensorflow H5 Model) --> B(Convert to ONNX) --> C(ONNX Model)
    D(Pytorch PT Model) --> E(Convert to ONNX) --> F(ONNX Model)
    G(Python's Pickle Model) --> G(Python's Pickle Model)

Open Source Models

Ollama

brew install ollama

ollama pull llama3.2

ollama serve

LlamaCpp

brew install llamacpp

llama-server --hf-repo hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF --hf-file llama-3.2-1b-instruct-q8_0.gguf -c 2048

Ollama in Docker

FROM ollama/ollama:0.3.12

# Listen on all interfaces, port 8080
ENV OLLAMA_HOST 0.0.0.0:8080

# Store model weight files in /models
ENV OLLAMA_MODELS /models

# Reduce logging verbosity
ENV OLLAMA_DEBUG false

# Never unload model weights from the GPU
ENV OLLAMA_KEEP_ALIVE -1 

# Store the model weights in the container image
ENV MODEL gemma2:9b
RUN ollama serve & sleep 5 && ollama pull $MODEL 

# Start Ollama
ENTRYPOINT ["ollama", "serve"]

Supported variables:

  • `MODEL` (build variable)
  • `OLLAMA_HOST` (runtime variable)
  • `OLLAMA_NUM_PARALLEL` (runtime variable)

LlamaCpp in Docker

FROM ghcr.io/ggerganov/llama.cpp:server

# Create directories for the server and models
RUN mkdir -p /app/models

# Download model file into /app/models

EXPOSE 8080

# Command to run the server when the container starts
ENTRYPOINT ["llama-server", "-m", "/app/models/llama-3.2-1b-instruct-q8_0.gguf", "-c", "2048"]

LlamaCpp Docker Documentation

Let's port to Dagger and Publish to Google Cloud Registry

Dagger

brew install dagger

Example:
dagger call --interactive function-name --project-path=./path-to-project-in-repo \
   --src-dir=https://user:$GITHUB_TOKEN@github.com/user/reponame#branchname --image-name="gcr.io/organization/project/image-name"

Deploy UI App

npm run build

cd client

fly launch

Join us on Discord

The Ubuntu TechHive on Discord

COME HANGOUT!
Join us on Discord