DevOps in the AI Era
Model Lifecycle
Model File Format Conversions (Optional)
Prior Work
Open Source Models
Ollama
brew install ollama
ollama pull llama3.2
ollama serve
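With the server running, you can sanity-check it from a second terminal. A minimal sketch against Ollama's /api/generate endpoint on its default port, 11434 (the model name matches the pull above):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'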
LlamaCpp
brew install llama.cpp
llama-server --hf-repo hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF --hf-file llama-3.2-1b-instruct-q8_0.gguf -c 2048
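llama-server listens on port 8080 by default and exposes an OpenAI-compatible API. A quick smoke test, assuming the command above is running:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'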
Ollama in Docker
FROM ollama/ollama:0.3.12
# Listen on all interfaces, port 8080
ENV OLLAMA_HOST 0.0.0.0:8080
# Store model weight files in /models
ENV OLLAMA_MODELS /models
# Reduce logging verbosity
ENV OLLAMA_DEBUG false
# Never unload model weights from the GPU
ENV OLLAMA_KEEP_ALIVE -1
# Bake the model weights into the container image (override at build time with --build-arg MODEL=...)
ARG MODEL=gemma2:9b
RUN ollama serve & sleep 5 && ollama pull $MODEL
# Start Ollama
ENTRYPOINT ["ollama", "serve"]
Supported variables (see the build/run sketch below):
- MODEL (build-time variable)
- OLLAMA_HOST (runtime variable)
- OLLAMA_NUM_PARALLEL (runtime variable)
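A sketch of how these variables are exercised, assuming the Dockerfile above sits in the current directory (the image tag ollama-packed is a made-up example):
# Bake a different model into the image at build time
docker build --build-arg MODEL=llama3.2:1b -t ollama-packed .
# Override runtime variables when starting the container
docker run -p 8080:8080 -e OLLAMA_NUM_PARALLEL=4 ollama-packed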
LlamaCpp in Docker
FROM ghcr.io/ggerganov/llama.cpp:server
# Create directories for the server and models
RUN mkdir -p /app/models
# Download the model file into /app/models (assumption: Hugging Face resolve URL for the GGUF referenced earlier)
ADD https://huggingface.co/hugging-quants/Llama-3.2-1B-Instruct-Q8_0-GGUF/resolve/main/llama-3.2-1b-instruct-q8_0.gguf /app/models/
EXPOSE 8080
# Command to run the server when the container starts
ENTRYPOINT ["llama-server", "-m", "/app/models/llama-3.2-1b-instruct-q8_0.gguf", "-c", "2048"]
LlamaCpp Docker documentation
Let's Port It to Dagger and Publish to Google Container Registry
Dagger
brew install dagger
Example:
dagger call --interactive function-name --project-path=./path-to-project-in-repo \
--src-dir=https://user:$GITHUB_TOKEN@github.com/user/reponame#branchname --image-name="gcr.io/organization/project/image-name"
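Before the publish step can push to gcr.io, Docker needs Google Cloud credentials, and the source URL above needs a GitHub token in the environment. Assuming the gcloud CLI is installed and authenticated:
# Configure Docker to use gcloud as a credential helper for gcr.io
gcloud auth configure-docker
# Token used by the --src-dir URL to clone the repo
export GITHUB_TOKEN=<your-token>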
Deploy the UI App
npm run build
cd client
fly launch
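fly launch creates the app and writes a fly.toml on the first run; subsequent releases can reuse that config:
# Redeploy from the same directory after the initial launch
fly deploy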
