Ollama on Google Colab

Running Large Language Models on Google Colab with Ollama

—— WRITTEN BY CHATGPT ——— except the code, of course, guys!!

Introduction

Google Colab is a cloud-based Jupyter notebook environment that allows users to run Python code for free with access to GPUs and TPUs. Ollama, an open-source framework, enables developers to run and experiment with large language models (LLMs) efficiently. This guide will walk you through setting up Ollama on Google Colab to host and run LLMs.

Understanding Google Colab

Google Colab offers features like free GPU access, easy collaboration, and pre-installed libraries. However, it has limitations such as session timeouts and restricted storage. It is best suited for prototyping and testing models rather than long-running applications.

Introduction to Ollama

Ollama is a framework for running LLMs efficiently, designed with simplicity and performance in mind. It supports various open-source models and is optimized for smooth deployment and testing. Being open-source, it allows extensive customization and integration into different projects.

Setting Up the Environment

Before using Google Colab with Ollama, ensure you have a Google account and access to Colab. You will need to install required dependencies and configure Colab’s runtime for optimal performance.

Steps:

  1. Log in to Google Colab and create a new notebook.

  2. Change the runtime type to a T4 GPU.

  3. Create a new code cell, paste the command below, and run it:

     !curl -fsSL https://ollama.com/install.sh | sh
    
  4. Add a new cell, paste the code below, and run it:

     !pip install aiohttp pyngrok

     import os
     import asyncio

     # Set LD_LIBRARY_PATH so Ollama finds the system NVIDIA libraries
     os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

     async def run_process(cmd):
       # Launch a subprocess and stream its stdout/stderr into the cell output
       print('>>> starting', *cmd)
       p = await asyncio.create_subprocess_exec(
           *cmd,
           stdout=asyncio.subprocess.PIPE,
           stderr=asyncio.subprocess.PIPE,
       )

       async def pipe(lines):
         async for line in lines:
           print(line.strip().decode('utf-8'))

       await asyncio.gather(
           pipe(p.stdout),
           pipe(p.stderr),
       )

     # Register an account at ngrok.com, create an authtoken, and paste it here
     await asyncio.gather(
         run_process(['ngrok', 'config', 'add-authtoken', 'NGROK_TOKEN_HERE'])
     )

     # Start the Ollama server, pull a model, and tunnel port 11434 through ngrok
     await asyncio.gather(
         run_process(['ollama', 'serve']),
         # Pull every model you want, with a new run_process for each one
         run_process(['ollama', 'pull', 'mistral']),
         run_process(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434'])
     )
    
  5. Once it runs, you should see an https URL from ngrok in the logs. You can use this URL to reach the Ollama server running on the Colab machine, either from the CLI or over the HTTP API (see the sketch after these steps).

  6. Open a new terminal on your local machine and point the Ollama CLI at the tunnel: export OLLAMA_HOST="<URL>", then hit Enter.

  7. Type ollama run mistral.

  8. ENJOY!!!
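
Once the tunnel is up, you can also call the Ollama HTTP API directly instead of going through the CLI. Below is a minimal sketch that assumes the mistral model has finished pulling; the ngrok URL is a placeholder you must replace with the one printed in your cell output.

     import requests

     # Replace with the https URL printed by ngrok in the cell output
     OLLAMA_URL = 'https://YOUR-TUNNEL.ngrok-free.app'

     # The root endpoint simply reports that the server is up
     print(requests.get(OLLAMA_URL).text)  # -> "Ollama is running"

     # Ask the model for a completion (non-streaming)
     resp = requests.post(
         f'{OLLAMA_URL}/api/generate',
         json={'model': 'mistral', 'prompt': 'Why is the sky blue?', 'stream': False},
     )
     print(resp.json()['response'])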

Since Colab has limited storage, you may need external storage such as Google Drive to hold large models. Also, you cannot run the model on the GPU indefinitely due to Colab's usage restrictions; however, you can still fall back to the CPU.

Steps:

  1. Download LLMs to Colab's local disk, or mount Google Drive so they persist across sessions (see the sketch after this list).

  2. Configure model settings (e.g., quantization level, context size) for efficiency.

  3. Adjust runtime settings (e.g., GPU type, high-RAM mode) for better performance.
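
Here is a minimal sketch of persisting models on Google Drive: mount Drive, create a folder for models, and point Ollama at it via the OLLAMA_MODELS environment variable. The folder name is hypothetical, and the variable must be set before ollama serve is launched, or the server will keep using its default model directory.

     from google.colab import drive
     import os

     # Mount Google Drive into the Colab filesystem
     drive.mount('/content/drive')

     # Hypothetical folder on Drive to hold pulled models across sessions
     models_dir = '/content/drive/MyDrive/ollama_models'
     os.makedirs(models_dir, exist_ok=True)

     # Ollama stores and looks up models in OLLAMA_MODELS; set it before
     # starting 'ollama serve' so pulls land on Drive instead of local disk
     os.environ['OLLAMA_MODELS'] = models_dir

Keep in mind that loading multi-gigabyte model files over Drive can be noticeably slower than Colab's local disk, so this trades startup speed for persistence.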

Using Ollama for Development

Ollama can be used for various development purposes, including:

  • Experimenting with different LLMs.

  • Integrating models into applications.

  • Testing model responses and fine-tuning parameters (see the sketch after this list).
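
As a sketch of that last point, the ollama Python client (pip install ollama) can point at the ngrok tunnel and makes it easy to compare responses under different sampling parameters. The host URL is again a placeholder, the option values are only examples, and mistral is assumed to be pulled already.

     # !pip install ollama
     from ollama import Client

     # Point the client at the ngrok tunnel (placeholder URL)
     client = Client(host='https://YOUR-TUNNEL.ngrok-free.app')

     # Compare responses under two different temperatures
     for temperature in (0.0, 0.8):
         reply = client.generate(
             model='mistral',
             prompt='Name three creative uses for a paperclip.',
             options={'temperature': temperature, 'num_predict': 128},
         )
         print(f'--- temperature={temperature} ---')
         print(reply['response'])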

Advantages and Challenges

Advantages:

  • Free access to GPUs.

  • Quick experimentation without local hardware constraints.

  • Easy integration into existing projects.

Challenges:

  • Limited runtime duration.

  • Storage constraints.

  • Potential latency issues.

Solutions:

  • Use Google Drive for model storage.

  • Optimize model performance by adjusting runtime settings.

  • Periodically refresh Colab sessions to avoid disconnections.

Conclusion

Using Ollama on Google Colab is a powerful way to experiment with LLMs without requiring high-end local hardware. While there are limitations, strategic setup and optimization can help overcome these challenges. Explore further, test different models, and enhance your AI-driven applications!
