I hate giving money to Dario, Sam, or Elon.

Yeah, I'm that cheap.



I've always liked to tinker with build and CI tools, or with the various databases I've used at work, in my home lab. I've maintained a lab in my home since the '90s. At one point, a spare bedroom held 12 machines. There were a couple of hubs in there so I could teach myself routing, and one box ran diald so the others could reach the internet. Suffice it to say, I took my professional nerdery to the extreme. When AI "happened", it was no different.

Version 1 was a modest two-container solution using OpenWebUI and Ollama. It mostly did what was advertised, but since my dedicated host wasn't equipped with a particularly well-endowed GPU, it was painfully slow. My wife had expressed an interest in using AI chat for her real estate work, so this morphed into OpenWebUI talking directly to OpenAI. Around the same time, my work wanted a similar chat system for general office use. Our head of IT wanted LibreChat wired up to Amazon Bedrock. That sounded like fun, so I made it happen. As this went to final review, someone in the chain didn't like that LibreChat needed an OpenAI connection for RAG, so we ditched it in favor of OpenWebUI. I had to build a Bedrock Access Gateway for these tools to play nicely, but it worked, passed muster, and was pushed to production.
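For the curious, that two-container version 1 looks roughly like this as a compose file. This is a sketch from memory, not my actual config; the service names and the data paths are placeholders, though the images and the `OLLAMA_BASE_URL` wiring match what the Open WebUI docs describe:

```yaml
# Minimal OpenWebUI + Ollama stack (illustrative sketch)
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-data:/root/.ollama   # keep pulled models across restarts

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # point the UI at the ollama service on the compose network
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "3000:8080"
    depends_on:
      - ollama

volumes:
  ollama-data:
```

Swapping the backend for OpenAI (as I did for my wife's setup) is mostly a matter of handing OpenWebUI an API key instead of an `OLLAMA_BASE_URL`.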

So, back to the local AI. I'm running a pair of containers, no agents, watching the Docker host's GPU heat up to 80°C, and getting very little done. Then it hits me: I have an M2 Mac Studio in the basement that I use to record bad punk rock songs. This box has a 30-core GPU. It turns out you can `brew install` Ollama on it. Local chat operations went from a couple of minutes down to a couple of seconds. For my money, this thing was running as fast as Claude.
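The Mac Studio setup is a few commands. One wrinkle worth noting: Ollama binds to localhost by default, so to reach it from other machines on the LAN you need to set `OLLAMA_HOST`. The hostname and model below are placeholders, not my actual setup:

```
# On the Mac Studio
brew install ollama

# Bind to all interfaaces so other LAN machines can talk to it
# (default is 127.0.0.1 only)
OLLAMA_HOST=0.0.0.0 ollama serve &

ollama pull llama3.1

# From any other box on the network, sanity-check the API
curl http://mac-studio.local:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "hello", "stream": false}'
```

With that in place, the Docker host's GPU gets to cool off, and everything on the network can use the Mac as the inference backend.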

After the charm of much faster chat sessions wore off, I began to notice how iterative sessions led to things falling out of context. Hallucinations got weird. A session would go so far off the rails that you'd have to restart it to make any sense of it. My colleagues were starting to use agents and were getting into spec- and behavior-driven development. Since I'm too much of a command-line bigot to use VS Code or PyCharm, and my "laptop" is actually an aging Chromebook, I opted for something that would run in a terminal and talk to Ollama. Enter opencode. Now I have something I can put to work on a problem while I switch tasks, make a cup of coffee, or even take a walk while I think about what kind of mischief I want to get up to next. Opencode + Ollama isn't perfect, but I'd say it stacks up nicely against commercial offerings. Best of all, I don't have to give Elon a single cent. 😀
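Pointing opencode at a local Ollama goes through its JSON config, using Ollama's OpenAI-compatible endpoint. This is a sketch based on my reading of the opencode custom-provider docs, not a drop-in file; the hostname and model name are placeholders, and field names may shift between opencode versions:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (local)",
      "options": {
        "baseURL": "http://mac-studio.local:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "Qwen 2.5 Coder 14B"
        }
      }
    }
  }
}
```

Since Ollama speaks the OpenAI wire protocol on `/v1`, most tools that can target OpenAI can be retargeted at a local model the same way.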