Locally hosted AI agents
AI agents now have progressed to a level where they can be quite useful to do even larger structural tasks on whole projects. This is often done using specialized IDE (for example Cursor) or plugins (JetBrains AI, Roo Code). There are even terminal-based tools, for example aider.
While we have ChatGPT, I was not able to get an OpenAI API key to run (it always says we are rate-limited). But fortunately we can host very capable AI agents on our four GPU servers:
Server | Hardware | Models |
nodeASB03G nodeASB04G |
1x Nvidia H200 140GB VRAM + 1.48TB RAM |
GLM 4.5-Air ( Qwen3-Coder ( Gemma3 ( |
nodeASB01G nodeASB02G |
4x Nvidia A40 44GB VRAM + 1TB RAM |
Qwen3-Coder ( Gemma3 ( |
The models are hosted via ollama, which listens to instructions on port 11434. I tested both Jetbrains AI and Roo Code.
Requirements
Because IT does not allow direct connections to servers on all ports, you will need to setup a SSH Tunnel to proxy the ollama server running on the node to your local PC:
ssh -L 11435:localhost:11434 USER@SERVER
(I used port 11435 as local port in the case if you have a local ollama)
Info: Always use the 32k model as the main one. The context window of the other models is 2k, which is not enough for AI agent tasks.
Usage with Roo Code (best result so far!)
Roo Code is a plugin for Visual Studio Code which adds very powerful AI agent capabilities into the software. Install it using the Extension panel on the left side.
During the initial setup, put in the appropriate values. Ignore the warning that the model does not exist.
Roo Code has multiple modes:
If you want to do simple edits, go with the Code mode. For larger changes that may require multiple steps, use the Orchestrator mode.
Usage with Jetbrains AI
- Install the Jetbrains AI plugin and let it activate. You do not have to pay for it or start any trial, as we're going to use the local model option
- When the plugin is initialized, select the option that you will stay within the free plan
- Setup the model as following:
Please note that Jetbrains' ollama integration is not that good yet and you may have to click some generation button multiple times. I also experienced that the Edit mode is not that great and you are better off with using Roo Code for that.