AI API

e host a variety of AI ~~agents~~services ~~now~~here ~~have~~for ~~progressed~~everyone to ause ~~level~~- ~~where~~all ~~they~~data stays within HKI and you can beuse ~~quite~~as ~~useful~~many totokens doas ~~even~~you ~~larger structural tasks on whole projects. This is often done using specialized IDE (for example Cursor) or plugins (JetBrains AI,~~ ~~Roo Code). There are even terminal-based tools, for example~~ ~~aider~~.want.

~~While we have ChatGPT, I was not able to get an OpenAI API key to run (it always says we are rate-limited). But fortunately we can host very capable AI agents on our~~ ~~servers~~.

~~The models are hosted via vLLM. The best results can be achieved using~~ ~~Roo Code~~.

Requirements

You will need an account on the LiteLLM server https://asb.hki-jena.de/lllm/ (Send a mail to ruman.gerst@leibniz-hki.de and request an account)

You will get an invitation link where you then can create your password.

Use the shared key

Please use this key only if you are accessing the API from within HKI: sk-A8D04IxBdQIXz_USnxvXug

Create your own key

To access the admin panel visit https://asb.hki-jena.de/lllm/ui
Go to the Virtual Keys section
Click the Create New Key button
Set the Team to ASB and select that it should use All Team Models
Set the key type to API, otherwise the key can control account settings

Remember the key or copy it into notepad - you will need it to access the API

Usage with Roo Code (best result so far!)

~~Roo Code~~ ~~is a plugin for Visual Studio Code which adds very powerful AI agent capabilities into the software. Install it using the Extension panel on the left side.~~

~~During the initial setup, put in the following values:~~

~~API Provider: OpenAI Compatible~~

~~Base URL:~~ ~~https://asb.hki-jena.de/lllm/~~

~~API Key: The API key you generated~~

~~Model: Choose the one you want (see below)~~

~~Image Support -> Unchecked~~

~~Computer User -> Unchecked~~

~~Prompt Caching -> Unchecked~~

~~Context window size -> According to the model (2048, 32768,~~ ~~128000~~)

~~Model~~	~~Performance~~	~~Notes~~
~~GLM-4.5~~	~~Works mostly~~	~~Only 2 GPUs can run this one!~~ ~~Sometimes gets stuck and then the service times out~~
~~Qwen3-Coder~~	~~Seems to work (need more tests?)~~	~~Runs on all servers~~
~~GPT-OSS~~	~~Fails~~	~~Does not understand the required grammar of Cline/RooCode~~

~~Roo Code has multiple modes:~~

~~If you want to do simple edits, go with the~~ ~~Code~~ ~~mode. For larger changes that may require multiple steps, use the~~ ~~Orchestrator~~ ~~mode.~~

~~By default Roo Code will ask to approve any action that it will take. Use the button above the chat to selectively turn on auto-approve (careful with commands! see~~ ~~here~~)

AGENTS.md

An ~~AGENTS.md~~ ~~file can be created at the root of the project(s) to act as guidance for AI agents that should work with the code. You can generate it with RooCode by running in the chat window the following command:~~

/init

Examples

~~Task~~

~~This project should implement Richardson-Lucy deconvolution using CPU and CUDA. Check if the implementation is mathematically correct~~

~~The csv directory contains multiple tables. Each table has a column "Dist". Generate a Python script that produces an interactive histogram using plotly and save that into the current directory.~~

I implemented a new enhanced API for my JIPipeValidityReport. Before that you had to do report.add(new JIPipeValidityReportEntry(new NodeContext(context, this), "Title", "Description)). Now I can do context.node(this).title("Title").description("Description").report(report)

~~Go through each single Java file and adapt the code to my new API.~~

Usage with Jetbrains AI

~~Currently broken because JetBrains refuses to work on that feature properly:~~

~~https://youtrack.jetbrains.com/issue/LLM-15339/Third-party-providers-use-OpenAI-API-for-all~~

~~(It's not that great TBH)~~

~~Update: there is a plugin that integrates RooCode into Jetbrains IDE:~~ ~~https://github.com/wecode-ai/RunVSAgent~~

Usage with aider (CLI tool)

~~Install aider using the oneliner:~~

curl -LsSf https://aider.chat/install.sh | sh

~~Then create in your home directory two files:~~

~~~/.aider.conf.yml~~

openai-api-key: <YOUR KEY>
openai-api-base: https://asb.hki-jena.de/lllm/
model: openai/vllm/glm-4.5-air-128k
weak-model: openai/vllm/gpt-oss-120b-128k
chat-language: English

~~~/.aider.model.metadata.json~~

{
    "openai/vllm/glm-4.5-air-128k": {
        "max_tokens": 128000,
        "max_input_tokens": 32000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0,
        "output_cost_per_token": 0,
        "litellm_provider": "openai",
        "mode": "chat",
        "reasoning_tag": "think",
        "remove_reasoning": true
    },
     "openai/vllm/qwen3-coder-64k": {
        "max_tokens": 64000,
        "max_input_tokens": 32000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0,
        "output_cost_per_token": 0,
        "litellm_provider": "openai",
        "mode": "chat",
        "reasoning_tag": "think",
        "remove_reasoning": true
    },
     "openai/vllm/gpt-oss-120b-128k": {
        "max_tokens": 128000,
        "max_input_tokens": 32000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0,
        "output_cost_per_token": 0,
        "litellm_provider": "openai",
        "mode": "chat",
        "reasoning_tag": "think",
        "remove_reasoning": true
    }
}

~~Then you can start~~ aider ~~in any directory (just read their documentation).~~

Host models on your PC

~~Please note that running AI models requires a strong PC; ideally with a GPU. Otherwise ollama/lmstudio will keep everything in RAM and run everything through CPU computing which is very slow.~~

Ollama

~~Usage:~~

# Pull a model

ollama pull gemma3

# Run a model (will pull if model does not exist)

ollama run gemma3

~~The default context size of ollama is 2048 tokens, which is quite small. You'll have to create a new model with a changed context size:~~

ollama run gemma3

/set parameter num_ctx 32768
/save gemma3-32k
/exit

# Now you can run the new model

ollama run gemma3-32k

Lmstudio

~~Lmstudio is a graphical interface that allows you to download and run local chat bots. It also has an optional~~ ~~server component~~ ~~that can be enabled, so other tools can make use of its backend.~~

~~On Linux it is distributed as AppImage. I highly recommend to install~~ ~~Gear Lever, which helps with installing AppImages (Open the AppImage with Gear Lever). Lmstudio also works very well on Windows.~~

~~If your computer does not have enough VRAM for a model, you can always let it be (partially) run on CPU:~~