Kimi K2.6 / K2.7 Code Understanding
Kimi is Moonshot AI's family of
natively multimodal models. OpenTryOn integrates two variants via a single
adapter, KimiUnderstandAdapter, for image and video understanding:
- Kimi K2.6 (
kimi-k2.6): General-purpose multimodal model with optional "thinking" mode and a 256K context window. - Kimi K2.7 Code (
kimi-k2.7-code/kimi-k2.7-code-highspeed): Coding-focused variant built on K2.6 with the same vision/video understanding, tuned for long-horizon agentic coding and tool use. Thinking mode is always on for this variant.
Unlike most other adapters in this repo, Kimi's understanding capability is general-purpose -- useful for describing garments, outfits, and lookbook/runway videos in the fashion domain, but equally capable on documents, UI screenshots, product photography, or any other visual content.
For local/GPU deployment instead of the hosted API, see the open-weight Kimi-VL local model.
Prerequisites
- Moonshot AI Account: Sign up at platform.kimi.ai
- API Key: Get one from the API Keys console
- Environment Variable: Set
MOONSHOT_API_KEYin your.envfile
MOONSHOT_API_KEY=your_moonshot_api_key
Installation
The Kimi adapter reuses the openai package (already a core dependency of
opentryon, since the Kimi API is fully OpenAI-SDK compatible). No
additional installation is required.
Quick Start
Image Understanding
from tryon.api import KimiUnderstandAdapter
adapter = KimiUnderstandAdapter() # uses kimi-k2.6 by default
result = adapter.understand_image(
"garment.jpg",
prompt="Describe this outfit: color, pattern, style, fit, and material.",
)
print(result["text"])
Video Understanding
result = adapter.understand_video(
"runway_clip.mp4",
prompt="Summarize the styling and garments shown in this video.",
)
print(result["text"])
Kimi K2.7 Code (coding-focused, still multimodal)
adapter = KimiUnderstandAdapter(model="kimi-k2.7-code")
result = adapter.understand_image(
"ui_mockup.png",
prompt="Write the HTML/CSS for this design.",
)
print(result["text"])
API Reference
KimiUnderstandAdapter
__init__(api_key=None, model="kimi-k2.6", base_url=None)
Parameters:
api_key(str, optional): Moonshot API key. Defaults toMOONSHOT_API_KEYenvironment variable.model(str, optional): Default model for calls that don't override it. One ofkimi-k2.6,kimi-k2.7-code,kimi-k2.7-code-highspeed,kimi-k2.5. Default:"kimi-k2.6".base_url(str, optional): Defaults toKIMI_BASE_URLenv var orhttps://api.moonshot.ai/v1.
Raises:
ValueError: If the API key is missing.ImportError: If theopenaipackage isn't installed.
understand_image(image, prompt=..., model=None, thinking=None, max_tokens=None)
Understand one or more images.
Parameters:
image: A single image or list of images. Each may be a file path, URL,PIL.Image, raw bytes, orBytesIO. Supported formats: png, jpeg, webp, gif.prompt(str): Question/instruction about the image(s).model(str, optional): Override the default model for this call.thinking(bool, optional): Force-enable/disable thinking mode. Onlykimi-k2.6supports disabling it;kimi-k2.7-code*always thinks.max_tokens(int, optional): Max output tokens (server default: 32768).
Returns: dict with keys text, reasoning, model, usage.
understand_video(video, prompt=..., model=None, thinking=None, max_tokens=None, use_file_upload=None, max_inline_mb=20.0)
Understand video content.
Parameters:
video: File path, URL, raw bytes, orBytesIO. Supported formats: mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp.use_file_upload(bool, optional): Upload to Moonshot storage (ms://reference) instead of inlining as base64. Defaults to auto-enabled abovemax_inline_mb.- Other parameters same as
understand_image.
Returns: dict with keys text, reasoning, model, usage.
understand(image=None, video=None, prompt=..., model=None, thinking=None, max_tokens=None)
Single entry point that accepts image and/or video (at least one
required) -- this is what the opentryon understand --model kimi-k2.6
CLI command calls.
chat(messages, model=None, thinking=None, max_tokens=None, tools=None, tool_choice=None)
Escape hatch for full multi-turn conversations or tool-calling agents (e.g.
Kimi's "watch a video clip" tool-use pattern). Returns the raw OpenAI SDK
ChatCompletion object.
Parameter Notes
Kimi's k2.5/k2.6/k2.7-code models fix temperature, top_p, n,
presence_penalty, and frequency_penalty server-side -- non-default
values raise an API error, so this adapter doesn't expose those knobs. Only
thinking and max_tokens are configurable.
| Field | Behavior |
|---|---|
thinking | Default enabled. kimi-k2.7-code* cannot disable it. |
max_tokens | Default 32768. |
temperature / top_p / n / presence_penalty / frequency_penalty | Fixed by the API; not exposed. |
Using the opentryon CLI
# Kimi K2.6 -- image understanding
opentryon understand --model kimi-k2.6 --image garment.jpg \
--prompt "Describe this outfit."
# Kimi K2.6 -- video understanding, disable thinking mode
opentryon understand --model kimi-k2.6 --video runway_clip.mp4 --no-thinking
# Kimi K2.7 Code -- coding-focused multimodal understanding
opentryon understand --model kimi-k2.7-code --image ui_mockup.png \
--prompt "Write the HTML/CSS for this design."
# High-speed K2.7 Code variant
opentryon understand --model kimi-k2.7-code --kimi-model kimi-k2.7-code-highspeed \
--image garment.jpg
Results are printed to stdout and saved as JSON under outputs/.
Error Handling
The adapter raises ValueError for:
- Missing API key
- Neither
imagenorvideoprovided tounderstand() - Invalid
model - Attempting to disable
thinkingonkimi-k2.7-code*