Benchmarking Code Generation: A Look at Chinese and International LLMs

Abstract: With the rapid advancement of artificial intelligence, the application of large language models (LLMs) to software programming has become a shared focus for both industry and academia. This paper takes mainstream Chinese and international LLMs as its subjects and systematically compares their programming capabilities across several dimensions: performance on programming benchmarks, code-generation quality, engineering-practice competence, adaptability to Chinese-language programming scenarios, and ecosystem development. The study finds that Chinese LLMs, represented by DeepSeek, Qwen, and MiniMax, now match leading international models on standardized programming benchmarks and surpass them in certain areas. In particular, between late 2025 and early 2026, MiniMax M2.5 and GLM-5 entered the global top three on the SWE-bench Verified engineering-level benchmark, a historic breakthrough for the programming capabilities of Chinese LLMs. However, international models still hold a relative advantage in systematizing agentic coding, building robust code-security frameworks, and fostering a mature developer ecosystem. This paper aims to offer researchers, engineers, and decision-makers an objective and comprehensive reference. ...

March 27, 2026 · 39 min · 19048 words · siyou · 
AI

Reflections on the Viral Rise of OpenClaw

In the spring of 2026, an open-source project named “OpenClaw” unexpectedly went viral in China. Built on local large language models and integrating multimodal interaction, it aims to replace the traditional graphical user interface (GUI) with natural language, gestures, and even rudimentary brain-computer interfaces, redefining how humans interact with computers. A user simply says: “Turn this weekly report into a PowerPoint presentation, send it to Director Li, and include the data charts.” ...

March 10, 2026 · 7 min · 1347 words · siyou · 
OpenClaw