{"id":20,"date":"2025-07-13T15:53:02","date_gmt":"2025-07-13T07:53:02","guid":{"rendered":"https:\/\/cs.huarenca.com\/?p=20"},"modified":"2025-07-13T15:53:03","modified_gmt":"2025-07-13T07:53:03","slug":"%e6%89%be%e5%88%b0%e4%b8%80%e4%b8%aagithub%e4%b8%8a%e9%9d%9e%e5%b8%b8%e4%bc%98%e7%a7%80%e7%9a%84ai%e9%a1%b9%e7%9b%aecradle%ef%bc%8c%e5%8f%af%e6%8e%a7%e5%88%b6%e9%bc%a0%e6%a0%87%e3%80%81%e9%94%ae","status":"publish","type":"post","link":"https:\/\/cs.huarenca.com\/en\/20.html","title":{"rendered":"Cradle: an excellent AI project on GitHub that controls the mouse and keyboard to simulate human operation, and does it remarkably smoothly. Worth bookmarking!"},"content":{"rendered":"<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><a href=\"https:\/\/www.cradle.bio\/\">Cradle<\/a> is an open-source multimodal AI agent framework from the BAAI-Agents team for\u00a0<strong>General Computer Control (GCC)<\/strong>. It lets large multimodal models operate a variety of software and games the way a human does, taking screenshots as input and producing keyboard and mouse actions as output.<\/p>\n<\/blockquote>\n\n\n\n<ul class=\"wp-block-list\">\n<li>General-purpose goal: supports any local software (e.g. games, Office, image\/video editing tools)<\/li>\n\n\n\n<li>Multimodal input and output: screenshots as input, keyboard and mouse operations as output<\/li>\n\n\n\n<li>Autonomy: built-in \"Cognitive Reflection + Skill Update\" modules for continuous self-optimization<\/li>\n\n\n\n<li>Modular design: highly controllable and extensible, easy to adapt to new environments<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Pain Points<\/strong><\/h2>\n\n\n\n<p>Since the arrival of the GPT series of large models, LLMs have seen explosive growth. 
However, they rely on text input and output through an API, so they cannot operate local user interfaces, and automating local tasks remains difficult:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Limited ability to operate Office and visualization software<\/li>\n\n\n\n<li>Complex tasks, once split into steps, are hard to carry through to a closed loop<\/li>\n\n\n\n<li>No visual capability: UI elements are hard to locate from language alone<\/li>\n\n\n\n<li>No long-term memory of history, and weak execution of multi-step logic<\/li>\n<\/ul>\n\n\n\n<p>Cradle is designed to address these pain points:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Controls the mouse and keyboard to simulate human operation<\/li>\n\n\n\n<li>Strengthens its \"self-reflection\" and \"skill optimization\" strategies<\/li>\n\n\n\n<li>Supports long-horizon tasks, complex game environments, and specialized software operations<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Core Functionality<\/strong><\/h2>\n\n\n\n<p>Cradle's six core modules are:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Information Gathering<\/strong>\n<ul class=\"wp-block-list\">\n<li>Processes UI screenshots and text information using visual models<\/li>\n\n\n\n<li>Can also take in audio feedback to complete its perceptual input<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Self-Reflection<\/strong>\n<ul class=\"wp-block-list\">\n<li>Reviews the results of past actions to determine whether the goal was achieved<\/li>\n\n\n\n<li>Summarizes the reasons for failure and provides guidance for the next attempt<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Task Inference<\/strong>\n<ul class=\"wp-block-list\">\n<li>Infers the current goal from the environment and historical memory<\/li>\n\n\n\n<li>Dynamically plans the next best strategy<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Skill Curation<\/strong>\n<ul class=\"wp-block-list\">\n<li>Generates or updates skill functions for each task<\/li>\n\n\n\n<li>Customized 
strategies for each environment, built from experience<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Action Planning<\/strong>\n<ul class=\"wp-block-list\">\n<li>The LLM outputs high-level actions (e.g., \"click on X\", \"move mouse to Y\")<\/li>\n\n\n\n<li>A hand-written bridging layer translates them into keystrokes and mouse actions<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Memory<\/strong>\n<ul class=\"wp-block-list\">\n<li>Short-term and long-term memory, including historical records<\/li>\n\n\n\n<li>Supports reuse of memories and skills across tasks<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n\n\n\n<p>These modules form a closed loop: screenshot input \u2192 perception \u2192 reflection \u2192 planning \u2192 execution \u2192 memory feedback.<\/p>\n\n\n\n<p>Experiments show that Cradle can accomplish the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AAA games: <strong>Red Dead Redemption 2<\/strong>\u00a0main quests, completed with a high success rate;<\/li>\n\n\n\n<li>City-building games: <strong>Cities: Skylines<\/strong>,\u00a0building a city of a thousand residents;<\/li>\n\n\n\n<li>Farming games: <strong>Stardew Valley<\/strong>,\u00a0automatic sowing and harvesting;<\/li>\n\n\n\n<li>Business games: <strong>Dealer's Life 2<\/strong>,\u00a0reaching a maximum weekly profit of 87%;<\/li>\n\n\n\n<li>Office software: signing in to Chrome, replying in Outlook, using Feishu;<\/li>\n\n\n\n<li>Editing tools: image\/video processing in Meitu and CapCut.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Technical Architecture<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"834\" height=\"179\" src=\"https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/\u5fae\u4fe1\u622a\u56fe_20250713155208.jpg\" alt=\"\" class=\"wp-image-22\" srcset=\"https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/\u5fae\u4fe1\u622a\u56fe_20250713155208.jpg 834w, 
https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/\u5fae\u4fe1\u622a\u56fe_20250713155208-300x64.jpg 300w, https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/\u5fae\u4fe1\u622a\u56fe_20250713155208-768x165.jpg 768w, https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/\u5fae\u4fe1\u622a\u56fe_20250713155208-18x4.jpg 18w\" sizes=\"auto, (max-width: 834px) 100vw, 834px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>List of Technical Advantages<\/strong><\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Technical Advantage<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><strong>No API access required<\/strong><\/td><td>Does not rely on an application's internal interfaces, so it adapts to a wide range of software.<\/td><\/tr><tr><td><strong>Highly modular configuration<\/strong><\/td><td>Easily extended to new games or software environments<\/td><\/tr><tr><td><strong>Progressive capability improvement<\/strong><\/td><td>LLM + self-reflection + memory support self-improvement<\/td><\/tr><tr><td><strong>Universal operating interface<\/strong><\/td><td>Screenshot input + keyboard and mouse output, truly universal<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Interface Illustration<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"855\" height=\"394\" src=\"https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/7772.jpg\" alt=\"\" class=\"wp-image-21\" srcset=\"https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/7772.jpg 855w, https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/7772-300x138.jpg 300w, https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/7772-768x354.jpg 768w, https:\/\/cs.huarenca.com\/wp-content\/uploads\/2025\/07\/7772-18x8.jpg 18w\" sizes=\"auto, (max-width: 855px) 100vw, 855px\" \/><\/figure>\n\n\n\n<h2 
class=\"wp-block-heading\"><strong>Application Scenarios<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>R&amp;D: the AI agent can autonomously simulate user actions, as a replacement for UI\/API testing<\/li>\n\n\n\n<li>Office automation: large volumes of repetitive tasks (emails, forms, reports) can be fully automated<\/li>\n\n\n\n<li>Game AI development: acts as an in-game agent to test missions or train NPCs<\/li>\n\n\n\n<li>Process automation: provides a UI automation pipeline with less reliance on traditional RPA<\/li>\n\n\n\n<li>Education and training: Cradle demonstrates how to perform tasks and helps students understand complex software<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Who's stronger?<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Framework \/ Project<\/th><th>Supported Modality<\/th><th>Relies on an API?<\/th><th>Key Characteristics<\/th><th>Core Advantages<\/th><\/tr><\/thead><tbody><tr><td><strong>Cradle<\/strong><\/td><td>Screenshots + keyboard and mouse<\/td><td>\u274c No API<\/td><td>Complete closed loop, self-directed learning<\/td><td>Versatility, modularity, broad adaptability<\/td><\/tr><tr><td>LangChain Agent<\/td><td>Text API input\/output<\/td><td>\u2705 Requires API<\/td><td>Text commands \/ HTTP requests<\/td><td>Strong at information retrieval and text processing<\/td><\/tr><tr><td>AutoHotkey \/ RPA etc.<\/td><td>Keyboard and mouse macros<\/td><td>\u274c No API<\/td><td>Single-step macro operations, no memory or planning<\/td><td>Easy to use, but low intelligence and weak self-improvement<\/td><\/tr><tr><td>Playwright\/Selenium<\/td><td>DOM manipulation API<\/td><td>\u2705 DOM API<\/td><td>Web automation<\/td><td>Specialized for the web; limited on the desktop<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>Strengths: Cradle is a multimodal, cognitively-enabled \"universal software executor\" that goes 
beyond traditional or web automation tools.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Article Summary<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Cradle is the first AI agent for general-purpose software control.<\/strong> It supports a wide range of local software and AAA game operations.<\/li>\n\n\n\n<li>Its core is six modules that give it self-reflection, self-learning, and self-adaptation capabilities.<\/li>\n\n\n\n<li>Its technical architecture is modular and maintainable.<\/li>\n\n\n\n<li>Compared to traditional tools, Cradle offers visual perception, closed-loop intelligence, and self-improvement.<\/li>\n\n\n\n<li>It suits R&amp;D automation, office work, game development, and teaching scenarios.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Project Address<\/strong><\/h4>\n\n\n\n<p><a href=\"https:\/\/github.com\/baai-agents\/cradle\">https:\/\/github.com\/baai-agents\/cradle<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Cradle, open-sourced by the BAAI\u2011Agents team, is a\u00a0General Computer Control (GCC)\u00a0oriented multimodal AI Ag 
[&hellip;]<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"class_list":["post-20","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/posts\/20","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/comments?post=20"}],"version-history":[{"count":0,"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/posts\/20\/revisions"}],"wp:attachment":[{"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/media?parent=20"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/categories?post=20"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cs.huarenca.com\/en\/wp-json\/wp\/v2\/tags?post=20"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}