ChatGPT + Secure Robot Controller + Integrated Chassis Solution: A Design Concept

"Have you ever thought that you just need to tell your home assistant robot: 'Please heat my lunch,' and it will find the microwave by itself? Isn't that amazing?"
Recently, Microsoft published a paper titled "ChatGPT for Robotics: Design Principles and Model Abilities" on its official website, announcing its research findings on applying ChatGPT to robotics.
The paper states that the goal of the research is to see whether ChatGPT can think beyond text and reason about the physical world to help complete robotic tasks. People still rely heavily on handwritten code to control robots. The team has been exploring how to change this and use OpenAI's new AI language model, ChatGPT, to make human-robot interaction natural.

Researchers hope ChatGPT can help people interact with robots more easily, without needing to learn complex programming languages or the details of robotic systems. A key challenge is teaching ChatGPT to take into account the laws of physics and the context of the operating environment, to understand how a robot's physical actions change the state of the world, and to use all of this to solve the specified task.
Regarding Microsoft's research, Dr. Peter John Bentley, Honorary Professor and computer scientist at University College London (UCL), told a reporter from "National Business Daily" in an interview that controlling robots with AI tools such as ChatGPT is an entirely feasible path for the future.
However, he also emphasized that for now, ChatGPT still has many vulnerabilities, lacking basic capabilities in functionality, reliability, and security.

Photo: A staff member demonstrates the new versions of Microsoft's Bing search engine and Edge browser, recently released and powered by the latest technology from OpenAI, the maker of ChatGPT. Photo by Visual China Group.
How ChatGPT Controls Robots
ChatGPT is a language model trained on a vast corpus of text and human interactions, enabling it to generate coherent and grammatically correct responses to various prompts and questions.
The researchers explain in the paper that robot operation today starts with engineers or technical users, who must translate task requirements into system code. The engineer sits in the robot's operation loop and has to write new code and specifications to correct the robot's behavior. Overall, this process is slow, expensive, and inefficient: it demands highly skilled users with deep robotics knowledge, and it takes multiple rounds of interaction to get the robot working correctly.
ChatGPT, by contrast, opens up a new robotics paradigm in which a potentially non-technical user sits in the loop, providing high-level feedback to the Large Language Model (LLM) while monitoring the robot's performance.
When a set of design principles is followed, ChatGPT can generate code for robotic scenarios. Without any fine-tuning, the LLM's knowledge can be used to control different robot form factors across a variety of tasks. Through trial and error, Microsoft's researchers developed a methodology and design principles specifically for writing prompts for robotic tasks:

First, define a set of high-level robot APIs or a function library. This library can be designed for a specific robot type and should map to existing low-level implementations in the robot's control stack or perception library. It is very important to give the high-level APIs descriptive names so that ChatGPT can reason about their behavior.
Next, write a text prompt for ChatGPT that describes the task objective and explicitly states which functions from the high-level library are available. The prompt can also include task constraints, or instructions on how ChatGPT should organize its answer, such as using a specific programming language or auxiliary parsing components.
Third, users evaluate ChatGPT's code output through direct inspection or by using a simulator. If needed, users provide feedback to ChatGPT in natural language regarding the quality and safety of the answer.
Finally, when the user is satisfied with the solution, the final code can be deployed to the robot.
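The first three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (get_position, move_to, open_door), the prompt wording, and the helper unknown_calls are hypothetical and do not come from Microsoft's paper, and no real call to ChatGPT is made.

```python
import ast
import inspect

# Step 1: a small high-level API with descriptive names.
# These functions are hypothetical stubs; a real library would wrap
# the robot's own control stack and perception code.
def get_position(object_name: str) -> tuple:
    """Return the (x, y, z) position of a named object, in metres."""
    return (0.0, 0.0, 0.0)  # stub value

def move_to(x: float, y: float, z: float) -> None:
    """Move the robot to the given (x, y, z) position, in metres."""

def open_door(appliance_name: str) -> None:
    """Open the door of a named appliance, for example 'microwave'."""

ROBOT_API = [get_position, move_to, open_door]

# Step 2: a prompt that states the task, the available functions,
# any constraints, and the expected answer format.
def build_prompt(task, api, constraints=()):
    lines = ["You control a robot. Use only these Python functions:"]
    for fn in api:
        lines.append(f"- {fn.__name__}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    lines.extend(f"Constraint: {c}" for c in constraints)
    lines.append("Answer with Python code only.")
    lines.append(f"Task: {task}")
    return "\n".join(lines)

# Step 3 (an aid to inspection, not a replacement for it): list plain
# function calls in the generated code that are not part of the API.
# Built-ins such as print would be flagged too.
def unknown_calls(code, api):
    allowed = {fn.__name__ for fn in api}
    called = {
        node.func.id
        for node in ast.walk(ast.parse(code))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return sorted(called - allowed)

prompt = build_prompt(
    "Heat my lunch in the microwave.",
    ROBOT_API,
    constraints=["Never move while a door is open."],
)
print(prompt)
print(unknown_calls("move_to(1, 2, 3)\nlaunch_rocket()", ROBOT_API))
```

The check in step 3 only catches calls to functions outside the library; judging whether the code is safe and does the right thing still falls to the user or a simulator, as the third principle describes.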
What ChatGPT + Robotics Will Bring
In the article, Microsoft's research team demonstrated multiple examples of ChatGPT solving robotic challenges in various applications, as well as complex robot deployments in drone manipulation and navigation.
Researchers gave ChatGPT access to the full functionality of a real drone, and the experiment showed that non-technical users can communicate with a robot through intuitive natural language.
When user instructions were ambiguous, ChatGPT would ask clarifying questions and write complex code structures for the drone, such as flying zig-zag patterns to visually inspect shelves; it could even take a selfie for the user.
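To give a feel for what a zig-zag inspection pattern involves, here is a minimal sketch that generates waypoints sweeping across a shelf face, row by row. It is not the code ChatGPT produced in the paper; the function name zigzag_waypoints and its parameters are assumptions made for illustration.

```python
def zigzag_waypoints(shelf_width, bottom, top, rows):
    """Return (x, z) waypoints that sweep a shelf face in a zig-zag.

    x runs along the shelf, z is height; all values are in metres.
    Even rows go left to right, odd rows right to left.
    """
    if rows < 2:
        raise ValueError("need at least two rows to form a zig-zag")
    step = (top - bottom) / (rows - 1)
    waypoints = []
    for i in range(rows):
        z = bottom + i * step
        xs = (0.0, shelf_width) if i % 2 == 0 else (shelf_width, 0.0)
        waypoints.extend((x, z) for x in xs)
    return waypoints

path = zigzag_waypoints(shelf_width=2.0, bottom=0.5, top=1.5, rows=3)
print(path)
```

A real drone would also need a fixed stand-off distance from the shelf and a camera trigger at each waypoint, both of which are left out here.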
Researchers asked ChatGPT to write an algorithm for the drone to reach a target in the air without hitting obstacles. Researchers told the model that the drone had a forward-facing distance sensor, and ChatGPT immediately coded most of the key building blocks for the algorithm.
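One such building block might look like the sketch below: a single decision step that uses only the forward distance reading and the bearing to the target. This is an assumed, simplified rule, not the algorithm from the paper. The action names, the thresholds, and the sign convention (positive bearing means the target is to the right) are all hypothetical.

```python
def choose_action(distance_ahead_m, bearing_to_goal_deg,
                  safe_distance_m=2.0, aligned_tolerance_deg=5.0):
    """Pick the next drone action from two sensor-derived values.

    distance_ahead_m: reading of the forward-facing distance sensor.
    bearing_to_goal_deg: angle from the drone's heading to the target;
        positive means the target is to the right (assumed convention).
    """
    # An obstacle directly ahead takes priority: strafe sideways
    # without changing heading until the sensor reads clear.
    if distance_ahead_m < safe_distance_m:
        return "sidestep_left"
    # Path ahead is clear but the drone is not facing the target.
    if abs(bearing_to_goal_deg) > aligned_tolerance_deg:
        return "yaw_right" if bearing_to_goal_deg > 0 else "yaw_left"
    # Clear and aligned: fly toward the target.
    return "move_forward"

print(choose_action(1.0, 0.0))    # obstacle close ahead
print(choose_action(5.0, 20.0))   # clear, target off to the right
print(choose_action(5.0, 1.0))    # clear and aligned
```

A rule this simple can get stuck in front of wide or concave obstacles, which is the kind of flaw a user could point out in natural language and ask the model to fix.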
Researchers said this task required some back-and-forth conversation with a human, but they were impressed by ChatGPT's ability to make localized code improvements using only natural language feedback.
Microsoft researchers also used ChatGPT in a simulated industrial inspection scenario built on the Microsoft AirSim simulator. The model effectively parsed the user's high-level intent and geometric cues to control the drone accurately.
When ChatGPT was used in robotic arm manipulation scenarios, researchers used conversational feedback to teach the model how to combine the initially provided APIs into more complex high-level functions, that is, functions written by ChatGPT itself. Using a curriculum-based strategy, the model was able to chain these learned skills together logically to perform operations such as stacking blocks.
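The idea of building higher-level skills from a few primitives can be sketched as follows. The SimulatedArm class and the skill names pick_up, place_at, and stack are hypothetical stand-ins, not the paper's actual code; positions are integer millimetres to keep the arithmetic exact.

```python
class SimulatedArm:
    """Toy stand-in for a real arm: tracks block positions in millimetres."""

    def __init__(self, block_positions):
        self.blocks = dict(block_positions)  # block name -> (x, y, z)
        self.position = (0, 0, 0)
        self.held = None

    # Primitive APIs, the kind that would be given to the model up front.
    def get_position(self, block):
        return self.blocks[block]

    def move_to(self, position):
        self.position = tuple(position)

    def grab(self, block):
        if self.held is not None or self.blocks[block] != self.position:
            raise RuntimeError(f"cannot grab {block!r} from {self.position}")
        self.held = block

    def release(self):
        if self.held is None:
            raise RuntimeError("nothing to release")
        self.blocks[self.held] = self.position
        self.held = None


# Higher-level skills composed from the primitives above.
def pick_up(arm, block):
    arm.move_to(arm.get_position(block))
    arm.grab(block)

def place_at(arm, position):
    arm.move_to(position)
    arm.release()

def stack(arm, block_names, base, block_height_mm=40):
    """Stack blocks at `base`, first name at the bottom."""
    x, y, z = base
    for level, name in enumerate(block_names):
        pick_up(arm, name)
        place_at(arm, (x, y, z + level * block_height_mm))


arm = SimulatedArm({"red": (100, 0, 0), "green": (200, 0, 0), "blue": (300, 0, 0)})
stack(arm, ["red", "green", "blue"], base=(0, 150, 0))
print(arm.blocks)
```

Each layer only calls the layer beneath it, which mirrors the curriculum described above: simple skills are taught first, then reused inside more complex ones. A real arm would also need approach heights and collision checks, omitted here.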
The model also demonstrated an interesting case of bridging the text domain and the physical domain: building the Microsoft logo out of wooden blocks. ChatGPT was able not only to recall the Microsoft logo from its internal knowledge base but also to "draw" the logo as SVG code, and then to use the skills learned above to work out which existing robot actions could compose its physical form.
However, Dr. Bentley believes that at this stage, although ChatGPT can generate computer code, robot control poses a particular problem: code may need to be tailored to specific hardware to work correctly. ChatGPT currently draws on code examples it has already learned, so the code it generates may not be compatible with the latest hardware.
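The step from an SVG drawing to physical block positions can be illustrated with a short sketch. The SVG here is a hand-written approximation of a four-square logo using plain color names, not the SVG ChatGPT produced, and the function svg_to_block_targets is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written approximation of a four-square logo (an assumption for
# illustration; sizes and color names are not taken from the paper).
LOGO_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <rect x="0" y="0" width="45" height="45" fill="red"/>
  <rect x="55" y="0" width="45" height="45" fill="green"/>
  <rect x="0" y="55" width="45" height="45" fill="blue"/>
  <rect x="55" y="55" width="45" height="45" fill="yellow"/>
</svg>"""

SVG_NS = "{http://www.w3.org/2000/svg}"

def svg_to_block_targets(svg_text):
    """Return (color, center_x, center_y) for each <rect>, in SVG units.

    Each rectangle becomes one block; its center is where the block
    should be placed once SVG units are scaled to table coordinates.
    """
    root = ET.fromstring(svg_text)
    targets = []
    for rect in root.iter(SVG_NS + "rect"):
        cx = float(rect.get("x")) + float(rect.get("width")) / 2
        cy = float(rect.get("y")) + float(rect.get("height")) / 2
        targets.append((rect.get("fill"), cx, cy))
    return targets

targets = svg_to_block_targets(LOGO_SVG)
for color, cx, cy in targets:
    print(f"place the {color} block at ({cx}, {cy})")
```

Converting SVG units to real table coordinates, and matching colors to the blocks actually available, would be up to the robot-side code.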
This hardware gap is therefore also an opportunity for robot controllers. Discussion is welcome.
Xinmai provides ChatGPT + Robot Controller + Integrated Chassis Solution.