Construction Plan for an Integrated Video Intelligent AI Analysis System Based on RK3588/Sophon BM1684 AI Boxes (Part 1)
- Construction Background
In response to the national development philosophy of "innovation, coordination, green development, openness, and sharing," and to implement the policy requirements of "Made in China 2025" and the "Guiding Opinions of the State Council on Actively Promoting the 'Internet+' Action," the factory proposes to comprehensively optimize, upgrade, and transform its process flows, environmental protection measures, and intelligent manufacturing capabilities. The proposal takes into account the factory's current situation, production status, existing problems, and future development plans. The goal is to build the enterprise into a first-class domestic production base that is green and intelligent, improving both quality and efficiency.
"Made in China 2025" is China's version of the "Industry 4.0" plan. The plan, approved by Premier Li Keqiang, was officially released by the State Council on May 8, 2015. Miao Wei, Minister of Industry and Information Technology, pointed out in a collective interview with central media: "'Made in China 2025' is the action program for the first decade of the 'three-step strategy'. China aims to enter the second tier of global manufacturing through these ten years of efforts, laying a solid foundation for the subsequent two steps." "Made in China 2025" clearly states that manufacturing is the backbone of the national economy, the foundation of a nation, the instrument for its prosperity, and the bedrock of its strength.
Many manufacturing enterprises have adopted informatization measures, and "smart factory," "smart production," and "intelligent manufacturing" initiatives are under way. These measures aim to remove redundant workflow steps, reduce equipment failures, standardize technical workers' operating procedures, and strengthen emergency command capabilities for production incidents. The expected results are lower staffing requirements, higher employee efficiency, and better product quality. Using video surveillance systems to raise the level of trial production management is an important component of smart production and one of the key means of achieving visualized production control in smart factories.
- Design Principles
The design of this system should adhere to the fundamental principles of "advancement, reliability, openness, practicality, security, and economy."
- Advancement: Equipment selection and technology application should be forward-looking while remaining practical. With big data and artificial intelligence technologies as the architectural foundation, the system should provide an upgradable, scalable, and compatible application platform that is intelligent, information-driven, and visualized.
- Reliability: Considering the system's 24/7 real-time requirements, the system must have the capability for continuous, uninterrupted operation and effectively ensure high availability and reliability through technologies such as automatic detection, automatic alarming, and automatic monitoring. The system software should adopt a modular, layered isolation design, support clustering and load balancing technologies, and feature dual-machine hot standby functionality to fully ensure system stability.
- Openness: The system's construction should comply with national and industry-related standards, requiring the adoption of mainstream hardware, software, operating systems, databases, and standard protocols. The system's development should be based on current conditions, with a long-term perspective, to meet the continuous development of new technologies and the needs of present and future work. Considering the rapid evolution of information technology, informatization construction should have appropriate foresight.
- Practicality: System construction must emphasize application, select highly practical equipment, and adopt a modular design approach. While meeting current demands, it should fully consider future upgrade and expansion needs. The user interface should be user-friendly and support effective human-computer interaction, and the system as a whole should be simple to install, operate, use, and maintain.
- Security: The system should be protected against damage and intrusion. The software should resist virus infections and hacker attacks, and the overall system should provide a high level of security and confidentiality. The system should also guard against operator error, offer strong anti-interference and anti-static capabilities, provide data backup and recovery measures, and enforce user-level permissions to limit the impact of human factors.
- Economy: While ensuring advancement, reliability, openness, practicality, and security, attention should be paid to the cost and phased investment of system construction, making full use of existing basic resources, and reasonably controlling the overall project investment.
- Construction Goals
Integrated security and expanded applications enhance rapid response capabilities. Optimizing traditional security management methods reduces intermediate links and middle management personnel, establishing a lean, agile, and innovative "flat" organizational structure. Information feedback becomes faster, improving the ability to respond quickly to security risks and on-site issues.
- Introduction to the Integrated Video Intelligent AI Analysis System
Built on computer vision technology, the system applies AI across various industry scenarios. Relying on AI visual analysis and "edge + cloud" computing power, it analyzes events such as smoke, fire, and intrusion in real time. Combined with a cloud-based early warning business platform, it closes the loop across event detection, early warning, and handling.
- Design Architecture
- System Architecture
The video intelligent recognition system is divided into four layers from bottom to top: "Perception Layer, Network Layer, Support Layer, and Application Layer". The system's logical architecture is shown in the figure below:

Perception Layer
The perception layer connects to front-end sensing devices, such as video surveillance cameras, NVRs, and other IoT sensing equipment, to monitor and analyze important channels and locations in real time, providing the data foundation for scene analysis. The collected data types include image streams and video streams.
Network Layer
The network layer connects to the factory's main local area network (LAN), dedicated video network, etc.
Support Layer
The support layer provides the main capabilities used by the application layer, including video surveillance, the intelligent algorithm repository, other IoT sensing data, and alarm models. These capabilities also support future, more intelligent business applications and management needs.
Application Layer
The application layer primarily encompasses integrated video intelligent early warning business applications, including but not limited to video surveillance, device management, event center, and various safety and fire hazard applications.
- Deployment Architecture
Data is collected using both new and existing cameras. Video streams are accessed via edge and cloud platforms for AI analysis of events, and the system supports transmitting results to third-party platforms for display. The system's networking architecture is shown in the diagram below:

System Components:
- Reused Surveillance Cameras
Reused surveillance cameras are existing video surveillance and video storage infrastructure put to full use as data collection terminals for video AI analysis.
- New Smart Cameras
New smart cameras are surveillance cameras with built-in AI analysis capabilities, deployed in newly added areas or to cover surveillance blind spots, that actively identify and detect production safety incidents.
- AI Intelligent Analysis Server
The AI intelligent analysis server has two core capabilities: it detects, analyzes, and recognizes various behaviors in real-time video streams from reused surveillance cameras, and it receives and stores alarm events identified by new smart cameras. This all-in-one AI analysis machine integrates video preview, AI algorithm models, and the integrated video intelligent AI analysis business system.
The integrated video intelligent AI analysis business system receives video data over common protocols such as RTSP and analyzes it with intelligent analysis algorithms. It then provides a comprehensive display of analysis results and other business functions, and supports integrating alarm data with third-party business systems through interfaces.
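The hand-off of alarm data to a third-party business system can be sketched as a small record type serialized to JSON. This is only an illustration under stated assumptions: the `AlarmEvent` fields, the `source` values `"server"` and `"smart_camera"`, and the helper names are hypothetical and are not taken from the system's actual interface.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical labels for the two alarm origins described above:
# analysis performed on the server vs. alarms reported by a smart camera.
VALID_SOURCES = ("server", "smart_camera")


@dataclass
class AlarmEvent:
    """Illustrative alarm record; field names are assumptions, not the real schema."""
    camera_id: str
    algorithm: str      # e.g. "smoke_fire" (hypothetical identifier)
    confidence: float   # detection score in [0, 1]
    occurred_at: str    # ISO 8601 timestamp
    source: str         # one of VALID_SOURCES


def make_event(camera_id, algorithm, confidence, source, when=None):
    """Validate the inputs and build an AlarmEvent."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    when = when or datetime.now(timezone.utc)
    return AlarmEvent(camera_id, algorithm, confidence, when.isoformat(), source)


def to_payload(event):
    """Serialize an event to a JSON string that an interface could transmit."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    event = make_event("cam-01", "smoke_fire", 0.91, "server")
    print(to_payload(event))
```

Keeping the origin of the alarm in the record lets a downstream system treat server-side detections and smart-camera alarms uniformly while still being able to tell them apart.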
- Video Resource Access
The system supports integrated access of video surveillance resources into the video intelligent recognition system through device or platform docking. Access methods include, but are not limited to, direct connection of network cameras that support the RTSP protocol and access via video management platforms based on the GB/T 28181 standard.
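As a rough sketch of how the two access methods might be recorded when a video source is registered, the example below validates an RTSP URL with the Python standard library. The `VideoSource` type, the method labels `"rtsp"` and `"gb28181"`, and the example address are assumptions made for illustration; the platform's real registration interface is not described in this document.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical labels for the two access methods named above.
ACCESS_METHODS = {"rtsp", "gb28181"}


@dataclass(frozen=True)
class VideoSource:
    """Illustrative registry entry for one video resource."""
    name: str
    method: str    # "rtsp" (direct camera) or "gb28181" (via a video platform)
    address: str   # RTSP URL, or the device identifier assigned by the platform


def register_source(name, method, address):
    """Check the access method and address, then return a VideoSource."""
    if method not in ACCESS_METHODS:
        raise ValueError(f"unsupported access method: {method!r}")
    if not address:
        raise ValueError("address must not be empty")
    if method == "rtsp":
        parts = urlsplit(address)
        # A directly connected camera must be addressed by an rtsp:// URL with a host.
        if parts.scheme != "rtsp" or not parts.hostname:
            raise ValueError(f"not a valid RTSP URL: {address!r}")
    return VideoSource(name, method, address)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address; the stream path is made up.
    print(register_source("gate-1", "rtsp", "rtsp://192.0.2.10:554/stream1"))
```

Validating at registration time means a malformed address is rejected before any analysis task is bound to the source.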
- Main Algorithm Descriptions
| No. | Algorithm Name | Description |
|---|---|---|
| 1 | Smoke and Fire Detection | Automatically identifies smoke and flames within the operational area; if detected, an alarm is immediately triggered. |
| 2 | Smoking Detection | Automatically identifies personnel entering the operational area; if smoking behavior is detected, an alarm is immediately triggered. |
| 3 | Phone Call Detection | Automatically identifies personnel entering the operational area; if phone call behavior is detected, an alarm is immediately triggered. |
| 4 | Mobile Phone Usage Detection | Automatically identifies personnel entering the operational area; if mobile phone usage is detected, an alarm is immediately triggered. |
| 5 | Hard Hat and Reflective Vest Detection | Automatically identifies personnel in the operational area; if hard hats and reflective vests are not worn, an alarm is immediately triggered. |
| 6 | Workwear (Protective Clothing) Detection | Automatically identifies personnel entering the operational area; if workwear (protective clothing) is not worn, an alarm is immediately triggered. |
| 7 | Restricted Area Intrusion Detection (Area Intrusion) | Automatically identifies personnel entering the operational area; if personnel are detected, an alarm is immediately triggered. |
| 8 | Perimeter Crossing (Fence Hopping) Detection | Automatically identifies personnel entering the fenced area; if personnel cross a wall within the area or climb over a gate when it is closed, an alarm is immediately triggered. |
| 9 | Personnel Absenteeism Detection | Automatically identifies personnel in the operational area; if no personnel are detected and the absence exceeds the allowed time, an alarm is immediately triggered. |
| 10 | Personnel Sleeping on Duty Detection | Automatically identifies personnel in the detection area; if personnel are detected sleeping at a desk, an alarm is immediately triggered. |
| 11 | Personnel Fall Detection | Automatically identifies personnel in the detection area; if a fall is detected, an alarm is immediately triggered. |
| 12 | High-Altitude Object Dropping Detection | An alarm is immediately triggered when high-altitude object dropping behavior occurs in the detection area. |
| 13 | Fire Lane Occupancy Detection | Automatically identifies vehicles in the detection area; if a vehicle is detected occupying the lane, an alarm is immediately triggered. |
| 14 | Electric Bicycle in Elevator Detection | Automatically identifies electric bicycles in the detection area; if an electric bicycle is detected inside an elevator, an alarm is immediately triggered. |
| 15 | Illegal Parking Detection | Automatically identifies vehicles in the detection area; if illegal parking is detected, an alarm is immediately triggered. |
| 16 | Crowd Gathering Detection | Automatically identifies personnel in the detection area; if a crowd gathering is detected, an alarm is immediately triggered. |
| 17 | Pedestrian Structuring Detection | Face and facial structured data recognition |
| 18 | Motor Vehicle Structuring Detection | Motor vehicle and motor vehicle structured data recognition |
| 19