At the core of every robot that "understands" its surroundings lies a stream of invisible geometry. At Stereovis, we turn stereo image data into real-time decisions through a seamless pipeline of point cloud generation, pose estimation, and embedded AI. But what does this process actually look like? And what makes it different from traditional machine vision?
This blog dives into the inner workings of how Stereovis devices process the world — all onboard — from 3D image capture to actionable robotic output.
1. What is a Point Cloud, Really?
A point cloud is a collection of 3D points that represent the external surfaces of objects in space. Each point typically contains X, Y, Z coordinates, and sometimes additional data like color, intensity, or surface normals.
Stereovis devices generate point clouds using structured-light stereo vision: two image sensors plus an active pattern projector. By analyzing the pixel disparity between the left and right cameras, the device reconstructs the depth of every visible point.
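The geometric core of that reconstruction is the classic pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the measured disparity. The sketch below is a minimal, generic illustration of that formula for a rectified image pair; the focal length, baseline, and function name are illustrative placeholders, not part of the Stereovis SDK.

```python
import numpy as np

# Illustrative values only: focal length (pixels) and stereo baseline (metres)
FOCAL_PX = 1400.0
BASELINE_M = 0.05

def disparity_to_depth(disparity_px):
    """Convert a rectified disparity map (pixels) to metric depth via Z = f * B / d."""
    depth = np.full_like(disparity_px, np.inf, dtype=np.float64)
    valid = disparity_px > 0                      # zero disparity = no match / infinite depth
    depth[valid] = FOCAL_PX * BASELINE_M / disparity_px[valid]
    return depth
```

Because small disparities map to large depths, precision naturally degrades toward the far end of the working range, which is why the sweet spot sits in the short-to-mid range listed below.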
Key Features:
- Real-time generation onboard
- Depth range from short to mid (up to 3 m)
- Submillimeter precision with embedded calibration
- Point formats: 4D (XYZ + intensity), 6N (XYZ + surface normals), and 6C (XYZ + color)
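In practice each of those formats maps naturally onto a NumPy array with one row per point. The layout below is only an assumption made for illustration; the SDK's actual return types may differ.

```python
import numpy as np

# Assumed column layouts for the three formats (illustrative, not the SDK's actual types):
#   4D: x, y, z, intensity
#   6N: x, y, z, nx, ny, nz   (unit surface normal)
#   6C: x, y, z, r, g, b
points_6n = np.zeros((2048, 6), dtype=np.float32)

xyz = points_6n[:, :3]        # geometry is always the first three columns
normals = points_6n[:, 3:]    # per-point attributes follow
```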
2. From Point Cloud to Pose: Estimating Object Position
3D data alone is not enough — the robot needs to know where and how to pick or interact with the object. That’s where pose estimation comes in.
Using our SDK, you can:
- Define a region of interest (ROI)
- Extract clusters or segment known shapes
- Compute a full 6DOF pose (X, Y, Z plus roll, pitch, yaw); a conversion sketch follows this list
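Internally a 6DOF pose is typically carried around as a 3x3 rotation matrix plus a translation vector; converting the rotation into the roll, pitch, yaw values named above takes only a few lines. The sketch below is a generic ZYX Euler-angle extraction, not Stereovis-specific code, and it assumes the rotation is away from gimbal lock; check which angle convention your robot controller expects.

```python
import numpy as np

def matrix_to_rpy(R):
    """Extract (roll, pitch, yaw) in radians from a 3x3 rotation matrix,
    assuming the ZYX convention (yaw about Z, then pitch about Y, then roll about X)."""
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return roll, pitch, yaw
```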
The ROI-to-pose process is fast, and when paired with neural network classification it enables:
- Part recognition
- Pose tracking in cluttered bins
- On-the-fly corrections
And it all runs on Jetson Nano or Orin Nano — no external PC needed.
3. Why Embedded AI Makes All the Difference
Most 3D cameras output only raw data; Stereovis embeds intelligence directly on the device. Our Jetson-powered modules let developers deploy models using any of the following (a minimal export sketch follows the list):
- PyTorch / TensorFlow
- ONNX or TensorRT
- Custom image or point cloud pipelines
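As a concrete example of the first two options, a PyTorch classifier can be exported to ONNX and then optimized with TensorRT (for instance with trtexec) on the Jetson. The snippet below is a generic export sketch; the model architecture, class count, and file name are placeholders rather than a Stereovis-specific recipe.

```python
import torch
import torchvision

# Placeholder network: swap in your own part-classification model
model = torchvision.models.mobilenet_v3_small(num_classes=5)
model.eval()

# Export to ONNX; the resulting file can then be converted to a TensorRT engine on the Jetson
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "part_classifier.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=13,
)
```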
Running these models on-device enables:
- Visual anomaly detection
- Surface classification
- Adaptive robot control (e.g. selecting parts by shape, color, or condition)
You can even do pre-filtering and region masking directly on-device before sending commands to the robot.
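Pre-filtering can be as simple as cropping the cloud to the bin's working volume before any pose estimation runs. The sketch below assumes the point cloud is an (N, 3) NumPy array in the camera frame and uses illustrative box limits; it is not the SDK's built-in filter.

```python
import numpy as np

def crop_to_workspace(points, x_lim=(-0.3, 0.3), y_lim=(-0.2, 0.2), z_lim=(0.2, 1.0)):
    """Keep only points inside an axis-aligned box (metres, camera frame)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    mask = (
        (x >= x_lim[0]) & (x <= x_lim[1]) &
        (y >= y_lim[0]) & (y <= y_lim[1]) &
        (z >= z_lim[0]) & (z <= z_lim[1])
    )
    return points[mask]
```

Cutting the cloud down early keeps the downstream clustering and pose steps cheap on the embedded hardware.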
4. Real-Time Application: Bin Picking
Let’s take a real use case — a robot arm picking irregular parts from a bin.
With a Stereovis M03 mounted above the bin:
- The camera streams point cloud data
- The onboard Jetson calculates part locations and poses
- An ROI is applied based on shape
- The SDK sends the pose directly to the robot via TCP/IP
No external compute. No network delay. Just direct, intelligent motion.
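For teams curious what "directly to the robot via TCP/IP" can look like at the lowest level, the sketch below sends a pose as a plain comma-separated line over a socket. The host, port, and message format are purely illustrative placeholders; real controllers each define their own protocol, and the SDK's send_robot_pose call shown in the next section handles this step in practice.

```python
import socket

def send_pose_tcp(pose, host="192.168.1.50", port=30002):
    """Send an (x, y, z, roll, pitch, yaw) tuple as one comma-separated ASCII line.
    The address and wire format here are illustrative, not a real controller protocol."""
    message = ",".join(f"{v:.4f}" for v in pose) + "\n"
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(message.encode("ascii"))
```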
5. The Developer View
Stereovis gives full access to this pipeline:
```python
from stereovis_sdk import Camera

# Connect to the camera on the local network and bring it up
cam = Camera(ip='192.168.1.251')
cam.initialize()

# Capture one frame and fetch the point cloud with surface normals (6N format)
cam.trigger()
points = cam.get_point_cloud('6N')

# Estimate a 6DOF pose inside a pixel ROI and send it straight to the robot
pose = cam.estimate_pose_from_roi(x0=100, y0=100, w=120, h=120)
cam.send_robot_pose(*pose)
```
You can build your own logic or plug in existing tools like OpenCV, ROS, or MoveIt.
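As one example of that kind of integration, the sketch below publishes the camera's points into ROS 1 as a sensor_msgs/PointCloud2, which MoveIt and other ROS tools can consume. It assumes rospy, an iterable of (x, y, z) tuples, and placeholder topic and frame names; a ROS 2 version would use rclpy and the equivalent message helpers.

```python
import rospy
from std_msgs.msg import Header
from sensor_msgs.msg import PointCloud2
from sensor_msgs import point_cloud2

rospy.init_node("stereovis_bridge")
pub = rospy.Publisher("/stereovis/points", PointCloud2, queue_size=1)

def publish_cloud(xyz_points):
    """Publish an iterable of (x, y, z) tuples as a PointCloud2 in the camera frame."""
    header = Header(stamp=rospy.Time.now(), frame_id="stereovis_camera")
    pub.publish(point_cloud2.create_cloud_xyz32(header, xyz_points))
```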
Conclusion: Smarter Eyes, Simpler Integration
By combining real-time point cloud generation, fast 6DOF pose estimation, and deployable AI — all on the device — Stereovis enables a new era of robotic vision.
You don’t need to be a computer vision expert to implement advanced 3D automation. You just need a camera that understands the world the way your robot does.