Paper-Daily-Notice
New submissions for Tue, 26 Apr 22
Keyword: SLAM
Indoor simultaneous localization and mapping based on fringe projection profilometry
- Authors: Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han
- Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11020
- Pdf link: https://arxiv.org/pdf/2204.11020
- Abstract Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement, and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. As the most promising 3D sensing technique, fringe projection profilometry (FPP) offers clear opportunities for indoor SLAM, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method built on the coordinate transformation relationship of FPP, in which 2D-to-3D descriptor-assisted matching is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. Experimental results demonstrate that the proposed indoor SLAM achieves localization and mapping accuracy of around one millimeter.
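- A minimal sketch of the 2D-to-3D localization step described above: given descriptor correspondences between image keypoints and FPP map points, the sensor pose can be recovered with a RANSAC PnP solver. The function and variable names are illustrative assumptions, not the authors' code.

```python
# Sketch: localize the sensor from matched 3D map points and 2D keypoints.
# The FPP reconstruction and descriptor matching themselves are not shown.
import cv2
import numpy as np

def localize_from_matches(pts_3d, pts_2d, K):
    """pts_3d: (N,3) map points; pts_2d: (N,2) image keypoints; K: intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32), pts_2d.astype(np.float32), K, None)
    if not ok:
        raise RuntimeError("PnP failed: too few consistent correspondences")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers     # sensor pose relative to the map frame
```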
MLO: Multi-Object Tracking and Lidar Odometry in Dynamic Environment
- Authors: Tingchen Ma, Yongsheng Ou
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11621
- Pdf link: https://arxiv.org/pdf/2204.11621
- Abstract A SLAM system built on the static-scene assumption introduces significant estimation errors when a large number of moving objects appear in the field of view. Tracking and maintaining semantic objects is beneficial for understanding the scene and provides rich decision information for planning and control modules. This paper introduces MLO, a multi-object lidar odometry that tracks ego-motion and movable objects with only a lidar sensor. First, it extracts foreground movable objects, the road surface, and static background features using a geometry and object fusion perception module. While robustly estimating ego-motion, it accomplishes multi-object tracking through a least-squares method that fuses 3D bounding boxes and geometric point clouds. A continuous 4D semantic object map along the timeline can then be created. Our approach is evaluated qualitatively and quantitatively under different scenarios on the public KITTI dataset. The experimental results show that the ego-localization accuracy of MLO is better than that of the A-LOAM system in highly dynamic, unstructured, and semantically unknown scenes. Meanwhile, the multi-object tracking method with semantic-geometry fusion also shows clear advantages in accuracy and tracking robustness over a single method.
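- A minimal sketch of the geometric core of such tracking: estimate an object's frame-to-frame rigid motion by least-squares (Kabsch/SVD) alignment of its segmented points. The paper's fusion with 3D bounding-box constraints is not reproduced; names are illustrative.

```python
# Sketch: least-squares rigid motion between two matched object point sets.
import numpy as np

def rigid_motion(P, Q):
    """P, Q: (N,3) matched object points at times t and t+1.
    Returns R, t such that Q ~ P @ R.T + t (Kabsch algorithm)."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_q - R @ mu_p
```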
Keyword: Visual inertial
There is no result
Keyword: livox
There is no result
Keyword: loam
MLO: Multi-Object Tracking and Lidar Odometry in Dynamic Environment
- Authors: Tingchen Ma, Yongsheng Ou
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11621
- Pdf link: https://arxiv.org/pdf/2204.11621
- Abstract A SLAM system built on the static-scene assumption introduces significant estimation errors when a large number of moving objects appear in the field of view. Tracking and maintaining semantic objects is beneficial for understanding the scene and provides rich decision information for planning and control modules. This paper introduces MLO, a multi-object lidar odometry that tracks ego-motion and movable objects with only a lidar sensor. First, it extracts foreground movable objects, the road surface, and static background features using a geometry and object fusion perception module. While robustly estimating ego-motion, it accomplishes multi-object tracking through a least-squares method that fuses 3D bounding boxes and geometric point clouds. A continuous 4D semantic object map along the timeline can then be created. Our approach is evaluated qualitatively and quantitatively under different scenarios on the public KITTI dataset. The experimental results show that the ego-localization accuracy of MLO is better than that of the A-LOAM system in highly dynamic, unstructured, and semantically unknown scenes. Meanwhile, the multi-object tracking method with semantic-geometry fusion also shows clear advantages in accuracy and tracking robustness over a single method.
Keyword: Visual inertial odometry
There is no result
Keyword: lidar
Indoor simultaneous localization and mapping based on fringe projection profilometry
- Authors: Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han
- Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11020
- Pdf link: https://arxiv.org/pdf/2204.11020
- Abstract Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement, and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. As the most promising 3D sensing technique, fringe projection profilometry (FPP) offers clear opportunities for indoor SLAM, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method built on the coordinate transformation relationship of FPP, in which 2D-to-3D descriptor-assisted matching is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. Experimental results demonstrate that the proposed indoor SLAM achieves localization and mapping accuracy of around one millimeter.
2D LiDAR and Camera Fusion Using Motion Cues for Indoor Layout Estimation
- Authors: Jieyu Li, Robert Stevenson
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11202
- Pdf link: https://arxiv.org/pdf/2204.11202
- Abstract This paper presents a novel indoor layout estimation system based on the fusion of 2D LiDAR and intensity camera data. A ground robot explores an indoor space with a single floor and vertical walls, and collects a sequence of intensity images and 2D LiDAR datasets. The LiDAR provides accurate depth information, while the camera captures high-resolution data for semantic interpretation. The alignment of sensor outputs and image segmentation are computed jointly by aligning LiDAR points, as samples of the room contour, to ground-wall boundaries in the images. The alignment problem is decoupled into a top-down view projection and a 2D similarity transformation estimation, which can be solved using the vertical vanishing point and the motion of the two sensors. The recursive random sample consensus algorithm is implemented to generate, evaluate, and optimize multiple hypotheses with the sequential measurements. The system allows the geometric interpretations from different sensors to be analyzed jointly without offline calibration. The ambiguity in images for ground-wall boundary extraction is removed with the assistance of LiDAR observations, which improves the accuracy of semantic segmentation. The localization and mapping are refined using the fused data, which enables the system to work reliably in scenes with low texture or sparse geometric features.
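- A minimal sketch of the 2D similarity-transformation step: the paper solves it via the vertical vanishing point and sensor motion, while the closed-form Umeyama alignment below is a generic stand-in given point correspondences between LiDAR contour samples and projected ground-wall boundary points; names are illustrative.

```python
# Sketch: closed-form 2D similarity transform (scale s, rotation R,
# translation t) aligning source to destination: dst ~ s * R @ src + t.
import numpy as np

def similarity_2d(src, dst):
    """src, dst: (N,2) corresponding 2D points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[1, 1] = -1                            # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t
```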
Multi-Layer Modeling of Dense Vegetation from Aerial LiDAR Scans
- Authors: Ekaterina Kalinicheva, Loic Landrieu, Clément Mallet, Nesrine Chehata
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11620
- Pdf link: https://arxiv.org/pdf/2204.11620
- Abstract The analysis of the multi-layer structure of wild forests is an important challenge of automated large-scale forestry. While modern aerial LiDARs offer geometric information across all vegetation layers, most datasets and methods focus only on the segmentation and reconstruction of the top of the canopy. We release WildForest3D, which consists of 29 study plots and over 2,000 individual trees across 47,000 m² with dense 3D annotation, along with occupancy and height maps for 3 vegetation layers: ground vegetation, understory, and overstory. We propose a 3D deep network architecture predicting, for the first time, both 3D point-wise labels and high-resolution layer occupancy rasters simultaneously. This allows us to produce a precise estimation of the thickness of each vegetation layer as well as the corresponding watertight meshes, thereby meeting most forestry purposes. Both the dataset and the model are released in open access: https://github.com/ekalinicheva/multi_layer_vegetation.
MLO: Multi-Object Tracking and Lidar Odometry in Dynamic Environment
- Authors: Tingchen Ma, Yongsheng Ou
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11621
- Pdf link: https://arxiv.org/pdf/2204.11621
- Abstract A SLAM system built on the static-scene assumption introduces significant estimation errors when a large number of moving objects appear in the field of view. Tracking and maintaining semantic objects is beneficial for understanding the scene and provides rich decision information for planning and control modules. This paper introduces MLO, a multi-object lidar odometry that tracks ego-motion and movable objects with only a lidar sensor. First, it extracts foreground movable objects, the road surface, and static background features using a geometry and object fusion perception module. While robustly estimating ego-motion, it accomplishes multi-object tracking through a least-squares method that fuses 3D bounding boxes and geometric point clouds. A continuous 4D semantic object map along the timeline can then be created. Our approach is evaluated qualitatively and quantitatively under different scenarios on the public KITTI dataset. The experimental results show that the ego-localization accuracy of MLO is better than that of the A-LOAM system in highly dynamic, unstructured, and semantically unknown scenes. Meanwhile, the multi-object tracking method with semantic-geometry fusion also shows clear advantages in accuracy and tracking robustness over a single method.
Keyword: loop detection
There is no result
Keyword: autonomous driving
Indoor simultaneous localization and mapping based on fringe projection profilometry
- Authors: Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han
- Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11020
- Pdf link: https://arxiv.org/pdf/2204.11020
- Abstract Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement, and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. As the most promising 3D sensing technique, fringe projection profilometry (FPP) offers clear opportunities for indoor SLAM, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method built on the coordinate transformation relationship of FPP, in which 2D-to-3D descriptor-assisted matching is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. Experimental results demonstrate that the proposed indoor SLAM achieves localization and mapping accuracy of around one millimeter.
RealNet: Combining Optimized Object Detection with Information Fusion Depth Estimation Co-Design Method on IoT
- Authors: Zhuohao Li, Fandi Gou, Qixin De, Leqi Ding, Yuanhang Zhang, Yunze Cai
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2204.11216
- Pdf link: https://arxiv.org/pdf/2204.11216
- Abstract Depth estimation and object detection play an important role in deep learning-based autonomous driving technology. We propose a hybrid structure called RealNet: a co-design method combining a model-streamlined recognition algorithm and a depth estimation algorithm with information fusion, deployed on the Jetson Nano for unmanned vehicles with monocular vision sensors. We use ROS for the experiments. The method proposed in this paper is suitable for mobile platforms with strict real-time requirements. The innovation of our method is to use information fusion to compensate for the insufficient frame rate of the output images and to improve the robustness of target detection and depth estimation under monocular vision. Object detection is based on YOLOv5; we have simplified the network structure of its DarkNet53 backbone and achieved a prediction time as low as 0.01 s. Depth estimation is based on VNL depth estimation, which considers multiple geometric constraints in 3D global space. It computes the loss from the deviation between the virtual normal vector (VN) and the label, which captures richer depth information. We use a PnP fusion algorithm to address the insufficient frame rate of the depth-map output: it estimates depth from 3D targets to 2D points via corner feature matching, which is faster than the VNL computation. We interpolate the VNL and PnP outputs to achieve information fusion. Experiments show that this effectively eliminates jitter in the depth information and improves robustness. At the control end, the method combines the results of target detection and depth estimation to compute the target position, and uses a pure pursuit control algorithm to track it.
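- A minimal sketch of the frame-rate fusion idea: upsample the slow, accurate depth stream (VNL-style) onto the timeline of the fast geometric stream (PnP-style) and blend them. Timestamps, rates, and the blending weight are illustrative assumptions.

```python
# Sketch: fuse a slow depth stream with a fast one by interpolation.
import numpy as np

def fuse_depth(t_query, t_slow, z_slow, t_fast, z_fast, w=0.7):
    """Blend two per-frame scalar depth estimates sampled at different rates."""
    z_s = np.interp(t_query, t_slow, z_slow)   # upsample the slow stream
    z_f = np.interp(t_query, t_fast, z_fast)   # fast stream, already dense
    return w * z_s + (1.0 - w) * z_f           # weighted blend damps jitter

t = np.arange(0.0, 1.0, 0.01)                  # 100 Hz query timeline
z = fuse_depth(t,
               np.arange(0.0, 1.0, 0.2), np.array([5.0, 5.1, 5.0, 4.9, 5.0]),
               np.arange(0.0, 1.0, 0.05), 5.0 + 0.05 * np.random.randn(20))
```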
Six Levels of Autonomous Process Execution Management (APEM)
- Authors: Wil van der Aalst
- Subjects: Computers and Society (cs.CY); General Literature (cs.GL)
- Arxiv link: https://arxiv.org/abs/2204.11328
- Pdf link: https://arxiv.org/pdf/2204.11328
- Abstract Terms such as the Digital Twin of an Organization (DTO) and Hyperautomation (HA) illustrate the desire to autonomously manage and orchestrate processes, just like we aim for autonomously driving cars. Autonomous driving and Autonomous Process Execution Management (APEM) have in common that the goals are pretty straightforward and that each year progress is made, but fully autonomous driving and fully autonomous process execution are more a dream than a reality. For cars, the Society of Automotive Engineers (SAE) identified six levels (0-5), ranging from no driving automation (SAE, Level 0) to full driving automation (SAE, Level 5). This short article defines six levels of Autonomous Process Execution Management (APEM). The goal is to show that the transition from one level to the next will be gradual, just like for self-driving cars.
Road Traffic Law Adaptive Decision-making for Self-Driving Vehicles
- Authors: Jiaxin Liu, Wenhui Zhou, Hong Wang, Zhong Cao, Wenhao Yu, Chengxiang Zhao, Ding Zhao, Diange Yang, Jun Li
- Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2204.11411
- Pdf link: https://arxiv.org/pdf/2204.11411
- Abstract Self-driving vehicles have their own intelligence to drive on open roads. However, vehicle managers, e.g., governments or industrial companies, still need a way to tell these self-driving vehicles which behaviors are encouraged or forbidden. Unlike human drivers, current self-driving vehicles cannot understand traffic laws, and thus rely on programmers manually writing the corresponding principles into the driving systems. This is inefficient and makes it hard to adapt to temporary traffic laws, especially when the vehicles use data-driven decision-making algorithms. Besides, current self-driving vehicle systems rarely take traffic law modification into consideration. This work aims to design a road traffic law adaptive decision-making method. The decision-making algorithm is based on reinforcement learning, in which the traffic rules are usually implicitly coded in deep neural networks. The main idea is to provide self-driving vehicles with adaptability to traffic laws through a law-adaptive backup policy. In this work, natural language-based traffic laws are first translated into logical expressions using Linear Temporal Logic (LTL). The system then monitors in advance whether the self-driving vehicle may break the traffic laws by designing a long-term RL action space. Finally, a sample-based planning method re-plans the trajectory when the vehicle may break the traffic rules. The method is validated in a Beijing Winter Olympic Lane scenario and an overtaking case, both built in the CARLA simulator. The results show that by adopting this method, self-driving vehicles can comply with newly issued or updated traffic laws effectively. This method helps self-driving vehicles be governed by digital traffic laws, which is necessary for the wide adoption of autonomous driving.
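- A minimal sketch of the monitoring step: check a planned trajectory against a temporal-logic-style rule ("never enter the restricted lane while the rule is active") and trigger a re-plan on violation. The rule and state format are illustrative assumptions, not the paper's LTL toolchain.

```python
# Sketch: monitor a predicted trajectory against a "globally not in lane" rule.
def violates_rule(trajectory, restricted_lane, rule_active):
    """trajectory: list of (time, lane_id) predicted states over the horizon."""
    if not rule_active:
        return False
    return any(lane == restricted_lane for _, lane in trajectory)

plan = [(0.0, 2), (0.5, 2), (1.0, 1)]          # predicted lane over 1 s
if violates_rule(plan, restricted_lane=1, rule_active=True):
    print("G !(in_restricted_lane) violated: trigger sample-based re-planning")
```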
Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training
- Authors: Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, Junjun Jiang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11590
- Pdf link: https://arxiv.org/pdf/2204.11590
- Abstract Monocular 3D object detection (Mono3D) has achieved unprecedented success with the advent of deep learning techniques and emerging large-scale autonomous driving datasets. However, drastic performance degradation remains an understudied challenge for practical cross-domain deployment due to the lack of labels in the target domain. In this paper, we first comprehensively investigate the significant underlying factor of the domain gap in Mono3D, where the critical observation is a depth-shift issue caused by the geometric misalignment of domains. Then, we propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D. To mitigate the depth shift, we introduce a geometry-aligned multi-scale training strategy to disentangle the camera parameters and guarantee the geometric consistency of domains. Based on this, we develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain. Benefiting from the end-to-end framework that provides richer information for the pseudo labels, we propose a quality-aware supervision strategy that takes instance-level pseudo confidences into account and improves the effectiveness of the target-domain training process. Moreover, a positive focusing training strategy and a dynamic threshold are proposed to handle the large numbers of false-negative and false-positive pseudo samples. STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset. To the best of our knowledge, this is the first study to explore effective UDA methods for Mono3D.
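- A minimal sketch of quality-aware pseudo-labeling: keep teacher detections whose confidence clears a dynamic threshold and reuse the confidences as instance-level loss weights. The threshold rule and box layout are illustrative assumptions.

```python
# Sketch: select pseudo labels with a dynamic, quality-aware threshold.
import numpy as np

def select_pseudo_labels(boxes, scores, base_thr=0.5, quantile=0.8):
    """boxes: (N,7) teacher 3D detections; scores: (N,) confidences."""
    thr = max(base_thr, float(np.quantile(scores, quantile)))
    keep = scores >= thr
    return boxes[keep], scores[keep]   # confidences reused as loss weights

rng = np.random.default_rng(0)
kept, weights = select_pseudo_labels(rng.random((10, 7)), rng.random(10))
```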
Keyword: mapping
Error-in-variables modelling for operator learning
- Authors: Ravi G. Patel, Indu Manickam, Myoungkyu Lee, Mamikon Gulian
- Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
- Arxiv link: https://arxiv.org/abs/2204.10909
- Pdf link: https://arxiv.org/pdf/2204.10909
- Abstract Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
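- A numerical sketch of the attenuation bias the abstract describes: with white noise on the independent variable, the OLS slope shrinks by the factor sigma_x^2 / (sigma_x^2 + s^2). The constants are illustrative.

```python
# Sketch: OLS slope attenuation under a noisy independent variable.
import numpy as np

rng = np.random.default_rng(0)
n, true_slope, noise_x = 10_000, 2.0, 0.5
x = rng.normal(size=n)                      # clean inputs, unit variance
y = true_slope * x + 0.1 * rng.normal(size=n)
x_obs = x + noise_x * rng.normal(size=n)    # noise in the independent variable
slope = (x_obs @ y) / (x_obs @ x_obs)       # OLS through the origin
print(slope)  # ~2.0 * 1 / (1 + 0.25) = 1.6, biased toward zero
```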
STC-IDS: Spatial-Temporal Correlation Feature Analyzing based Intrusion Detection System for Intelligent Connected Vehicles
- Authors: Mu Han, Pengzhou Cheng, Fengwei Zhang
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2204.10990
- Pdf link: https://arxiv.org/pdf/2204.10990
- Abstract Intrusion detection is an important defensive measure for the security of automotive communications. Accurate frame detection models help vehicles avoid malicious attacks. Uncertainty and diversity regarding attack methods make this task challenging. However, existing works have the limitation of only considering local features or weak feature mapping across multiple features. To address these limitations, we present a novel model for automotive intrusion detection based on spatial-temporal correlation features of in-vehicle communication traffic (STC-IDS). Specifically, the proposed model exploits an encoding-detection architecture. In the encoder part, spatial and temporal relations are encoded simultaneously. To strengthen the relationships between features, an attention-based convolutional network captures spatial and channel features to increase the receptive field, while an attention-LSTM builds important relationships from previous time series or crucial bytes. The encoded information is then passed to the detector to generate strong spatial-temporal attention features and enable anomaly classification. In particular, single-frame and multi-frame models are constructed, each with its own advantages. With automatic hyper-parameter selection based on Bayesian optimization, the model is trained to attain the best performance. Extensive empirical studies on a real-world vehicle attack dataset demonstrate that STC-IDS outperforms baseline methods and achieves lower false-positive rates while maintaining efficiency.
TerrainMesh: Metric-Semantic Terrain Reconstruction from Aerial Images Using Joint 2D-3D Learning
- Authors: Qiaojun Feng, Nikolay Atanasov
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.10993
- Pdf link: https://arxiv.org/pdf/2204.10993
- Abstract This paper considers outdoor terrain mapping using RGB images obtained from an aerial vehicle. While feature-based localization and mapping techniques deliver real-time vehicle odometry and sparse keypoint depth reconstruction, a dense model of the environment geometry and semantics (vegetation, buildings, etc.) is usually recovered offline with significant computation and storage. This paper develops a joint 2D-3D learning approach to reconstruct a local metric-semantic mesh at each camera keyframe maintained by a visual odometry algorithm. Given the estimated camera trajectory, the local meshes can be assembled into a global environment model to capture the terrain topology and semantics during online operation. A local mesh is reconstructed using an initialization and a refinement stage. In the initialization stage, we estimate the mesh vertex elevations by solving a least squares problem relating the vertex barycentric coordinates to the sparse keypoint depth measurements. In the refinement stage, we associate 2D image and semantic features with the 3D mesh vertices using camera projection and apply graph convolution to refine the mesh vertex spatial coordinates and semantic features based on joint 2D and 3D supervision. Quantitative and qualitative evaluation using real aerial images shows the potential of our method to support environmental monitoring and surveillance applications.
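- A minimal sketch of the initialization stage: each keypoint's elevation is a barycentric mix of its triangle's three vertex elevations, so the vertex elevations follow from one sparse least squares solve. Any regularization the paper adds is omitted; names are illustrative.

```python
# Sketch: solve min ||A z - d||^2 for mesh vertex elevations z.
import numpy as np

def vertex_elevations(bary, vert_idx, depths, n_vertices):
    """bary: (M,3) barycentric coords; vert_idx: (M,3) vertex ids per keypoint;
    depths: (M,) sparse keypoint elevation measurements."""
    A = np.zeros((len(depths), n_vertices))
    rows = np.arange(len(depths))[:, None]
    A[rows, vert_idx] = bary               # each row mixes exactly 3 vertices
    z, *_ = np.linalg.lstsq(A, depths, rcond=None)
    return z
```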
Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation
- Authors: Yupei Lin, Sen Zhang, Tianshui Chen, Yongyi Lu, Guangping Li, Yukai Shi
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2204.11018
- Pdf link: https://arxiv.org/pdf/2204.11018
- Abstract Unpaired image-to-image translation aims to find a mapping between the source domain and the target domain. To alleviate the lack of supervised labels for the source images, cycle-consistency based methods have been proposed to preserve image structure by assuming a reversible relationship between unpaired images. However, this assumption only uses limited correspondence between image pairs. Recently, contrastive learning (CL) has been used to further investigate image correspondence in unpaired image translation via patch-based positive/negative learning. Patch-based contrastive routines obtain the positives by self-similarity computation and treat the remaining patches as negatives. This flexible learning paradigm obtains auxiliary contextualized information at a low cost. Since the negatives are numerous, we investigate a natural question: are all negatives necessary for feature contrastive learning? Unlike previous CL approaches that use as many negatives as possible, in this paper we study the negatives from an information-theoretic perspective and introduce a new negative Pruning technology for Unpaired image-to-image Translation (PUT) that sparsifies and ranks the patches. The proposed algorithm is efficient and flexible and enables the model to learn essential information between corresponding patches stably. By putting quality over quantity, only a few negative patches are required to achieve better results. Lastly, we validate the superiority, stability, and versatility of our model through comparative experiments.
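- A minimal sketch of negative pruning: rank candidate negative patches by similarity to the query and keep only the top-k for the contrastive loss. The cosine ranking rule is an illustrative assumption, not the paper's information-theoretic criterion.

```python
# Sketch: keep only the k most informative (hardest) negative patches.
import numpy as np

def prune_negatives(query, negatives, k=16):
    """query: (D,) feature; negatives: (N,D) patch features; returns (k,D)."""
    q = query / np.linalg.norm(query)
    neg = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    sims = neg @ q                         # cosine similarity to the query
    top = np.argsort(-sims)[:k]            # most similar negatives are hardest
    return negatives[top]
```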
Indoor simultaneous localization and mapping based on fringe projection profilometry
- Authors: Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han
- Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11020
- Pdf link: https://arxiv.org/pdf/2204.11020
- Abstract Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement, and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. As the most promising 3D sensing technique, fringe projection profilometry (FPP) offers clear opportunities for indoor SLAM, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method built on the coordinate transformation relationship of FPP, in which 2D-to-3D descriptor-assisted matching is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. Experimental results demonstrate that the proposed indoor SLAM achieves localization and mapping accuracy of around one millimeter.
Long-Range ICN for the IoT: Exploring a LoRa System Design
- Authors: Peter Kietzmann, Jose Alamos, Dirk Kutscher, Thomas C. Schmidt, Matthias Wählisch
- Subjects: Networking and Internet Architecture (cs.NI)
- Arxiv link: https://arxiv.org/abs/2204.11040
- Pdf link: https://arxiv.org/pdf/2204.11040
- Abstract This paper presents LoRa-ICN, a comprehensive IoT networking system based on a common long-range communication layer (LoRa) combined with Information-Centric Networking (ICN) principles. We have replaced the LoRaWAN MAC layer with an IEEE 802.15.4 Deterministic and Synchronous Multi-Channel Extension (DSME). This multifaceted MAC layer allows for different mappings of ICN message semantics, which we explore to enable new LoRa scenarios. We designed LoRa-ICN from the ground up to improve reliability and to reduce dependency on centralized components in LoRa IoT scenarios. We have implemented a feature-complete prototype in a common network simulator to validate our approach. Our results show design trade-offs of different mapping alternatives in terms of robustness and efficiency.
2D LiDAR and Camera Fusion Using Motion Cues for Indoor Layout Estimation
- Authors: Jieyu Li, Robert Stevenson
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11202
- Pdf link: https://arxiv.org/pdf/2204.11202
- Abstract This paper presents a novel indoor layout estimation system based on the fusion of 2D LiDAR and intensity camera data. A ground robot explores an indoor space with a single floor and vertical walls, and collects a sequence of intensity images and 2D LiDAR datasets. The LiDAR provides accurate depth information, while the camera captures high-resolution data for semantic interpretation. The alignment of sensor outputs and image segmentation are computed jointly by aligning LiDAR points, as samples of the room contour, to ground-wall boundaries in the images. The alignment problem is decoupled into a top-down view projection and a 2D similarity transformation estimation, which can be solved using the vertical vanishing point and the motion of the two sensors. The recursive random sample consensus algorithm is implemented to generate, evaluate, and optimize multiple hypotheses with the sequential measurements. The system allows the geometric interpretations from different sensors to be analyzed jointly without offline calibration. The ambiguity in images for ground-wall boundary extraction is removed with the assistance of LiDAR observations, which improves the accuracy of semantic segmentation. The localization and mapping are refined using the fused data, which enables the system to work reliably in scenes with low texture or sparse geometric features.
Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach
- Authors: Ling Huang, Can-Rong Guan, Zhen-Wei Huang, Yuefang Gao, Yingjie Kuang, Chang-Dong Wang, C. L. Philip Chen
- Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2204.11602
- Pdf link: https://arxiv.org/pdf/2204.11602
- Abstract Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users. However, DNN-based models usually suffer from high computational complexity, i.e., very long training times and huge numbers of trainable parameters. To address these problems, we propose a new broad recommender system called Broad Collaborative Filtering (BroadCF), which is an efficient nonlinear collaborative filtering approach. Instead of DNNs, a Broad Learning System (BLS) is used as the mapping function to learn the complex nonlinear relationships between users and items, which avoids the above issues while achieving very satisfactory recommendation performance. However, it is not feasible to directly feed the original rating data into BLS. To this end, we propose a user-item rating collaborative vector preprocessing procedure to generate low-dimensional user-item input data, which is able to harness quality judgments of the most similar users/items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
Sustainability in Software Architecture: A Systematic Mapping Study
- Authors: Vasilios Andrikopoulos, Rares-Dorian Boza, Carlos Perales, Patricia Lago
- Subjects: Software Engineering (cs.SE)
- Arxiv link: https://arxiv.org/abs/2204.11657
- Pdf link: https://arxiv.org/pdf/2204.11657
- Abstract Sustainability is an increasingly-studied topic in software engineering in general, and in software architecture in particular. There are already a number of secondary studies addressing sustainability in software engineering, but no such study focusing explicitly on software architecture. This work aims to fill this gap by conducting a systematic mapping study on the intersection between sustainability and software architecture research with the intention of (i) reflecting on the current state of the art, and (ii) identifying the needs for further research. Our results show that, overall, existing works have focused disproportionately on specific aspects of sustainability, and in particular on the most technical and "inward facing" ones. This comes at the expense of the holistic perspective required to address a multi-faceted concern such as sustainability. Furthermore, more reflection-oriented research works, and better coverage of the activities in the architecting life cycle are required to further the maturity of the area. Based on our findings we then propose a research agenda for sustainability-aware software architecture.
Tac2Pose: Tactile Object Pose Estimation from the First Touch
- Authors: Maria Bauza, Antonia Bronars, Alberto Rodriguez
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11701
- Pdf link: https://arxiv.org/pdf/2204.11701
- Abstract In this paper, we present Tac2Pose, an object-specific approach to tactile pose estimation from the first touch for known objects. Given the object geometry, we learn a tailored perception model in simulation that estimates a probability distribution over possible object poses given a tactile observation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor, we match it against the pre-computed set using an object-specific embedding learned using contrastive learning. We obtain contact shapes from the sensor with an object-agnostic calibration step that maps RGB tactile observations to binary contact shapes. This mapping, which can be reused across object and sensor instances, is the only step trained with real sensor data. This results in a perception model that localizes objects from the first real tactile observation. Importantly, it produces pose distributions and can incorporate additional pose constraints coming from other perception systems, contacts, or priors. We provide quantitative results for 20 objects. Tac2Pose provides high accuracy pose estimations from distinctive tactile observations while regressing meaningful pose distributions to account for those contact shapes that could result from different object poses. We also test Tac2Pose on object models reconstructed from a 3D scanner, to evaluate the robustness to uncertainty in the object model. Finally, we demonstrate the advantages of Tac2Pose compared with three baseline methods for tactile pose estimation: directly regressing the object pose with a neural network, matching an observed contact to a set of possible contacts using a standard classification neural network, and direct pixel comparison of an observed contact with a set of possible contacts. Website: this http URL
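- A minimal sketch of the matching step: score the observed contact-shape embedding against precomputed embeddings of simulated contacts, one per candidate pose, and normalize the scores into a pose distribution. The embedding network is omitted; the softmax scoring is an illustrative assumption.

```python
# Sketch: pose distribution from embedding similarities.
import numpy as np

def pose_distribution(obs_emb, sim_embs, temperature=0.1):
    """obs_emb: (D,); sim_embs: (P,D), one row per simulated contact/pose."""
    obs = obs_emb / np.linalg.norm(obs_emb)
    sim = sim_embs / np.linalg.norm(sim_embs, axis=1, keepdims=True)
    logits = (sim @ obs) / temperature
    p = np.exp(logits - logits.max())      # numerically stable softmax
    return p / p.sum()                     # probability over candidate poses
```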
Analyze, Debug, Optimize: Real-Time Tracing for Perception and Mapping Systems in ROS 2
- Authors: Pierre-Yves Lajoie, Christophe Bédard, Giovanni Beltrame
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11778
- Pdf link: https://arxiv.org/pdf/2204.11778
- Abstract Perception and mapping systems are among the most computationally, memory, and bandwidth intensive software components in robotics. Therefore, analysis, debugging, and optimization are crucial to improve perception systems performance in real-time applications. However, standard approaches often depict a partial picture of the actual performance. Fortunately, instrumentation and tracing offer a great opportunity for detailed performance analysis of real-time systems. In this paper, we show how our novel open-source tracing tools and techniques for ROS 2 enable us to identify delays, bottlenecks and critical paths inside centralized, or distributed, perception and mapping systems.
Online Deep Learning from Doubly-Streaming Data
- Authors: Heng Lian, John Scovil Atwood, Bojian Hou, Jian Wu, Yi He
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2204.11793
- Pdf link: https://arxiv.org/pdf/2204.11793
- Abstract This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and adapt on the fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome these challenges is to establish a relationship between the pre- and post-evolution feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve its performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer a tradeoff between onlineness (biasing toward shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, in which a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.
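- A minimal sketch of the feature-space bridging idea: while old and new features are both observed, fit a mapping from one space to the other so knowledge survives once the old features vanish. A linear least squares map stands in for the paper's learned latent subspace; all names are illustrative.

```python
# Sketch: fit a linear map from the old feature space to the new one.
import numpy as np

def fit_feature_map(X_old, X_new):
    """X_old: (N, D_old), X_new: (N, D_new), observed on the same samples."""
    W, *_ = np.linalg.lstsq(X_old, X_new, rcond=None)
    return W                                # rows of old space -> new space

rng = np.random.default_rng(0)
W = fit_feature_map(rng.random((200, 8)), rng.random((200, 12)))
x_new_est = rng.random(8) @ W               # translate an old-space sample
```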
Keyword: localization
Deep Reinforcement Learning-based Radio Resource Allocation and Beam Management under Location Uncertainty in 5G mmWave Networks
- Authors: Yujie Yao, Hao Zhou, Melike Erol-Kantarci
- Subjects: Systems and Control (eess.SY)
- Arxiv link: https://arxiv.org/abs/2204.10984
- Pdf link: https://arxiv.org/pdf/2204.10984
- Abstract Millimeter Wave (mmWave) is an important part of 5G New Radio (NR), in which highly directional beams are adapted to compensate for the substantial propagation loss based on UE locations. However, the location information may contain errors, such as GPS errors; some localization uncertainty is unavoidable in most settings. Using these distorted locations for clustering will increase beam management errors. Meanwhile, the traffic demand may change dynamically in the wireless environment. Therefore, a scheme that can handle both localization uncertainty and dynamic radio resource allocation is needed. In this paper, we propose a UK-means-based clustering and deep reinforcement learning-based resource allocation algorithm (UK-DRL) for radio resource allocation and beam management in 5G mmWave networks. We first apply UK-means as the clustering algorithm to mitigate the localization uncertainty, then deep reinforcement learning (DRL) is adopted to dynamically allocate radio resources. Finally, we compare UK-DRL with a K-means-based clustering and DRL-based resource allocation algorithm (K-DRL); the simulations show that our proposed UK-DRL-based method achieves 150% higher throughput and 61.5% lower delay compared with K-DRL when the traffic load is 4 Mbps.
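- A minimal sketch of uncertainty-aware clustering in the spirit of UK-means: approximate each UE's expected distance to a center by Monte Carlo samples of its uncertain location, then run Lloyd-style updates. The sampling approximation and the isotropic noise model are illustrative assumptions, not the paper's algorithm.

```python
# Sketch: k-means with expected distances over uncertain UE locations.
import numpy as np

def uk_means(mu, sigma, k, iters=15, samples=64, seed=0):
    """mu: (N,2) location means; sigma: (N,) std of isotropic location error."""
    rng = np.random.default_rng(seed)
    pts = mu[:, None] + sigma[:, None, None] * rng.normal(size=(len(mu), samples, 2))
    centers = mu[rng.choice(len(mu), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(pts[:, :, None] - centers[None, None], axis=-1).mean(1)
        labels = d.argmin(axis=1)          # assign by expected distance
        for j in range(k):
            if (labels == j).any():
                centers[j] = mu[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
centers, labels = uk_means(rng.normal(size=(50, 2)), 0.1 * rng.random(50), k=3)
```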
TerrainMesh: Metric-Semantic Terrain Reconstruction from Aerial Images Using Joint 2D-3D Learning
- Authors: Qiaojun Feng, Nikolay Atanasov
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.10993
- Pdf link: https://arxiv.org/pdf/2204.10993
- Abstract This paper considers outdoor terrain mapping using RGB images obtained from an aerial vehicle. While feature-based localization and mapping techniques deliver real-time vehicle odometry and sparse keypoint depth reconstruction, a dense model of the environment geometry and semantics (vegetation, buildings, etc.) is usually recovered offline with significant computation and storage. This paper develops a joint 2D-3D learning approach to reconstruct a local metric-semantic mesh at each camera keyframe maintained by a visual odometry algorithm. Given the estimated camera trajectory, the local meshes can be assembled into a global environment model to capture the terrain topology and semantics during online operation. A local mesh is reconstructed using an initialization and a refinement stage. In the initialization stage, we estimate the mesh vertex elevations by solving a least squares problem relating the vertex barycentric coordinates to the sparse keypoint depth measurements. In the refinement stage, we associate 2D image and semantic features with the 3D mesh vertices using camera projection and apply graph convolution to refine the mesh vertex spatial coordinates and semantic features based on joint 2D and 3D supervision. Quantitative and qualitative evaluation using real aerial images shows the potential of our method to support environmental monitoring and surveillance applications.
Discriminative Feature Learning Framework with Gradient Preference for Anomaly Detection
- Authors: Muhao Xu, Xueying Zhou, Xizhan Gao, WeiKai He, Sijie Niu
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2204.11014
- Pdf link: https://arxiv.org/pdf/2204.11014
- Abstract Unsupervised representation learning has been extensively employed in anomaly detection, achieving impressive performance. Extracting valuable feature vectors that can remarkably improve the performance of anomaly detection is essential in unsupervised representation learning. To this end, we propose a novel discriminative feature learning framework with gradient preference for anomaly detection. Specifically, we first design a gradient preference based selector to store powerful feature points in space and then construct a feature repository, which alleviates the interference of redundant feature vectors and improves inference efficiency. Second, to overcome the looseness of the feature vectors, we present discriminative feature learning with a center constraint to map the feature repository to a compact subspace, so that the anomalous samples are more distinguishable from the normal ones. Moreover, our method can be easily extended to anomaly localization. Extensive experiments on popular industrial and medical anomaly detection datasets demonstrate that our proposed framework can achieve competitive results in both anomaly detection and localization. More importantly, our method outperforms the state of the art in few-shot anomaly detection.
Indoor simultaneous localization and mapping based on fringe projection profilometry
- Authors: Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han
- Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11020
- Pdf link: https://arxiv.org/pdf/2204.11020
- Abstract Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics. Outdoor SLAM has been widely used with the assistance of LiDAR or GPS. For indoor applications, the LiDAR technique does not satisfy the accuracy requirement, and GPS signals are lost. An accurate and efficient scene sensing technique is therefore required for indoor SLAM. As the most promising 3D sensing technique, fringe projection profilometry (FPP) offers clear opportunities for indoor SLAM, but methods to date have not fully leveraged the accuracy and speed of sensing that such systems offer. In this paper, we propose a novel FPP-based indoor SLAM method built on the coordinate transformation relationship of FPP, in which 2D-to-3D descriptor-assisted matching is used for mapping and localization. The correspondences generated by matching descriptors are used for fast and accurate mapping, and the transform estimation between the 2D and 3D descriptors is used to localize the sensor. Experimental results demonstrate that the proposed indoor SLAM achieves localization and mapping accuracy of around one millimeter.
2D LiDAR and Camera Fusion Using Motion Cues for Indoor Layout Estimation
- Authors: Jieyu Li, Robert Stevenson
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11202
- Pdf link: https://arxiv.org/pdf/2204.11202
- Abstract This paper presents a novel indoor layout estimation system based on the fusion of 2D LiDAR and intensity camera data. A ground robot explores an indoor space with a single floor and vertical walls, and collects a sequence of intensity images and 2D LiDAR datasets. The LiDAR provides accurate depth information, while the camera captures high-resolution data for semantic interpretation. The alignment of sensor outputs and image segmentation are computed jointly by aligning LiDAR points, as samples of the room contour, to ground-wall boundaries in the images. The alignment problem is decoupled into a top-down view projection and a 2D similarity transformation estimation, which can be solved using the vertical vanishing point and the motion of the two sensors. The recursive random sample consensus algorithm is implemented to generate, evaluate, and optimize multiple hypotheses with the sequential measurements. The system allows the geometric interpretations from different sensors to be analyzed jointly without offline calibration. The ambiguity in images for ground-wall boundary extraction is removed with the assistance of LiDAR observations, which improves the accuracy of semantic segmentation. The localization and mapping are refined using the fused data, which enables the system to work reliably in scenes with low texture or sparse geometric features.
Lesion Localization in OCT by Semi-Supervised Object Detection
- Authors: Yue Wu, Yang Zhou, Jianchun Zhao, Jingyuan Yang, Weihong Yu, Youxin Chen, Xirong Li
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2204.11227
- Pdf link: https://arxiv.org/pdf/2204.11227
- Abstract Over 300 million people worldwide are affected by various retinal diseases. Through noninvasive Optical Coherence Tomography (OCT) scans, a number of abnormal structural changes in the retina, namely retinal lesions, can be identified. Automated lesion localization in OCT is thus important for detecting retinal diseases at an early stage. To overcome the lack of manual annotation for deep supervised learning, this paper presents a first study on utilizing semi-supervised object detection (SSOD) for lesion localization in OCT images. To that end, we develop a taxonomy to provide a unified and structured viewpoint of the current SSOD methods, and consequently identify key modules in these methods. To evaluate the influence of these modules in the new task, we build OCT-SS, a new dataset consisting of over 1k expert-labeled OCT B-scan images and over 13k unlabeled B-scans. Extensive experiments on OCT-SS identify Unbiased Teacher (UnT) as the best current SSOD method for lesion localization. Moreover, we improve over this strong baseline, with mAP increased from 49.34 to 50.86.
MLO: Multi-Object Tracking and Lidar Odometry in Dynamic Environment
- Authors: Tingchen Ma, Yongsheng Ou
- Subjects: Robotics (cs.RO)
- Arxiv link: https://arxiv.org/abs/2204.11621
- Pdf link: https://arxiv.org/pdf/2204.11621
- Abstract A SLAM system built on the static-scene assumption introduces significant estimation errors when a large number of moving objects appear in the field of view. Tracking and maintaining semantic objects is beneficial for understanding the scene and provides rich decision information for planning and control modules. This paper introduces MLO, a multi-object lidar odometry that tracks ego-motion and movable objects with only a lidar sensor. First, it extracts foreground movable objects, the road surface, and static background features using a geometry and object fusion perception module. While robustly estimating ego-motion, it accomplishes multi-object tracking through a least-squares method that fuses 3D bounding boxes and geometric point clouds. A continuous 4D semantic object map along the timeline can then be created. Our approach is evaluated qualitatively and quantitatively under different scenarios on the public KITTI dataset. The experimental results show that the ego-localization accuracy of MLO is better than that of the A-LOAM system in highly dynamic, unstructured, and semantically unknown scenes. Meanwhile, the multi-object tracking method with semantic-geometry fusion also shows clear advantages in accuracy and tracking robustness over a single method.
Cryptography Is Not Enough: Relay Attacks on Authenticated GNSS Signals
- Authors: Maryam Motallebighomi, Harshad Sathaye, Mridula Singh, Aanjhan Ranganathan
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2204.11641
- Pdf link: https://arxiv.org/pdf/2204.11641
- Abstract Civilian-GNSS is vulnerable to signal spoofing attacks, and countermeasures based on cryptographic authentication are being proposed to protect against these attacks. Both Galileo and GPS are currently testing broadcast authentication techniques based on the delayed key disclosure to validate the integrity of navigation messages. These authentication mechanisms have proven secure against record now and replay later attacks, as navigation messages become invalid after keys are released. This work analyzes the security guarantees of cryptographically protected GNSS signals and shows the possibility of spoofing a receiver to an arbitrary location without breaking any cryptographic operation. In contrast to prior work, we demonstrate the ability of an attacker to receive signals close to the victim receiver and generate spoofing signals for a different target location without modifying the navigation message contents. Our strategy exploits the essential common reception and transmission time method used to estimate pseudorange in GNSS receivers, thereby rendering any cryptographic authentication useless. We evaluate our attack on a commercial receiver (ublox M9N) and a software-defined GNSS receiver (GNSS-SDR) using a combination of open-source tools, commercial GNSS signal generators, and software-defined radio hardware platforms. Our results show that it is possible to spoof a victim receiver to locations around 4000 km away from the true location without requiring any high-speed communication networks or modifying the message contents. Through this work, we further highlight the fundamental limitations in securing a broadcast signaling-based localization system even if all communications are cryptographically protected.
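- A numeric sketch of why relaying defeats authentication: receivers estimate the pseudorange as c * (t_rx - t_tx), so delaying a relayed but cryptographically intact signal by dt shifts that pseudorange by c * dt without touching any signed content. The timing values are illustrative.

```python
# Sketch: pseudorange shift induced purely by signal delay.
C = 299_792_458.0                      # speed of light, m/s
t_tx, t_rx = 0.0, 0.072                # transmit/receive times, s (illustrative)
rho_true = C * (t_rx - t_tx)           # ~21,585 km, a plausible GNSS range
dt = 1e-3                              # 1 ms of attacker-introduced delay
rho_spoofed = C * (t_rx + dt - t_tx)
print(rho_spoofed - rho_true)          # ~299,792 m shift per millisecond
```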
Estimation of Reliable Proposal Quality for Temporal Action Detection
- Authors: Junshan Hu, Chaoxu Guo, Liansheng Zhuang, Biao Wang, Tiezheng Ge, Yuning Jiang, Houqiang Li
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2204.11695
- Pdf link: https://arxiv.org/pdf/2204.11695
- Abstract Temporal action detection (TAD) aims to locate and recognize actions in an untrimmed video. Anchor-free methods have made remarkable progress, mainly formulating TAD as two tasks, classification and localization, using two separate branches. This paper reveals a temporal misalignment between the two tasks that hinders further progress. To address this, we propose a new method that considers the moment and region perspectives simultaneously to align the two tasks by acquiring reliable proposal quality. For the moment perspective, a Boundary Evaluate Module (BEM) is designed that focuses on local appearance and motion evolvement to estimate boundary quality, and adopts a multi-scale manner to deal with varied action durations. For the region perspective, we introduce a Region Evaluate Module (REM) that uses a new and efficient sampling method for proposal feature representation, containing more contextual information compared with point features, to refine the category score and proposal boundary. The proposed Boundary Evaluate Module and Region Evaluate Module (BREM) are generic and can be easily integrated with other anchor-free TAD methods to achieve superior performance. In our experiments, BREM is combined with two different frameworks and improves performance on THUMOS14 by 3.6% and 1.0% respectively, reaching a new state of the art (63.6% average mAP). Meanwhile, a competitive result of 36.2% average mAP is achieved on ActivityNet-1.3 with the consistent improvement of BREM.