
New submissions for Tue, 29 Mar 22

zhuhu00 opened this issue 2 years ago

Keyword: SLAM

FD-SLAM: 3-D Reconstruction Using Features and Dense Matching

  • Authors: Xingrui Yang, Yuhang Ming, Zhaopeng Cui, Andrew Calway
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.13861
  • Pdf link: https://arxiv.org/pdf/2203.13861
  • Abstract It is well known that visual SLAM systems based on dense matching are locally accurate but are also susceptible to long-term drift and map corruption. In contrast, feature matching methods can achieve greater long-term consistency but can suffer from inaccurate local pose estimation when feature information is sparse. Based on these observations, we propose an RGB-D SLAM system that leverages the advantages of both approaches: using dense frame-to-model odometry to build accurate sub-maps and on-the-fly feature-based matching across sub-maps for global map optimisation. In addition, we incorporate a learning-based loop closure component based on 3-D features which further stabilises map building. We have evaluated the approach on indoor sequences from public datasets, and the results show that it performs on par or better than state-of-the-art systems in terms of map reconstruction quality and pose estimation. The approach can also scale to large scenes where other systems often fail.

Spectral Measurement Sparsification for Pose-Graph SLAM

  • Authors: Kevin J. Doherty, David M. Rosen, John J. Leonard
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.13897
  • Pdf link: https://arxiv.org/pdf/2203.13897
  • Abstract Simultaneous localization and mapping (SLAM) is a critical capability in autonomous navigation, but in order to scale SLAM to the setting of "lifelong" SLAM, particularly under memory or computation constraints, a robot must be able to determine what information should be retained and what can safely be forgotten. In graph-based SLAM, the number of edges (measurements) in a pose graph determines both the memory requirements of storing a robot's observations and the computational expense of algorithms deployed for performing state estimation using those observations; both of which can grow unbounded during long-term navigation. To address this, we propose a spectral approach for pose graph sparsification which maximizes the algebraic connectivity of the sparsified measurement graphs, a key quantity which has been shown to control the estimation error of pose graph SLAM solutions. Our algorithm, MAC (for "maximizing algebraic connectivity"), which is based on convex relaxation, is simple and computationally inexpensive, and admits formal post hoc performance guarantees on the quality of the solutions it provides. In experiments on benchmark pose-graph SLAM datasets, we show that our approach quickly produces high-quality sparsification results which retain the connectivity of the graph and, in turn, the quality of corresponding SLAM solutions, as compared to a baseline approach which does not consider graph connectivity.
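The key quantity MAC maximizes, algebraic connectivity, is the second-smallest eigenvalue of the graph Laplacian of the measurement graph. A minimal sketch of computing it (dense NumPy; the function name and edge-list interface are illustrative assumptions, not the authors' implementation):

```python
# Hedged sketch: algebraic connectivity (Fiedler value) of a measurement graph.
import numpy as np

def algebraic_connectivity(n_nodes, edges):
    """Second-smallest eigenvalue of the unweighted graph Laplacian.

    edges: iterable of (i, j) measurement pairs. Interface is
    illustrative, not the MAC authors' code.
    """
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    eigvals = np.sort(np.linalg.eigvalsh(L))
    return eigvals[1]

# A 4-node cycle has algebraic connectivity 2 - 2*cos(2*pi/4) = 2.0;
# dropping an edge (path graph) lowers it, illustrating why naive
# sparsification can hurt estimation quality.
lam2_cycle = algebraic_connectivity(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
lam2_path = algebraic_connectivity(4, [(0, 1), (1, 2), (2, 3)])
print(round(lam2_cycle, 6), round(lam2_path, 6))
```

MAC selects which edges to keep so that this eigenvalue stays as large as possible under a budget, via a convex relaxation rather than the brute-force enumeration this sketch would suggest.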

Are High-Resolution Event Cameras Really Needed?

  • Authors: Daniel Gehrig, Davide Scaramuzza
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.14672
  • Pdf link: https://arxiv.org/pdf/2203.14672
  • Abstract Due to their outstanding properties in challenging conditions, event cameras have become indispensable in a wide range of applications, from automotive and computational photography to SLAM. However, as further improvements are made to the sensor design, modern event cameras are trending toward higher and higher sensor resolutions, which result in higher bandwidth and computational requirements for downstream tasks. Despite this trend, the benefits of using high-resolution event cameras to solve standard computer vision tasks are still not clear. In this work, we report the surprising discovery that, in low-illumination conditions and at high speeds, low-resolution cameras can outperform high-resolution ones while requiring significantly lower bandwidth. We provide both empirical and theoretical evidence for this claim, which indicates that high-resolution event cameras exhibit higher per-pixel event rates, leading to higher temporal noise in low-illumination conditions and at high speeds. As a result, in most cases high-resolution event cameras show lower task performance than lower-resolution sensors in these conditions. We empirically validate our findings across several tasks, namely image reconstruction, optical flow estimation, and camera pose tracking, on both synthetic and real data. We believe that these findings will provide important guidelines for future trends in event camera development.

Keyword: Visual inertial

There is no result

Keyword: livox

There is no result

Keyword: loam

There is no result

Keyword: Visual inertial odometry

There is no result

Keyword: lidar

How Do We Fail? Stress Testing Perception in Autonomous Vehicles

  • Authors: Harrison Delecki, Masha Itkina, Bernard Lange, Ransalu Senanayake, Mykel J. Kochenderfer
  • Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2203.14155
  • Pdf link: https://arxiv.org/pdf/2203.14155
  • Abstract Autonomous vehicles (AVs) rely on environment perception and behavior prediction to reason about agents in their surroundings. These perception systems must be robust to adverse weather such as rain, fog, and snow. However, validation of these systems is challenging due to their complexity and dependence on observation histories. This paper presents a method for characterizing failures of LiDAR-based perception systems for AVs in adverse weather conditions. We develop a methodology based on reinforcement learning to find likely failures in object tracking and trajectory prediction due to sequences of disturbances. We apply disturbances using a physics-based data augmentation technique for simulating LiDAR point clouds in adverse weather conditions. Experiments performed across a wide range of driving scenarios from a real-world driving dataset show that our proposed approach finds high-likelihood failures with smaller input disturbances compared to baselines while remaining computationally tractable. Identified failures can inform future development of robust perception systems for AVs.

LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point Clouds

  • Authors: Jialian Li, Jingyi Zhang, Zhiyong Wang, Siqi Shen, Chenglu Wen, Yuexin Ma, Lan Xu, Jingyi Yu, Cheng Wang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.14698
  • Pdf link: https://arxiv.org/pdf/2203.14698
  • Abstract Existing motion capture datasets are largely short-range and cannot yet meet the needs of long-range applications. We propose LiDARHuman26M, a new human motion capture dataset captured by LiDAR at a much longer range to overcome this limitation. Our dataset also includes ground-truth human motions acquired by an IMU system and synchronous RGB images. We further present a strong baseline method, LiDARCap, for LiDAR point cloud human motion capture. Specifically, we first utilize PointNet++ to encode features of points and then employ an inverse kinematics solver and SMPL optimizer to regress the pose by aggregating the temporally encoded features hierarchically. Quantitative and qualitative experiments show that our method outperforms techniques based only on RGB images. Ablation experiments demonstrate that our dataset is challenging and worthy of further research. Finally, experiments on the KITTI Dataset and the Waymo Open Dataset show that our method can be generalized to different LiDAR sensor settings.

LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

  • Authors: Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.14956
  • Pdf link: https://arxiv.org/pdf/2203.14956
  • Abstract In this paper, we propose LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection. In many real-world applications, the LiDAR points used by mass-produced robots and vehicles usually have fewer beams than those in large-scale public datasets. Moreover, as LiDARs are upgraded to other product models with different beam amounts, it becomes challenging to utilize the labeled data captured by previous versions' high-resolution sensors. Despite recent progress on domain-adaptive 3D detection, most methods struggle to eliminate the beam-induced domain gap. We find that it is essential to align the point cloud density of the source domain with that of the target domain during the training process. Inspired by this discovery, we propose a progressive framework to mitigate the beam-induced domain shift. In each iteration, we first generate low-beam pseudo-LiDAR by downsampling the high-beam point clouds. Then a teacher-student framework is employed to distill rich information from the data with more beams. Extensive experiments on the Waymo, nuScenes, and KITTI datasets with three different LiDAR-based detectors demonstrate the effectiveness of our LiDAR Distillation. Notably, our approach incurs no additional computation cost at inference.

Keyword: loop detection

There is no result

Keyword: autonomous driving

TridentNetV2: Lightweight Graphical Global Plan Representations for Dynamic Trajectory Generation

  • Authors: David Paz, Hao Xiang, Andrew Liang, Henrik I. Christensen
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.14019
  • Pdf link: https://arxiv.org/pdf/2203.14019
  • Abstract We present a framework for dynamic trajectory generation for autonomous navigation which does not rely on HD maps as the underlying representation. High Definition (HD) maps have become a key component in most autonomous driving frameworks; they include complete road network information annotated at centimeter level, such as traversable waypoints, lane information, and traffic signals. Instead, the presented approach models the distributions of feasible ego-centric trajectories in real time given a nominal graph-based global plan and a lightweight scene representation. By embedding contextual information, such as crosswalks, stop signs, and traffic signals, our approach achieves low errors across multiple urban navigation datasets that include diverse intersection maneuvers, while maintaining real-time performance and reducing network complexity. The datasets introduced are available online.

Keyword: mapping

Spectral Measurement Sparsification for Pose-Graph SLAM

  • Authors: Kevin J. Doherty, David M. Rosen, John J. Leonard
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.13897
  • Pdf link: https://arxiv.org/pdf/2203.13897
  • Abstract See the identical entry under Keyword: SLAM above.

Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning

  • Authors: Jie Zhang, Jun Li, Yijin Zhang, Qingqing Wu, Xiongwei Wu, Feng Shu, Shi Jin, Wen Chen
  • Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2203.14152
  • Pdf link: https://arxiv.org/pdf/2203.14152
  • Abstract Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting. Aiming to maximize the long-term average achievable system rate, an optimization problem is formulated by jointly designing the transmit beamforming at the base station (BS) and discrete phase shift beamforming at the IRSs, with constraints on transmit power, user data rate requirements, and IRS energy buffer size. Considering time-varying channels and stochastic arrivals of energy harvested by the IRSs, we first formulate the problem as a Markov decision process (MDP) and then develop a novel multi-agent Q-mix (MAQ) framework with two layers to decouple the optimization parameters. The higher layer optimizes phase shift resolutions, and the lower one handles phase shift beamforming and power allocation. Since the phase shift optimization is an integer programming problem with a large-scale action space, we improve MAQ by incorporating the Wolpertinger method, yielding the MAQ-WP algorithm, to achieve near-optimal solutions with a reduced action-space dimension. In addition, as MAQ-WP still has high complexity, we propose a policy-gradient-based MAQ algorithm, namely MAQ-PG, by mapping the discrete phase shift actions into a continuous space at the cost of a slight performance loss. Simulation results demonstrate that the proposed MAQ-WP and MAQ-PG algorithms converge faster and achieve data rate improvements of 10.7% and 8.8%, respectively, over conventional multi-agent DDPG.

UNMAS: Multi-Agent Reinforcement Learning for Unshaped Cooperative Scenarios

  • Authors: Jiajun Chai, Weifan Li, Yuanheng Zhu, Dongbin Zhao, Zhe Ma, Kewu Sun, Jishiyu Ding
  • Subjects: Multiagent Systems (cs.MA)
  • Arxiv link: https://arxiv.org/abs/2203.14477
  • Pdf link: https://arxiv.org/pdf/2203.14477
  • Abstract Multi-agent reinforcement learning methods such as VDN, QMIX, and QTRAN that adopt the centralized training with decentralized execution (CTDE) framework have shown promising results in cooperation and competition. However, in some multi-agent scenarios, the number of agents and the size of the action set actually vary over time. We call these unshaped scenarios, and the methods mentioned above perform unsatisfactorily in them. In this paper, we propose a new method called Unshaped Networks for Multi-Agent Systems (UNMAS) that adapts to changes in the number of agents and the size of the action set. We propose the self-weighting mixing network to factorize the joint action-value. Its adaptation to changes in agent number is attributed to the nonlinear mapping from each agent's Q-value to the joint action-value with individual weights. Besides, to address changes in the action set, each agent constructs an individual action-value network composed of two streams that evaluate the constant environment-oriented subset and the varying unit-oriented subset. We evaluate UNMAS on various StarCraft II micro-management scenarios and compare the results with several state-of-the-art MARL algorithms. The superiority of UNMAS is demonstrated by its highest winning rates, especially on the most difficult scenario, 3s5z_vs_3s6z. The agents learn to perform effective cooperative behaviors where other MARL algorithms fail. Animated demonstrations and source code are provided at https://sites.google.com/view/unmas.

Dynamic state and parameter estimation in multi-machine power systems - Experimental demonstration using real-world PMU-measurements

  • Authors: Nicolai Lorenz-Meyer, René Suchantke, Johannes Schiffer
  • Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2203.14623
  • Pdf link: https://arxiv.org/pdf/2203.14623
  • Abstract Dynamic state and parameter estimation (DSE) plays a key role in reliably monitoring and operating future, power-electronics-dominated power systems. While DSE is a very active research field, experimental applications of proposed algorithms to real-world systems remain scarce. This motivates the present paper, in which we demonstrate the effectiveness of a DSE algorithm previously presented by some of the authors, using real-world data collected by a Phasor Measurement Unit (PMU) at a substation close to a power plant within the extra-high-voltage grid of Germany. To this end, we first derive a suitable mapping of the real-world PMU measurements recorded at the substation to the terminal bus of the power plant's synchronous generator (SG). This mapping considers the high-voltage (HV) transmission line, the tap-changing transformer, and the auxiliary system of the power plant. Next, we introduce several practically motivated extensions to the estimation algorithm, which significantly improve its practical performance with real-world measurements. Finally, we successfully validate the algorithm experimentally in both auto- and cross-validation.

Isomorphic Cross-lingual Embeddings for Low-Resource Languages

  • Authors: Sonal Sannigrahi, Jesse Read
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2203.14632
  • Pdf link: https://arxiv.org/pdf/2203.14632
  • Abstract Cross-Lingual Word Embeddings (CLWEs) are a key component for transferring linguistic information learnt in higher-resource settings into lower-resource ones. Recent research in cross-lingual representation learning has focused on offline mapping approaches due to their simplicity, computational efficacy, and ability to work with minimal parallel resources. However, they crucially depend on the assumption that embedding spaces are approximately isomorphic, i.e., share a similar geometric structure, which does not hold in practice, leading to poorer performance on low-resource and distant language pairs. In this paper, we introduce a framework to learn CLWEs, without assuming isometry, for low-resource pairs via joint exploitation of a related higher-resource language. In our work, we first pre-align the low-resource and related-language embedding spaces using offline methods to mitigate the assumption of isometry. Following this, we use joint training methods to develop CLWEs for the related language and the target embedding space. Finally, we remap the pre-aligned low-resource space and the target space to generate the final CLWEs. We show consistent gains over current methods in both quality and degree of isomorphism, as measured by bilingual lexicon induction (BLI) and eigenvalue similarity respectively, across several language pairs: {Nepali, Finnish, Romanian, Gujarati, Hungarian}-English. Lastly, our analysis also points to relatedness, as well as the amount of related-language data available, as key factors in determining the quality of the embeddings achieved.
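The eigenvalue-similarity measure of isomorphism mentioned above can be sketched as the squared difference between the Laplacian spectra of nearest-neighbour graphs built over each embedding space. The k-NN construction and function names below are assumptions for illustration, not the paper's exact procedure:

```python
# Hedged sketch of an eigenvalue-similarity isomorphism metric:
# lower values indicate more nearly isomorphic embedding spaces.
import numpy as np

def eigenvalue_similarity(emb_a, emb_b, k=3):
    """Sum of squared differences between Laplacian spectra of
    cosine k-NN graphs over two (n, d) embedding matrices with the
    same number of rows. Illustrative assumption, not the paper's code."""
    def laplacian_spectrum(emb):
        normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sim = normed @ normed.T
        np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
        n = emb.shape[0]
        adj = np.zeros((n, n))
        for i in range(n):
            for j in np.argsort(sim[i])[-k:]:  # top-k neighbours
                adj[i, j] = adj[j, i] = 1.0
        lap = np.diag(adj.sum(axis=1)) - adj
        return np.sort(np.linalg.eigvalsh(lap))
    sa, sb = laplacian_spectrum(emb_a), laplacian_spectrum(emb_b)
    return float(np.sum((sa - sb) ** 2))

rng = np.random.default_rng(1)
emb = rng.normal(size=(50, 32))
print(eigenvalue_similarity(emb, emb))  # identical spaces give 0.0
```

Two perfectly isomorphic spaces share a spectrum and score 0; the metric grows as the neighbourhood structures diverge.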

Using Machine Learning to generate an open-access cropland map from satellite images time series in the Indian Himalayan Region

  • Authors: Danya Li, Joaquin Gajardo, Michele Volpi, Thijs Defraeye
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.14673
  • Pdf link: https://arxiv.org/pdf/2203.14673
  • Abstract Crop maps are crucial for agricultural monitoring and food management and can additionally support domain-specific applications, such as setting cold supply chain infrastructure in developing countries. Machine learning (ML) models, combined with freely-available satellite imagery, can be used to produce cost-effective and high spatial-resolution crop maps. However, accessing ground truth data for supervised learning is especially challenging in developing countries due to factors such as smallholding and fragmented geography, which often results in a lack of crop type maps or even reliable cropland maps. Our area of interest for this study lies in Himachal Pradesh, India, where we aim at producing an open-access binary cropland map at 10-meter resolution for the Kullu, Shimla, and Mandi districts. To this end, we developed an ML pipeline that relies on Sentinel-2 satellite images time series. We investigated two pixel-based supervised classifiers, support vector machines (SVM) and random forest (RF), which are used to classify per-pixel time series for binary cropland mapping. The ground truth data used for training, validation and testing was manually annotated from a combination of field survey reference points and visual interpretation of very high resolution (VHR) imagery. We trained and validated the models via spatial cross-validation to account for local spatial autocorrelation and selected the RF model due to overall robustness and lower computational cost. We tested the generalization capability of the chosen model at the pixel level by computing the accuracy, recall, precision, and F1-score on hold-out test sets of each district, achieving an average accuracy for the RF (our best model) of 87%. We used this model to generate a cropland map for three districts of Himachal Pradesh, spanning 14,600 km2, which improves the resolution and quality of existing public maps.
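The per-pixel classification step described above can be sketched with scikit-learn's random forest on flattened time series; the synthetic data, feature layout, and sizes below are illustrative assumptions, not the paper's pipeline:

```python
# Hedged sketch: per-pixel binary cropland classification from a
# Sentinel-2-style time series with a random forest. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_pixels, n_dates, n_bands = 400, 12, 10       # one year, 10 bands
X = rng.normal(size=(n_pixels, n_dates * n_bands))  # flattened series
# Synthetic labels driven by the first date's bands, for illustration.
y = (X[:, :n_bands].mean(axis=1) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:300], y[:300])                      # train split
acc = clf.score(X[300:], y[300:])              # hold-out split
print(f"hold-out accuracy: {acc:.2f}")
```

In the paper the splits are spatial (cross-validation by region, hold-out by district) rather than the simple index split used here, precisely to avoid spatial autocorrelation inflating the score.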

A novel evolutionary-based neuro-fuzzy task scheduling approach to jointly optimize the main design challenges of heterogeneous MPSoCs

  • Authors: Athena Abdi, Armin Salimi-Badr
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2203.14717
  • Pdf link: https://arxiv.org/pdf/2203.14717
  • Abstract In this paper, an online task scheduling and mapping method is proposed, based on a fuzzy neural network (FNN) trained by an evolutionary multi-objective algorithm (NSGA-II) to jointly optimize the main design challenges of heterogeneous MPSoCs. In this approach, the FNN parameters are first trained using an NSGA-II-based optimization engine, considering the main design challenges of MPSoCs, including temperature, power consumption, failure rate, and execution time, on a training dataset consisting of application graphs of various sizes. Next, the trained FNN is employed as an online task scheduler to jointly optimize the main design challenges in heterogeneous MPSoCs. Due to the uncertainty in sensor measurements and the difference between computational models and reality, applying a fuzzy neural network is advantageous in online scheduling procedures. The performance of the method is compared with previous heuristic, meta-heuristic, and rule-based approaches in several experiments. Based on these experiments, our proposed method outperforms the related studies in optimizing all design criteria. Its improvements over related heuristic and meta-heuristic approaches are estimated at 10.58% in temperature, 9.22% in power consumption, 39.14% in failure rate, and 12.06% in execution time, on average. Moreover, given the interpretable nature of the FNN, the frequently fired fuzzy rules extracted by the proposed approach are demonstrated.

Learning to estimate UAV created turbulence from scene structure observed by onboard cameras

  • Authors: Quentin Possamaï, Steeven Janny, Madiha Nadri, Laurent Bako, Christian Wolf
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.14726
  • Pdf link: https://arxiv.org/pdf/2203.14726
  • Abstract Controlling UAV flights precisely requires a realistic dynamic model and accurate state estimates from onboard sensors such as the IMU, GPS, and visual observations. Obtaining a precise dynamic model is extremely difficult, as important aerodynamic effects are hard to model, in particular the ground effect and other turbulence. While machine learning has been used in the past to estimate UAV-created turbulence, this was restricted to flat ground or diffuse in-flight air turbulence, in both cases without taking obstacles into account. In this work we address the complex problem of estimating in-flight turbulence caused by obstacles, in particular the complex structures found in cluttered environments. We learn a mapping from control inputs and images captured by onboard cameras to turbulence. In a large-scale setting, we train a model over a large number of different simulated photo-realistic environments loaded into the Habitat.AI simulator, augmented with a dynamic UAV model and an analytic ground effect model. We transfer the model from simulation to a real environment and evaluate it on real UAV flights from the EuRoC-MAV dataset, showing that the model is capable of good sim2real generalization performance. The dataset will be made publicly available upon acceptance.

Neural Estimation and Optimization of Directed Information over Continuous Spaces

  • Authors: Dor Tsur, Ziv Aharoni, Ziv Goldfeld, Haim Permuter
  • Subjects: Information Theory (cs.IT)
  • Arxiv link: https://arxiv.org/abs/2203.14743
  • Pdf link: https://arxiv.org/pdf/2203.14743
  • Abstract This work develops a new method for estimating and optimizing the directed information rate between two jointly stationary and ergodic stochastic processes. Building upon recent advances in machine learning, we propose a recurrent neural network (RNN)-based estimator which is optimized via gradient ascent over the RNN parameters. The estimator does not require prior knowledge of the underlying joint and marginal distributions. The estimator is also readily optimized over continuous input processes realized by a deep generative model. We prove consistency of the proposed estimation and optimization methods and combine them to obtain end-to-end performance guarantees. Applications for channel capacity estimation of continuous channels with memory are explored, and empirical results demonstrating the scalability and accuracy of our method are provided. When the channel is memoryless, we investigate the mapping learned by the optimized input generator.

A Fly in the Ointment: An Empirical Study on the Characteristics of Ethereum Smart Contracts Code Weaknesses and Vulnerabilities

  • Authors: Majd Soud, Grischa Liebel, Mohammad Hamdaqa
  • Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
  • Arxiv link: https://arxiv.org/abs/2203.14850
  • Pdf link: https://arxiv.org/pdf/2203.14850
  • Abstract Context: Smart contracts are computer programs that are automatically executed on the blockchain. Vulnerabilities in their implementation have led to severe losses of cryptocurrency. Smart contracts become immutable when deployed to the Ethereum blockchain. Therefore, it is essential to understand the nature of vulnerabilities in Ethereum smart contracts to prevent them in the future. Existing classification schemes exist, but they are limited in several ways. Objective: We aim to characterize vulnerabilities in Ethereum smart contracts written in Solidity, and to unify existing classification schemes. Method: We extracted 2143 vulnerabilities from public coding platforms and popular vulnerability databases and categorized them using a card sorting approach. We targeted the Ethereum blockchain in this paper, as it is the first and most popular blockchain to support the deployment of smart contracts, and Solidity as the most widely used language to implement smart contracts. We devised a classification scheme of smart contract vulnerabilities according to their error source and impact. Afterwards, we mapped existing classification schemes to our classification. Results: The resulting classification consists of 11 categories describing the error source of a vulnerability and 13 categories describing potential impacts. Our findings show that the language-specific coding and the structural data flow categories are the dominant categories, but that the frequency of occurrence differs substantially between the data sources. Conclusions: Our findings enable researchers to better understand smart contract vulnerabilities by defining various dimensions of the problem and supporting our classification with mappings to literature-based classifications and frequency distributions of the defined categories.

Attributable Visual Similarity Learning

  • Authors: Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2203.14932
  • Pdf link: https://arxiv.org/pdf/2203.14932
  • Abstract This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images. Most existing similarity learning methods exacerbate the unexplainability by mapping each sample to a single point in the embedding space with a distance metric (e.g., Mahalanobis distance, Euclidean distance). Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph and then infer the overall similarity accordingly. Furthermore, we establish a bottom-up similarity construction and top-down similarity inference framework to infer the similarity based on semantic hierarchy consistency. We first identify unreliable higher-level similarity nodes and then correct them using the most coherent adjacent lower-level similarity nodes, which simultaneously preserve traces for similarity attribution. Extensive experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods and verify the interpretability of our framework. Code is available at https://github.com/zbr17/AVSL.

Keyword: localization

Spectral Measurement Sparsification for Pose-Graph SLAM

  • Authors: Kevin J. Doherty, David M. Rosen, John J. Leonard
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.13897
  • Pdf link: https://arxiv.org/pdf/2203.13897
  • Abstract See the identical entry under Keyword: SLAM above.

Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection

  • Authors: Li Yin, Juan M Perez-Rua, Kevin J Liang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.13903
  • Pdf link: https://arxiv.org/pdf/2203.13903
  • Abstract We study the challenging incremental few-shot object detection (iFSD) setting. Recently, hypernetwork-based approaches have been studied in the context of continuous and finetune-free iFSD with limited success. We take a closer look at important design choices of such methods, leading to several key improvements and resulting in a more accurate and flexible framework, which we call Sylph. In particular, we demonstrate the effectiveness of decoupling object classification from localization by leveraging a base detector that is pretrained for class-agnostic localization on a large-scale dataset. Contrary to what previous results have suggested, we show that with a carefully designed class-conditional hypernetwork, finetune-free iFSD can be highly effective, especially when a large number of base categories with abundant data are available for meta-training, almost approaching alternatives that undergo test-time training. This result is even more significant considering its many practical advantages: (1) incrementally learning new classes in sequence without additional training, (2) detecting both novel and seen classes in a single pass, and (3) no forgetting of previously seen classes. We benchmark our model on both COCO and LVIS, reporting as high as 17% AP on the long-tail rare classes on LVIS, indicating the promise of hypernetwork-based iFSD.

EYNet: Extended YOLO for Airport Detection in Remote Sensing Images

  • Authors: Hengameh Mirhajianmoghadam, Behrouz Bolourian Haghighi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2203.14007
  • Pdf link: https://arxiv.org/pdf/2203.14007
  • Abstract Nowadays, airport detection in remote sensing images has attracted considerable attention due to its strategic role in civilian and military scopes. In particular, uncrewed aerial vehicles must immediately detect safe areas to land in emergencies. Previous schemes suffer from several difficulties, including complicated backgrounds and the varying scales and shapes of airports, while detection speed and accuracy remain significant concerns. Hence, this study proposes an effective scheme that extends YOLOV3 with the ShearLet transform. In this way, MobileNet and ResNet18, with fewer layers and parameters, retrained on a similar dataset, are trained in parallel as base networks. According to airport geometrical characteristics, ShearLet filters with different scales and directions are applied in the first convolution layers of ResNet18 as a visual attention mechanism. Besides, the major extension to YOLOV3 concerns the detection sub-networks, whose novel structures boost object expression ability and training efficiency. In addition, novel augmentation and negative-mining strategies are presented to significantly increase the localization phase's performance. The experimental results on the DIOR dataset reveal that the framework reliably detects different types of airports over varied areas and acquires robust results in complex scenes compared to traditional YOLOV3 and state-of-the-art schemes.

Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching

  • Authors: Yujiao Shi, Xin Yu, Liu Liu, Dylan Campbell, Piotr Koniusz, Hongdong Li
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.14148
  • Pdf link: https://arxiv.org/pdf/2203.14148
  • Abstract We address the problem of ground-to-satellite image geo-localization, that is, estimating the camera latitude, longitude and orientation (azimuth angle) by matching a query image captured at the ground level against a large-scale database with geotagged satellite images. Our prior works treat the above task as pure image retrieval by selecting the satellite reference image most similar to the ground-level query image. However, such an approach often produces coarse location estimates because the geotag of the retrieved satellite image only corresponds to the image center while the ground camera can be located at any point within the image. To further consolidate our prior research findings, we present a novel geometry-aware geo-localization method. Our new method is able to achieve the fine-grained location of a query image, up to pixel-size precision of the satellite image, once its coarse location and orientation have been determined. Moreover, we propose a new geometry-aware image retrieval pipeline to improve the coarse localization accuracy. Apart from a polar transform in our conference work, this new pipeline also maps satellite image pixels to the ground-level plane in the ground view via a geometry-constrained projective transform to emphasize informative regions, such as road structures, for cross-view geo-localization. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our newly proposed framework. We also significantly improve the performance of coarse localization results compared to the state-of-the-art in terms of location recalls.
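The polar transform mentioned above resamples an overhead satellite image about its centre so that azimuth runs along the image width and radius along the height, roughly mimicking a ground-view panorama. The following is a hedged, nearest-neighbour sketch of that idea only (the paper's actual transform and its parameters may differ); `polar_transform` and its sampling convention are assumptions for illustration.

```python
import numpy as np

def polar_transform(sat, out_h, out_w):
    """Nearest-neighbour polar resampling of a square aerial image about its
    centre: columns sweep azimuth [0, 2*pi), rows sweep radius outward.
    Illustrative only; not the paper's exact transform."""
    h, w = sat.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    out = np.zeros((out_h, out_w) + sat.shape[2:], dtype=sat.dtype)
    for v in range(out_h):
        r = max_r * (v + 1) / out_h          # radius grows with row index
        for u in range(out_w):
            theta = 2.0 * np.pi * u / out_w  # azimuth sweeps the width
            y = int(round(cy - r * np.cos(theta)))
            x = int(round(cx + r * np.sin(theta)))
            if 0 <= y < h and 0 <= x < w:
                out[v, u] = sat[y, x]
    return out

panorama = polar_transform(np.full((64, 64), 7, dtype=np.uint8), 32, 64)
print(panorama.shape)  # (32, 64)
```

The point of such a warp is to bring the satellite image's layout closer to the cylindrical geometry of a ground-level panorama before cross-view matching.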

HINT: Hierarchical Neuron Concept Explainer

  • Authors: Andong Wang, Wei-Ning Lee, Xiaojuan Qi
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2203.14196
  • Pdf link: https://arxiv.org/pdf/2203.14196
  • Abstract To interpret deep networks, one main approach is to associate neurons with human-understandable concepts. However, existing methods often ignore the inherent relationships of different concepts (e.g., dog and cat both belong to animals), and thus lose the chance to explain neurons responsible for higher-level concepts (e.g., animal). In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings. To this end, we propose HIerarchical Neuron concepT explainer (HINT) to effectively build bidirectional associations between neurons and hierarchical concepts in a low-cost and scalable manner. HINT enables us to systematically and quantitatively study whether and how the implicit hierarchical relationships of concepts are embedded into neurons, such as identifying collaborative neurons responsible to one concept and multimodal neurons for different concepts, at different semantic levels from concrete concepts (e.g., dog) to more abstract ones (e.g., animal). Finally, we verify the faithfulness of the associations using Weakly Supervised Object Localization, and demonstrate its applicability in various tasks such as discovering saliency regions and explaining adversarial attacks. Code is available on https://github.com/AntonotnaWang/HINT.

Towards physiology-informed data augmentation for EEG-based BCIs

  • Authors: Oleksandr Zlatov, Benjamin Blankertz
  • Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2203.14392
  • Pdf link: https://arxiv.org/pdf/2203.14392
  • Abstract Most EEG-based Brain-Computer Interfaces (BCIs) require a considerable amount of training data to calibrate the classification model, owing to the high variability in the EEG data, which manifests itself between participants, but also within participants from session to session (and, of course, from trial to trial). In general, the more complex the model, the more data for training is needed. We suggest a novel technique for augmenting the training data by generating new data from the data set at hand. Different from existing techniques, our method uses backward and forward projection using source localization and a head model to modify the current source dipoles of the model, thereby generating inter-participant variability in a physiologically meaningful way. In this manuscript, we explain the method and show first preliminary results for participant-independent motor-imagery classification. The proposed data augmentation increased accuracy by 13, 6, and 2 percentage points when using a deep neural network, a shallow neural network, and LDA, respectively.

PAEDID: Patch Autoencoder Based Deep Image Decomposition For Pixel-level Defective Region Segmentation

  • Authors: Shancong Mou, Meng Cao, Haoping Bai, Ping Huang, Jianjun Shi, Jiulong Shan
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2203.14457
  • Pdf link: https://arxiv.org/pdf/2203.14457
  • Abstract Unsupervised pixel-level defective region segmentation is an important task in image-based anomaly detection for various industrial applications. The state-of-the-art methods have their own advantages and limitations: matrix-decomposition-based methods are robust to noise but lack complex background image modeling capability; representation-based methods are good at defective region localization but lack accuracy in defective region shape contour extraction; reconstruction-based methods detect defective regions whose shape contours match the ground truth well but are noisy. To combine the best of these approaches, we present an unsupervised patch autoencoder based deep image decomposition (PAEDID) method for defective region segmentation. In the training stage, we learn the common background as a deep image prior by a patch autoencoder (PAE) network. In the inference stage, we formulate anomaly detection as an image decomposition problem with the deep image prior and domain-specific regularizations. By adopting the proposed approach, the defective regions in the image can be accurately extracted in an unsupervised fashion. We demonstrate the effectiveness of the PAEDID method in simulation studies and on an industrial dataset in a case study.

UTIL: An Ultra-wideband Time-difference-of-arrival Indoor Localization Dataset

  • Authors: Wenda Zhao, Abhishek Goudar, Xinyuan Qiao, Angela P. Schoellig
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2203.14471
  • Pdf link: https://arxiv.org/pdf/2203.14471
  • Abstract This paper presents an ultra-wideband (UWB) time-difference-of-arrival (TDOA) dataset collected from a quadrotor for research purposes. The dataset consists of low-level signal information from static experiments and UWB TDOA measurements and additional onboard sensor data from flight experiments on a quadrotor. The data collection process is discussed in detail, including the equipment used, measurement collection procedure, and the calibration of the quadrotor platform. All the data is made available as plain text files and we provide both Matlab and Python scripts to parse and analyze the data. We provide a thorough description of the data format and some pointers on the potential usage of each sub-dataset. The dataset is available for download at https://utiasdsl.github.io/util-uwb-dataset/. We hope this dataset will help researchers develop and compare reliable estimation methods for the emerging UWB TDOA-based indoor localization technology.
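For readers new to UWB TDOA: each measurement is the difference of the tag's ranges to a pair of anchors. The sketch below shows only this ideal (noise-free) measurement model; the function name, anchor layout, and units are illustrative assumptions, not part of the dataset's API.

```python
import numpy as np

def tdoa_measurement(tag, anchor_i, anchor_j):
    """Ideal TDOA-derived range difference (metres) between anchors i and j
    for a tag at the given position: ||p - a_i|| - ||p - a_j||."""
    p = np.asarray(tag, dtype=float)
    return np.linalg.norm(p - anchor_i) - np.linalg.norm(p - anchor_j)

a0 = np.array([0.0, 0.0, 0.0])
a1 = np.array([4.0, 0.0, 0.0])
tag = np.array([2.0, 0.0, 0.0])   # equidistant from both anchors
print(tdoa_measurement(tag, a0, a1))  # 0.0
```

Estimators evaluated on such a dataset typically minimise residuals between these predicted range differences and the measured ones over the tag trajectory.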

CenterLoc3D: Monocular 3D Vehicle Localization Network for Roadside Surveillance Cameras

  • Authors: Tang Xinyao, Song Huansheng, Wang Wei, Zhao Chunhui
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2203.14550
  • Pdf link: https://arxiv.org/pdf/2203.14550
  • Abstract Monocular 3D vehicle localization is an important task in Intelligent Transportation System (ITS) and Cooperative Vehicle Infrastructure System (CVIS), which is usually achieved by monocular 3D vehicle detection. However, depth information cannot be obtained directly by monocular cameras due to the inherent imaging mechanism, resulting in more challenging monocular 3D tasks. Most of the current monocular 3D vehicle detection methods leverage 2D detectors and additional geometric modules, which reduces the efficiency. In this paper, we propose a 3D vehicle localization network CenterLoc3D for roadside monocular cameras, which directly predicts centroid and eight vertexes in image space, and dimension of 3D bounding boxes without 2D detectors. In order to improve the precision of 3D vehicle localization, we propose a weighted-fusion module and a loss with spatial constraints embedding in CenterLoc3D. Firstly, the transformation matrix between 2D image space and 3D world space is solved by camera calibration. Secondly, vehicle type, centroid, eight vertexes and dimension of 3D vehicle bounding boxes are obtained by CenterLoc3D. Finally, centroid in 3D world space can be obtained by camera calibration and CenterLoc3D for 3D vehicle localization. To the best of our knowledge, this is the first application of 3D vehicle localization for roadside monocular cameras. Hence, we also propose a benchmark for this application including dataset (SVLD-3D), annotation tool (LabelImg-3D) and evaluation metrics. Through experimental validation, the proposed method achieves high accuracy and real-time performance.
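The camera-calibration step the abstract relies on is standard pinhole projection between 3D world space and 2D image space. This is a generic sketch of that projection, not the paper's code; the matrix values and function name are made up for illustration.

```python
import numpy as np

def project_points(K, R, t, pts_world):
    """Project Nx3 world points to pixel coordinates with a calibrated
    pinhole camera: intrinsics K, rotation R, translation t (camera frame)."""
    pts_cam = (R @ np.asarray(pts_world, dtype=float).T).T + t
    uvw = (K @ pts_cam.T).T          # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division

# Hypothetical calibration: 800 px focal length, principal point (320, 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)
print(project_points(K, R, t, [[1.0, 0.5, 10.0]]))  # [[400. 280.]]
```

Inverting this mapping for points known to lie on the ground plane is what lets a roadside camera recover a vehicle's 3D centroid from the network's 2D predictions.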

A 3D Positioning-based Channel Estimation Method for RIS-aided mmWave Communications

  • Authors: Yaoshen Cui, Haifan Yin, Li Tan, Marco Di Renzo
  • Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2203.14636
  • Pdf link: https://arxiv.org/pdf/2203.14636
  • Abstract A fundamental challenge in millimeter-wave (mmWave) communication is the susceptibility to blocking objects. One way to alleviate this problem is the use of reconfigurable intelligent surfaces (RIS). Nevertheless, due to the large number of passive reflecting elements on the RIS, channel estimation turns out to be a challenging task. In this paper, we address channel estimation for RIS-aided mmWave communication systems based on a localization method. The proposed idea consists of exploiting the sparsity of the mmWave channel and the topology of the RIS. In particular, we first propose the concept of reflecting unit set (RUS) to improve the flexibility of RIS. We then propose a novel coplanar maximum likelihood-based (CML) 3D positioning method based on the RUS, and derive the Cramer-Rao lower bound (CRLB) for the positioning method. Furthermore, we develop an efficient positioning-based channel estimation scheme with low computational complexity. Compared to state-of-the-art methods, our proposed method requires fewer time-frequency resources in channel acquisition, as its complexity is independent of the total size of the RIS and depends only on the size of the RUSs, which are only a small portion of the RIS. Large performance gains are confirmed in simulations, which proves the effectiveness of the proposed method.

ObjectFormer for Image Manipulation Detection and Localization

  • Authors: Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2203.14681
  • Pdf link: https://arxiv.org/pdf/2203.14681
  • Abstract Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model the object-level consistencies among different regions, which are further used to refine patch embeddings to capture the patch-level consistencies. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method, outperforming state-of-the-art tampering detection and localization methods.
