
New submissions for Mon, 15 Aug 22

Keyword: SLAM

Handling Constrained Optimization in Factor Graphs for Autonomous Navigation

  • Authors: Barbara Bazzana, Tiziano Guadagnino, Giorgio Grisetti
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.06325
  • Pdf link: https://arxiv.org/pdf/2208.06325
  • Abstract Factor graphs are graphical models used to represent a wide variety of problems across robotics, such as Structure from Motion (SfM), Simultaneous Localization and Mapping (SLAM) and calibration. Typically, at their core, they have an optimization problem whose terms only depend on a small subset of variables. Factor graph solvers exploit the locality of problems to drastically reduce the computational time of the Iterative Least-Squares (ILS) methodology. Although extremely powerful, their application is usually limited to unconstrained problems. In this paper, we model constraints over variables within factor graphs by introducing a factor graph version of the method of Lagrange Multipliers. We show the potential of our method by presenting a full navigation stack based on factor graphs. Unlike standard navigation stacks, ours models both optimal control for local planning and localization with factor graphs, and solves the two problems using the standard ILS methodology. We validate our approach in real-world autonomous navigation scenarios, comparing it with the de facto standard navigation stack implemented in ROS. Comparative experiments show that, for the application at hand, our system outperforms the standard nonlinear programming solver Interior-Point Optimizer (IPOPT) in runtime, while achieving similar solutions.
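
Since the abstract names only the method of Lagrange Multipliers, here is a minimal, hedged sketch of how an equality constraint enters a single Gauss-Newton/ILS step through the first-order KKT system. This is a generic illustration, not the authors' factor-graph solver; all names are ours.

```python
import numpy as np

def constrained_gn_step(J, r, A, c):
    """One Gauss-Newton step for min ||r + J dx||^2 subject to the
    linearized equality constraint A dx + c = 0, solved via Lagrange
    multipliers through the first-order KKT system."""
    n, m = J.shape[1], A.shape[0]
    H, g = J.T @ J, J.T @ r          # Gauss-Newton Hessian and gradient
    # KKT system: [H A^T; A 0] [dx; lam] = [-g; -c]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    dx_lam = np.linalg.solve(K, np.concatenate([-g, -c]))
    return dx_lam[:n]                # state update; multipliers follow
```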

Keyword: odometry

There are no results.

Keyword: livox

There are no results.

Keyword: loam

There are no results.

Keyword: lidar

There are no results.

Keyword: loop detection

There are no results.

Keyword: nerf

There are no results.

Keyword: mapping

A review of cryptosystems based on multi layer chaotic mappings

  • Authors: Awnon Bhowmik, Emon Hossain, Mahmudul Hasan
  • Subjects: Cryptography and Security (cs.CR)
  • Arxiv link: https://arxiv.org/abs/2208.06002
  • Pdf link: https://arxiv.org/pdf/2208.06002
  • Abstract In recent years, a lot of research has gone into creating multi-layer chaotic mapping-based cryptosystems. Random-like behavior, a continuous broadband power spectrum, and a weak baseline condition dependency are all characteristics of chaotic systems. Chaos could be helpful in the three functional components of compression, encryption, and modulation in a digital communication system. To successfully use chaos theory in cryptography, chaotic maps must be built in such a way that the entropy they produce can provide the necessary confusion and diffusion. A chaotic map is used in the first layer of such cryptosystems to create confusion, and a second chaotic map is used in the second layer to create diffusion and create a ciphertext from a plaintext. A secret key generation mechanism and a key exchange method are frequently left out, and many researchers just assume that these essential components of any effective cryptosystem are always accessible. We review such cryptosystems by using a cryptosystem of our design, in which confusion in plaintext is created using Arnold's Cat Map, and logistic mapping is employed to create sufficient dispersion and ultimately get a matching ciphertext. We also address the development of key exchange protocols and secret key schemes for these cryptosystems, as well as the possible outcomes of using cryptanalysis techniques on such a system.
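
As a concrete reference for the two layers described above, the sketch below pairs Arnold's cat map (confusion, via pixel permutation) with a logistic-map keystream (diffusion, via XOR). The parameters and the toy plaintext are illustrative only, and the key-exchange aspects the paper emphasizes are omitted.

```python
import numpy as np

def arnold_cat(img, iterations=1):
    """Confusion layer: permute a square N x N image with Arnold's cat
    map, (x, y) -> ((x + y) mod N, (x + 2y) mod N)."""
    n = img.shape[0]
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out

def logistic_keystream(x0, length, r=3.99):
    """Diffusion layer: derive a byte keystream from the logistic map
    x -> r * x * (1 - x); the seed x0 in (0, 1) acts as the secret key."""
    x, ks = x0, np.empty(length, dtype=np.uint8)
    for i in range(length):
        x = r * x * (1 - x)
        ks[i] = int(x * 256) % 256
    return ks

img = np.arange(64, dtype=np.uint8).reshape(8, 8)   # toy 8x8 "plaintext"
cipher = arnold_cat(img, 3).ravel() ^ logistic_keystream(0.3141, img.size)
```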

Layout-Bridging Text-to-Image Synthesis

  • Authors: Jiadong Liang, Wenjie Pei, Feng Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.06162
  • Pdf link: https://arxiv.org/pdf/2208.06162
  • Abstract The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, can only capture keywords in the text that indicate common objects or actions, but fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, which a few methods have attempted. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object locations. In this paper, we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon the Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them. In the layout-to-image synthesis stage, we focus on learning the textual-visual semantic alignment per object in the layout to precisely incorporate the input text into the layout-to-image synthesizing process. To evaluate the quality of the generated layouts, we design a new metric, dubbed the Layout Quality Score, which considers both the absolute distribution errors of bounding boxes in the layout and the mutual spatial relationships between them. Extensive experiments on three datasets demonstrate the superior performance of our method over state-of-the-art methods on both predicting the layout and synthesizing the image from the given text.

Identifying User Profiles Via User Footprints

  • Authors: Yasamin Kowsari
  • Subjects: Social and Information Networks (cs.SI)
  • Arxiv link: https://arxiv.org/abs/2208.06251
  • Pdf link: https://arxiv.org/pdf/2208.06251
  • Abstract User identification has been a major field of research in privacy and security. Users may use multiple Online Social Networks (OSNs) to access a variety of text, videos, and links, and to connect with their friends. Identifying the user profiles that correspond to a user's activities across social networks is significant for the development of related fields, such as network security, user behavior pattern analysis, and user recommendation systems. In addition, predicting personal attributes based on public content is a challenging topic. In this work, we perform an empirical study and propose a scheme with considerable performance. We investigate Reddit, a popular question-and-answer social network. Considering the available personal and non-personal attributes, we discuss our main findings on mapping features such as user activities to a specific user profile. We collected a dataset with a wide distribution consisting of 5000 samples. To map non-personal attributes to personal attributes, a classification approach based on support vector machines (SVM), Random Forests (RF), and a deep belief network was used. Experimental results demonstrate the effectiveness of the proposed methodology, achieving classification accuracy higher than 89%.
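
The classification setup described is standard; below is a minimal scikit-learn sketch with synthetic stand-in data. The features and labels are hypothetical placeholders for the paper's Reddit attributes, and the deep belief network is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for 5000 user profiles: X holds non-personal attributes
# (e.g., activity counts), y a personal attribute to be predicted.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for clf in (SVC(kernel="rbf"), RandomForestClassifier(n_estimators=100)):
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{type(clf).__name__}: {acc:.3f}")
```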

Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

  • Authors: Martin Wilhelm, Hanna Geppert, Anna Drewes, Thilo Pionteck
  • Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
  • Arxiv link: https://arxiv.org/abs/2208.06321
  • Pdf link: https://arxiv.org/pdf/2208.06321
  • Abstract We introduce a new model for the task mapping problem to aid in the systematic design of algorithms for heterogeneous systems including, but not limited to, CPUs, GPUs and FPGAs. A special focus is set on the communication between the devices, its influence on parallel execution, as well as on device-specific differences regarding parallelizability and streamability. We show how this model can be utilized in different system design phases and present two novel mixed-integer linear programs to demonstrate the usage of the model.
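
The abstract does not give the paper's two MILPs; purely as a point of reference, the toy model below assigns tasks to devices with SciPy's milp, minimizing total runtime. The communication and parallelizability terms that are central to the paper are omitted, and the cost matrix is invented.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# cost[t, d]: hypothetical runtime of task t on device d (CPU, GPU, FPGA)
cost = np.array([[4.0, 1.0, 2.0],
                 [3.0, 5.0, 1.0],
                 [2.0, 2.0, 6.0]])
T, D = cost.shape

# One binary variable x[t, d] per (task, device) pair, flattened row-major.
A = np.zeros((T, T * D))
for t in range(T):
    A[t, t * D:(t + 1) * D] = 1.0    # each task maps to exactly one device

res = milp(c=cost.ravel(),
           constraints=LinearConstraint(A, lb=1, ub=1),
           integrality=np.ones(T * D),
           bounds=Bounds(0, 1))
print(res.x.reshape(T, D).argmax(axis=1))   # chosen device per task
```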

Handling Constrained Optimization in Factor Graphs for Autonomous Navigation

  • Authors: Barbara Bazzana, Tiziano Guadagnino, Giorgio Grisetti
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.06325
  • Pdf link: https://arxiv.org/pdf/2208.06325
  • Abstract Factor graphs are graphical models used to represent a wide variety of problems across robotics, such as Structure from Motion (SfM), Simultaneous Localization and Mapping (SLAM) and calibration. Typically, at their core, they have an optimization problem whose terms only depend on a small subset of variables. Factor graph solvers exploit the locality of problems to drastically reduce the computational time of the Iterative Least-Squares (ILS) methodology. Although extremely powerful, their application is usually limited to unconstrained problems. In this paper, we model constraints over variables within factor graphs by introducing a factor graph version of the method of Lagrange Multipliers. We show the potential of our method by presenting a full navigation stack based on factor graphs. Unlike standard navigation stacks, ours models both optimal control for local planning and localization with factor graphs, and solves the two problems using the standard ILS methodology. We validate our approach in real-world autonomous navigation scenarios, comparing it with the de facto standard navigation stack implemented in ROS. Comparative experiments show that, for the application at hand, our system outperforms the standard nonlinear programming solver Interior-Point Optimizer (IPOPT) in runtime, while achieving similar solutions.

Keyword: localization

Smartwatch-based ecological momentary assessments for occupant wellness and privacy in buildings

  • Authors: Clayton Miller, Renee Christensen, Jin Kai Leong, Mahmoud Abdelrahman, Federico Tartarini, Matias Quintana, Andre Matthias Müller, Mario Frei
  • Subjects: Human-Computer Interaction (cs.HC)
  • Arxiv link: https://arxiv.org/abs/2208.06080
  • Pdf link: https://arxiv.org/pdf/2208.06080
  • Abstract This paper describes the adaptation of an open-source ecological momentary assessment smartwatch platform with three sets of micro-survey wellness-related questions focused on i) infectious disease (COVID-19) risk perception, ii) privacy and distraction in an office context, and iii) triggers of various movement-related behaviors in buildings. This platform was previously used to collect data for thermal comfort, and this work extends its use to other domains. Several research participants took part in a proof-of-concept experiment by wearing a smartwatch to collect their micro-survey question preferences and perception responses for two of the question sets. Participants were also asked to install an indoor localization app on their phone to detect where precisely in the building they completed the survey. The experiment identified occupant information such as participants' tendencies to prefer privacy in certain spaces and the difference between infectious disease risk perception in naturally versus mechanically ventilated spaces.

Robotic Inspection and Characterization of Subsurface Defects on Concrete Structures Using Impact Sounding

  • Authors: Ejup Hoxha, Jinglun Feng, Diar Sanakov, Ardian Gjinofci, Jizhong Xiao
  • Subjects: Robotics (cs.RO); Signal Processing (eess.SP)
  • Arxiv link: https://arxiv.org/abs/2208.06305
  • Pdf link: https://arxiv.org/pdf/2208.06305
  • Abstract Impact-sounding (IS) and impact-echo (IE) are well-developed non-destructive evaluation (NDE) methods that are widely used for inspections of concrete structures to ensure safety and sustainability. However, it is tedious work to collect IS and IE data along grid lines covering a large target area for characterization of subsurface defects. Moreover, data processing is complicated and requires domain experts to interpret the results. To address these problems, we present a novel robotic inspection system named Impact-Rover to automate the data collection process, and introduce data analytics software that visualizes the inspection results so that non-experts can understand them. The system consists of three modules: 1) a robotic platform with vertical mobility to collect IS and IE data in hard-to-reach locations, 2) a vision-based positioning module that fuses an RGB-D camera, an IMU and a wheel encoder to estimate the 6-DOF pose of the robot, and 3) a data analytics software module for processing the IS data to generate defect maps. The Impact-Rover hosts both IE and IS devices on a sliding mechanism and can perform move-stop-sample operations to collect multiple IS and IE data points at adjustable spacing. The robot takes samples much faster than manual data collection because it automatically takes multiple measurements along a straight line and records their locations. This paper focuses on reporting experimental results on IS. We compute features and use unsupervised learning methods to analyze the data. By combining the pose generated by our vision-based localization module with the position of the head of the sliding mechanism, we can generate maps of possible defects. The results on concrete slabs demonstrate that our impact-sounding system can effectively reveal shallow defects.
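
The abstract mentions feature extraction plus unsupervised learning without naming specifics; a minimal sketch of such a pipeline follows, with toy spectral features and synthetic signals as placeholders for the real IS recordings.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_features(signal, fs):
    """Two toy features per impact-sounding sample: dominant frequency
    and spectral energy (stand-ins for the paper's actual features)."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return [freqs[spec.argmax()], float((spec ** 2).sum())]

rng = np.random.default_rng(0)
signals = rng.normal(size=(100, 4096))          # placeholder recordings
feats = np.array([spectral_features(s, fs=48_000) for s in signals])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)  # solid vs. defect
```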

Handling Constrained Optimization in Factor Graphs for Autonomous Navigation

  • Authors: Barbara Bazzana, Tiziano Guadagnino, Giorgio Grisetti
  • Subjects: Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.06325
  • Pdf link: https://arxiv.org/pdf/2208.06325
  • Abstract Factor graphs are graphical models used to represent a wide variety of problems across robotics, such as Structure from Motion (SfM), Simultaneous Localization and Mapping (SLAM) and calibration. Typically, at their core, they have an optimization problem whose terms only depend on a small subset of variables. Factor graph solvers exploit the locality of problems to drastically reduce the computational time of the Iterative Least-Squares (ILS) methodology. Although extremely powerful, their application is usually limited to unconstrained problems. In this paper, we model constraints over variables within factor graphs by introducing a factor graph version of the method of Lagrange Multipliers. We show the potential of our method by presenting a full navigation stack based on factor graphs. Unlike standard navigation stacks, ours models both optimal control for local planning and localization with factor graphs, and solves the two problems using the standard ILS methodology. We validate our approach in real-world autonomous navigation scenarios, comparing it with the de facto standard navigation stack implemented in ROS. Comparative experiments show that, for the application at hand, our system outperforms the standard nonlinear programming solver Interior-Point Optimizer (IPOPT) in runtime, while achieving similar solutions.

Keyword: transformer

MILAN: Masked Image Pretraining on Language Assisted Representation

  • Authors: Zejiang Hou, Fei Sun, Yen-Kuang Chen, Yuan Xie, Sun-Yuan Kung
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.06049
  • Pdf link: https://arxiv.org/pdf/2208.06049
  • Abstract Self-attention based transformer models have been dominating many computer vision tasks in the past few years. Their superb model quality heavily depends on excessively large labeled image datasets. In order to reduce the reliance on large labeled datasets, reconstruction-based masked autoencoders are gaining popularity; they learn high-quality transferable representations from unlabeled images. For the same purpose, recent weakly supervised image pretraining methods explore language supervision from text captions accompanying the images. In this work, we propose masked image pretraining on language assisted representation, dubbed MILAN. Instead of predicting raw pixels or low-level features, our pretraining objective is to reconstruct image features with substantial semantic signals, obtained using caption supervision. Moreover, to accommodate our reconstruction target, we propose a more efficient prompting decoder architecture and a semantic-aware mask sampling mechanism, which further advance the transfer performance of the pretrained model. Experimental results demonstrate that MILAN delivers higher accuracy than previous works. When the masked autoencoder is pretrained and finetuned on the ImageNet-1K dataset with an input resolution of 224x224, MILAN achieves a top-1 accuracy of 85.4% with ViT-B/16, surpassing the previous state of the art by 1%. In the downstream semantic segmentation task, MILAN achieves 52.7 mIoU using a ViT-B/16 backbone on the ADE20K dataset, outperforming previous masked pretraining results by 4 points.
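
A hedged sketch of the general objective, regressing semantically rich target features at masked patches rather than raw pixels; the loss form and tensor shapes below are our assumptions, not necessarily MILAN's exact formulation.

```python
import torch.nn.functional as F

def masked_feature_loss(pred, target, mask):
    """pred, target: (B, L, D) decoder outputs vs. caption-supervised
    target features (e.g., from a CLIP-like encoder); mask: (B, L) with
    1 at masked patch positions. Only masked patches contribute."""
    per_patch = F.mse_loss(pred, target, reduction="none").mean(dim=-1)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)
```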

Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages

  • Authors: Paul Soulos, Sudha Rao, Caitlin Smith, Eric Rosen, Asli Celikyilmaz, R. Thomas McCoy, Yichen Jiang, Coleman Haley, Roland Fernandez, Hamid Palangi, Jianfeng Gao, Paul Smolensky
  • Subjects: Computation and Language (cs.CL)
  • Arxiv link: https://arxiv.org/abs/2208.06061
  • Pdf link: https://arxiv.org/pdf/2208.06061
  • Abstract Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate two methods for building in such a bias. One method, the TP-Transformer, augments the traditional Transformer architecture to include an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We test these methods on translating from English into morphologically rich languages, Turkish and Inuktitut, and consider both automatic metrics and human evaluations. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset. In sum, structural encoding methods make Transformers more sample-efficient, enabling them to perform better from smaller amounts of data.

Deep is a Luxury We Don't Have

  • Authors: Ahmed Taha, Yen Nhi Truong Vu, Brent Mombourquette, Thomas Paul Matthews, Jason Su, Sadanand Singh
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.06066
  • Pdf link: https://arxiv.org/pdf/2208.06066
  • Abstract Medical images come in high resolutions. A high resolution is vital for finding malignant tissues at an early stage. Yet, this resolution presents a challenge in terms of modeling long-range dependencies. Shallow transformers eliminate this problem, but they suffer from quadratic complexity. In this paper, we tackle this complexity by leveraging a linear self-attention approximation. Through this approximation, we propose an efficient vision model called HCT, which stands for High-resolution Convolutional Transformer. HCT brings transformers' merits to high resolution images at a significantly lower cost. We evaluate HCT using a high resolution mammography dataset. HCT is significantly superior to its CNN counterpart. Furthermore, we demonstrate HCT's fitness for medical images by evaluating its effective receptive field. Code is available at https://bit.ly/3ykBhhf
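
The abstract does not name the specific approximation; below is one common kernelized linear attention (Katharopoulos et al., 2020), shown only to illustrate the class of linear-cost approximations HCT builds on.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention: softmax(Q K^T) V is replaced by
    phi(Q) (phi(K)^T V), so cost grows linearly rather than
    quadratically with sequence length. q, k, v: (batch, length, dim)."""
    phi = lambda x: torch.nn.functional.elu(x) + 1.0  # positive feature map
    kv = torch.einsum("bld,ble->bde", phi(k), v)      # sum_l phi(k_l) v_l^T
    z = torch.einsum("bld,bd->bl", phi(q), phi(k).sum(dim=1)) + eps
    return torch.einsum("bld,bde->ble", phi(q), kv) / z.unsqueeze(-1)
```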

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

  • Authors: Chao Fang, Aojun Zhou, Zhongfeng Wang
  • Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
  • Arxiv link: https://arxiv.org/abs/2208.06118
  • Pdf link: https://arxiv.org/pdf/2208.06118
  • Abstract The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to the immense numbers of parameters and operations in these models. To relieve this burden, exploiting sparsity is an effective approach to accelerate Transformers. Newly emerging Ampere GPUs leverage a 2:4 sparsity pattern to achieve model acceleration, but this can hardly meet the diverse algorithm and hardware constraints encountered when deploying models. By contrast, we propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by utilizing general N:M sparsity patterns. (1) From the algorithm perspective, we propose a sparsity inheritance mechanism along with an inherited dynamic pruning (IDP) method to rapidly obtain a series of N:M sparse candidate Transformers. A model compression scheme is further proposed to significantly reduce the storage requirement for deployment. (2) From the hardware perspective, we present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers. STA features not only a computing engine unifying both sparse-dense and dense-dense matrix multiplications with high computational efficiency, but also a scalable softmax module eliminating the latency from intermediate off-chip data communication. Experimental results show that, compared to other methods, N:M sparse Transformers generated using IDP achieve an average accuracy improvement of 6.7% with high training efficiency. Moreover, STA achieves speedups of 14.47x and 11.33x over an Intel i9-9900X and an NVIDIA RTX 2080 Ti, respectively, and performs 2.00-19.47x faster inference than state-of-the-art FPGA-based accelerators for Transformers.
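
For concreteness, here is magnitude-based N:M masking, the structural pattern the paper targets. The simple magnitude heuristic below is a common baseline, not the paper's IDP method.

```python
import torch

def nm_mask(w, n=2, m=4):
    """Zero all but the n largest-magnitude weights in every group of m
    consecutive weights along the last dimension, yielding the N:M
    pattern (2:4 is what Ampere sparse tensor cores accelerate).
    Assumes w.numel() is divisible by m."""
    groups = w.reshape(-1, m)
    idx = groups.abs().topk(n, dim=1).indices
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return (groups * mask).reshape(w.shape)
```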

Layout-Bridging Text-to-Image Synthesis

  • Authors: Jiadong Liang, Wenjie Pei, Feng Lu
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.06162
  • Pdf link: https://arxiv.org/pdf/2208.06162
  • Abstract The crux of text-to-image synthesis stems from the difficulty of preserving the cross-modality semantic consistency between the input text and the synthesized image. Typical methods, which seek to model the text-to-image mapping directly, can only capture keywords in the text that indicate common objects or actions, but fail to learn their spatial distribution patterns. An effective way to circumvent this limitation is to generate an image layout as guidance, which a few methods have attempted. Nevertheless, these methods fail to generate practically effective layouts due to the diversity of input text and object locations. In this paper, we push for effective modeling in both text-to-layout generation and layout-to-image synthesis. Specifically, we formulate text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon the Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them. In the layout-to-image synthesis stage, we focus on learning the textual-visual semantic alignment per object in the layout to precisely incorporate the input text into the layout-to-image synthesizing process. To evaluate the quality of the generated layouts, we design a new metric, dubbed the Layout Quality Score, which considers both the absolute distribution errors of bounding boxes in the layout and the mutual spatial relationships between them. Extensive experiments on three datasets demonstrate the superior performance of our method over state-of-the-art methods on both predicting the layout and synthesizing the image from the given text.

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

  • Authors: Zhiliang Peng, Li Dong, Hangbo Bao, Qixiang Ye, Furu Wei
  • Subjects: Computer Vision and Pattern Recognition (cs.CV)
  • Arxiv link: https://arxiv.org/abs/2208.06366
  • Pdf link: https://arxiv.org/pdf/2208.06366
  • Abstract Masked image modeling (MIM) has demonstrated impressive results in self-supervised representation learning by recovering corrupted image patches. However, most methods still operate on low-level image pixels, which hinders the exploitation of high-level semantics for representation models. In this study, we propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction, providing a systematic way to promote MIM from the pixel level to the semantic level. Specifically, we introduce vector-quantized knowledge distillation to train the tokenizer, which discretizes a continuous semantic space into compact codes. We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches. Moreover, we encourage the model to explicitly aggregate patch information into a global image representation, which facilitates linear probing. Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods. On ImageNet-1K (224 size), the base-size BEiT v2 achieves 85.5% top-1 accuracy for fine-tuning and 80.1% top-1 accuracy for linear probing. The large-size BEiT v2 obtains 87.3% top-1 accuracy for ImageNet-1K (224 size) fine-tuning, and 56.7% mIoU on ADE20K for semantic segmentation. The code and pretrained models are available at https://aka.ms/beit.
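
The tokenizer's inference step, assigning each patch feature the id of its nearest codebook entry, can be sketched as below. The vector-quantized knowledge distillation that trains the codebook is the paper's contribution and is omitted; the paper may also normalize features before the lookup.

```python
import torch

def vq_tokenize(feats, codebook):
    """Map continuous patch features to discrete token ids via a
    Euclidean nearest-neighbor lookup in the learned codebook.
    feats: (num_patches, dim); codebook: (codebook_size, dim)."""
    return torch.cdist(feats, codebook).argmin(dim=1)  # (num_patches,)
```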

Keyword: autonomous driving

Optimizing Anchor-based Detectors for Autonomous Driving Scenes

  • Authors: Xianzhi Du, Wei-Chih Hung, Tsung-Yi Lin
  • Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
  • Arxiv link: https://arxiv.org/abs/2208.06062
  • Pdf link: https://arxiv.org/pdf/2208.06062
  • Abstract This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in the scenes of autonomous driving. Based on the high-performing RCNN-RS and RetinaNet-RS detection frameworks designed for common detection scenes, we study a set of framework improvements to adapt the detectors to better detect small objects in crowd scenes. Then, we propose a model scaling strategy by scaling input resolution and model size to achieve a better speed-accuracy trade-off curve. We evaluate our family of models on the real-time 2D detection track of the Waymo Open Dataset (WOD). Within the 70 ms/frame latency constraint on a V100 GPU, our largest Cascade RCNN-RS model achieves 76.9% AP/L1 and 70.1% AP/L2, attaining the new state-of-the-art on WOD real-time 2D detection. Our fastest RetinaNet-RS model achieves 6.3 ms/frame while maintaining a reasonable detection precision at 50.7% AP/L1 and 42.9% AP/L2.

ANTI-CARLA: An Adversarial Testing Framework for Autonomous Vehicles in CARLA

  • Authors: Shreyas Ramakrishna, Baiting Luo, Christopher Kuhn, Gabor Karsai, Abhishek Dubey
  • Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
  • Arxiv link: https://arxiv.org/abs/2208.06309
  • Pdf link: https://arxiv.org/pdf/2208.06309
  • Abstract Despite recent advances in autonomous driving systems, accidents such as the fatal Uber crash in 2018 show these systems are still susceptible to edge cases. Such systems must be thoroughly tested and validated before being deployed in the real world to avoid such events. Testing in open-world scenarios can be difficult, time-consuming, and expensive. These challenges can be addressed by using driving simulators such as CARLA instead. A key part of such tests is adversarial testing, in which the goal is to find scenarios that lead to failures of the given system. While several independent testing efforts have been made, a well-established framework that enables adversarial testing has yet to be made available for CARLA. We therefore propose ANTI-CARLA, an automated testing framework in CARLA for simulating adversarial weather conditions (e.g., heavy rain) and sensor faults (e.g., camera occlusion) that cause the system to fail. The operating conditions in which a given system should be tested are specified in a scenario description language. The framework offers an efficient search mechanism that looks for adversarial operating conditions under which the tested system will fail. In this way, ANTI-CARLA extends the CARLA simulator with the capability of performing adversarial testing on any given driving pipeline. We use ANTI-CARLA to test a driving pipeline trained with the Learning By Cheating (LBC) approach. The simulation results demonstrate that ANTI-CARLA can effectively and automatically find a range of failure cases despite LBC reaching an accuracy of 100% in the CARLA benchmark.
