arxiv-daily
New submissions for Mon, 10 Oct 22
Keyword: human object interaction
There is no result
Keyword: visual relation detection
There is no result
Keyword: object detection
Synthetic Dataset Generation for Privacy-Preserving Machine Learning
- Authors: Efstathia Soufleri, Gobinda Saha, Kaushik Roy
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03205
- Pdf link: https://arxiv.org/pdf/2210.03205
- Abstract Machine Learning (ML) has achieved enormous success in solving a variety of problems in computer vision, speech recognition, object detection, to name a few. The principal reason for this success is the availability of huge datasets for training deep neural networks (DNNs). However, datasets cannot be publicly released if they contain sensitive information such as medical records, and data privacy becomes a major concern. Encryption methods could be a possible solution, however their deployment on ML applications seriously impacts classification accuracy and results in substantial computational overhead. Alternatively, obfuscation techniques could be used, but maintaining a good trade-off between visual privacy and accuracy is challenging. In this paper, we propose a method to generate secure synthetic datasets from the original private datasets. Given a network with Batch Normalization (BN) layers pretrained on the original dataset, we first record the class-wise BN layer statistics. Next, we generate the synthetic dataset by optimizing random noise such that the synthetic data match the layer-wise statistical distribution of original images. We evaluate our method on image classification datasets (CIFAR10, ImageNet) and show that synthetic data can be used in place of the original CIFAR10/ImageNet data for training networks from scratch, producing comparable classification performance. Further, to analyze visual privacy provided by our method, we use Image Quality Metrics and show high degree of visual dissimilarity between the original and synthetic images. Moreover, we show that our proposed method preserves data-privacy under various privacy-leakage attacks including Gradient Matching Attack, Model Memorization Attack, and GAN-based Attack.
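The statistic-matching idea in this abstract can be illustrated with a small objective. This is only a sketch under stated assumptions: the abstract does not give the loss, so a squared distance between per-channel mean and variance is assumed here, plain feature vectors stand in for the inputs of one BN layer, and the names `channel_stats` and `bn_matching_loss` are hypothetical.

```python
def channel_stats(batch):
    """Per-channel mean and biased variance; `batch` is a list of feature vectors."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return means, variances


def bn_matching_loss(batch, target_means, target_vars):
    """Assumed objective: squared distance between batch stats and recorded stats."""
    means, variances = channel_stats(batch)
    mean_term = sum((m - t) ** 2 for m, t in zip(means, target_means))
    var_term = sum((v - t) ** 2 for v, t in zip(variances, target_vars))
    return mean_term + var_term


# Statistics "recorded" from a toy private batch (two samples, two channels).
private_batch = [[1.0, 2.0], [3.0, 6.0]]
recorded_means, recorded_vars = channel_stats(private_batch)

# A synthetic batch initialised to zeros is far from the recorded statistics.
synthetic_batch = [[0.0, 0.0], [0.0, 0.0]]
loss_before = bn_matching_loss(synthetic_batch, recorded_means, recorded_vars)
loss_matched = bn_matching_loss(private_batch, recorded_means, recorded_vars)
```

In a real pipeline this quantity would be minimised with respect to the synthetic inputs by gradient descent through the pretrained network; that optimisation step is omitted here.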
Joint Protection Scheme for Deep Neural Network Hardware Accelerators and Models
- Authors: Jingbo Zhou, Xinmiao Zhang
- Subjects: Cryptography and Security (cs.CR)
- Arxiv link: https://arxiv.org/abs/2210.03249
- Pdf link: https://arxiv.org/pdf/2210.03249
- Abstract Deep neural networks (DNNs) are utilized in numerous image processing, object detection, and video analysis tasks and need to be implemented using hardware accelerators to achieve practical speed. Logic locking is one of the most popular methods for preventing chip counterfeiting. Nevertheless, existing logic-locking schemes need to sacrifice the number of input patterns leading to wrong output under incorrect keys to resist the powerful satisfiability (SAT)-attack. Furthermore, DNN model inference is fault-tolerant. Hence, using a wrong key for those SAT-resistant logic-locking schemes may not affect the accuracy of DNNs. This makes the previous SAT-resistant logic-locking schemes ineffective at protecting DNN accelerators. Besides, to prevent DNN models from being illegally used, the models need to be obfuscated by the designers before they are provided to end-users. Previous obfuscation methods either require a long time to retrain the model or leak information about the model. This paper proposes a joint protection scheme for DNN hardware accelerators and models. The DNN accelerator is modified using a hardware key (Hkey) and a model key (Mkey). Different from previous logic locking, the Hkey, which is used to protect the accelerator, does not affect the output when it is wrong. As a result, the SAT attack can be effectively resisted. On the other hand, a wrong Hkey leads to a substantial increase in memory accesses, inference time, and energy consumption and makes the accelerator unusable. A correct Mkey can recover the DNN model that is obfuscated by the proposed method. Compared to previous model obfuscation schemes, our proposed method avoids model retraining and does not leak model information.
IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors
- Authors: Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lv
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03477
- Pdf link: https://arxiv.org/pdf/2210.03477
- Abstract Knowledge distillation (KD) has been proven to be useful for training compact object detection models. However, we observe that KD is often effective when the teacher model and student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors, caused by a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) to distill 1-bit detectors that can effectively eliminate information discrepancies and significantly reduce the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization formulation. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both PASCAL VOC and COCO datasets. IDa-Det achieves a 76.9% mAP for a 1-bit Faster-RCNN with ResNet-18 backbone. Our code is open-sourced on https://github.com/SteveTsui/IDa-Det.
CLAD: A realistic Continual Learning benchmark for Autonomous Driving
- Authors: Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03482
- Pdf link: https://arxiv.org/pdf/2210.03482
- Abstract In this paper we describe the design and the ideas motivating a new Continual Learning benchmark for Autonomous Driving (CLAD), that focuses on the problems of object classification and object detection. The benchmark utilises SODA10M, a recently released large-scale dataset that concerns autonomous driving related problems. First, we review and discuss existing continual learning benchmarks, how they are related, and show that most are extreme cases of continual learning. To this end, we survey the benchmarks used in continual learning papers at three highly ranked computer vision conferences. Next, we introduce CLAD-C, an online classification benchmark realised through a chronological data stream that poses both class and domain incremental challenges; and CLAD-D, a domain incremental continual object detection benchmark. We examine the inherent difficulties and challenges posed by the benchmark, through a survey of the techniques and methods used by the top-3 participants in a CLAD-challenge workshop at ICCV 2021. We conclude with possible pathways to improve the current continual learning state of the art, and which directions we deem promising for future research.
AI-Driven Road Maintenance Inspection v2: Reducing Data Dependency & Quantifying Road Damage
- Authors: Haris Iqbal, Hemang Chawla, Arnav Varma, Terence Brouns, Ahmed Badar, Elahe Arani, Bahram Zonooz
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03570
- Pdf link: https://arxiv.org/pdf/2210.03570
- Abstract Road infrastructure maintenance inspection is typically a labor-intensive and critical task to ensure the safety of all road users. Existing state-of-the-art techniques in Artificial Intelligence (AI) for object detection and segmentation help automate a huge chunk of this task given adequate annotated data. However, annotating videos from scratch is cost-prohibitive. For instance, it can take an annotator several days to annotate a 5-minute video recorded at 30 FPS. Hence, we propose an automated labelling pipeline by leveraging techniques like few-shot learning and out-of-distribution detection to generate labels for road damage detection. In addition, our pipeline includes a risk factor assessment for each damage by instance quantification to prioritize locations for repairs which can lead to optimal deployment of road maintenance machinery. We show that the AI models trained with these techniques can not only generalize better to unseen real-world data with reduced requirement for human annotation but also provide an estimate of maintenance urgency, thereby leading to safer roads.
An Investigation into Whitening Loss for Self-supervised Learning
- Authors: Xi Weng, Lei Huang, Lei Zhao, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03586
- Pdf link: https://arxiv.org/pdf/2210.03586
- Abstract A desirable objective in self-supervised learning (SSL) is to avoid feature collapse. Whitening loss guarantees collapse avoidance by minimizing the distance between embeddings of positive pairs under the conditioning that the embeddings from different views are whitened. In this paper, we propose a framework with an informative indicator to analyze whitening loss, which provides a clue to demystify several interesting phenomena as well as a pivoting point connecting to other SSL methods. We reveal that batch whitening (BW) based methods do not impose whitening constraints on the embedding, but they only require the embedding to be full-rank. This full-rank constraint is also sufficient to avoid dimensional collapse. Based on our analysis, we propose channel whitening with random group partition (CW-RGP), which exploits the advantages of BW-based methods in preventing collapse and avoids their disadvantages requiring large batch size. Experimental results on ImageNet classification and COCO object detection reveal that the proposed CW-RGP possesses a promising potential for learning good representations. The code is available at https://github.com/winci-ai/CW-RGP.
Humans need not label more humans: Occlusion Copy & Paste for Occluded Human Instance Segmentation
- Authors: Evan Ling, Dezhao Huang, Minhoe Hur
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03686
- Pdf link: https://arxiv.org/pdf/2210.03686
- Abstract Modern object detection and instance segmentation networks stumble when picking out humans in crowded or highly occluded scenes. Yet, these are often scenarios where we require our detectors to work well. Many works have approached this problem with model-centric improvements. While they have been shown to work to some extent, these supervised methods still need sufficient relevant examples (i.e. occluded humans) during training for the improvements to be maximised. In our work, we propose a simple yet effective data-centric approach, Occlusion Copy & Paste, to introduce occluded examples to models during training - we tailor the general copy & paste augmentation approach to tackle the difficult problem of same-class occlusion. It improves instance segmentation performance on occluded scenarios for "free" just by leveraging on existing large-scale datasets, without additional data or manual labelling needed. In a principled study, we show whether various proposed add-ons to the copy & paste augmentation indeed contribute to better performance. Our Occlusion Copy & Paste augmentation is easily interoperable with any models: by simply applying it to a recent generic instance segmentation model without explicit model architectural design to tackle occlusion, we achieve state-of-the-art instance segmentation performance on the very challenging OCHuman dataset. Source code is available at https://github.com/levan92/occlusion-copy-paste.
Keyword: transformer
Enhancing Mixup-Based Graph Learning for Language Processing via Hybrid Pooling
- Authors: Zeming Dong, Qiang Hu, Yuejun Guo, Maxime Cordy, Mike Papadakis, Yves Le Traon, Jianjun Zhao
- Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2210.03123
- Pdf link: https://arxiv.org/pdf/2210.03123
- Abstract Graph neural networks (GNNs) have recently been popular in natural language and programming language processing, particularly in text and source code classification. Graph pooling, which aggregates node representations into a representation of the entire graph that can be used for multiple downstream tasks (e.g., graph classification), is a crucial component of GNNs. Recently, to enhance graph learning, Manifold Mixup, a data augmentation strategy that mixes the graph data vector after the pooling layer, has been introduced. However, since there are a series of graph pooling methods, how they affect the effectiveness of such a Mixup approach is unclear. In this paper, we take the first step to explore the influence of graph pooling methods on the effectiveness of the Mixup-based data augmentation approach. Specifically, 9 types of hybrid pooling methods are considered in the study, e.g., $\mathcal{M}_{sum}(\mathcal{P}_{att},\mathcal{P}_{max})$. The experimental results on both natural language datasets (Gossipcop, Politifact) and programming language datasets (Java250, Python800) demonstrate that hybrid pooling methods are more suitable for Mixup than the standard max pooling and the state-of-the-art graph multiset transformer (GMT) pooling, in terms of accuracy and robustness.
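A rough sketch of how a hybrid pooling operator might feed Manifold Mixup is given below. It assumes that $\mathcal{M}_{sum}(\mathcal{P}_{att},\mathcal{P}_{max})$ means "add the attention-pooled and max-pooled graph vectors"; the abstract does not define the operator, so this reading, the fixed attention logits, and all function names are illustrative assumptions rather than the authors' implementation.

```python
import math


def max_pool(nodes):
    """Element-wise maximum over node embeddings."""
    return [max(col) for col in zip(*nodes)]


def attention_pool(nodes, scores):
    """Softmax-weighted sum of node embeddings; `scores` are per-node logits."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*nodes)]


def hybrid_sum_pool(nodes, scores):
    """Assumed reading of M_sum(P_att, P_max): add the two pooled vectors."""
    att = attention_pool(nodes, scores)
    mx = max_pool(nodes)
    return [a + m for a, m in zip(att, mx)]


def mixup(vec_a, vec_b, label_a, label_b, lam):
    """Interpolate two pooled graph vectors and their one-hot labels."""
    vec = [lam * a + (1 - lam) * b for a, b in zip(vec_a, vec_b)]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return vec, label


# Two toy graphs, each with two nodes and two-dimensional node embeddings.
graph_a = [[1.0, 0.0], [0.0, 2.0]]
graph_b = [[3.0, 1.0], [1.0, 1.0]]
pooled_a = hybrid_sum_pool(graph_a, scores=[0.0, 0.0])
pooled_b = hybrid_sum_pool(graph_b, scores=[0.0, 0.0])
mixed_vec, mixed_label = mixup(pooled_a, pooled_b, [1.0, 0.0], [0.0, 1.0], lam=0.5)
```

In practice the attention logits would be learned and the mixing weight would typically be sampled from a Beta distribution; both are fixed here to keep the example deterministic.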
Gastrointestinal Disorder Detection with a Transformer Based Approach
- Authors: A.K.M. Salman Hosain, Mynul Islam, Md Humaion Kabir Mehedi, Irteza Enan Kabir, Zarin Tasnim Khan
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2210.03168
- Pdf link: https://arxiv.org/pdf/2210.03168
- Abstract Accurate disease categorization using endoscopic images is a significant problem in Gastroenterology. This paper describes a technique for assisting medical diagnosis procedures and identifying gastrointestinal tract disorders based on the categorization of characteristics taken from endoscopic pictures using a vision transformer and transfer learning model. Vision transformer has shown very promising results on difficult image classification tasks. In this paper, we have suggested a vision transformer based approach to detect gastrointestinal diseases from wireless capsule endoscopy (WCE) curated images of the colon with an accuracy of 95.63%. We have compared this transformer based approach with the pretrained convolutional neural network (CNN) model DenseNet201 and demonstrated that the vision transformer surpassed DenseNet201 in various quantitative performance evaluation metrics.
Interpreting County Level COVID-19 Infection and Feature Sensitivity using Deep Learning Time Series Models
- Authors: Md Khairul Islam, Di Zhu, Yingzheng Liu, Andrej Erkelens, Nick Daniello, Judy Fox
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03258
- Pdf link: https://arxiv.org/pdf/2210.03258
- Abstract Interpretable machine learning plays a key role in healthcare because understanding feature importance in deep learning model predictions is challenging. We propose a novel framework that uses deep learning to study feature sensitivity for model predictions. This work combines sensitivity analysis with heterogeneous time-series deep learning model prediction, which corresponds to the interpretations of spatio-temporal features. We forecast county-level COVID-19 infection using the Temporal Fusion Transformer. We then use a sensitivity analysis extending the Morris Method to see how sensitive the outputs are with respect to perturbations of our static and dynamic input features. The significance of the work is grounded in a real-world COVID-19 infection prediction with highly non-stationary, finely granular, and heterogeneous data. 1) Our model can capture the detailed daily changes of temporal and spatial model behaviors and achieves high prediction performance compared to a PyTorch baseline. 2) By analyzing the Morris sensitivity indices and attention patterns, we decipher the meaning of feature importance with observational population and dynamic model changes. 3) We have collected 2.5 years of socioeconomic and health features over 3142 US counties, such as observed cases and deaths, and a number of static (age distribution, health disparity, and industry) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we conduct extensive experiments and show our model can learn complex interactions and perform predictions for daily infection at the county level. Being able to model the disease infection with a hybrid prediction and description accuracy measurement with Morris index at the county level is a central idea that sheds light on individual feature interpretation via sensitivity analysis.
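For readers unfamiliar with the Morris Method mentioned in this abstract, the sketch below shows the plain one-at-a-time version: each feature is perturbed by a step `delta` along a random trajectory, and the mean absolute elementary effect (often written as mu-star) is reported per feature. The paper describes an extension of this method that is not reproduced here; `toy_model`, the unit-cube input range, and the parameter defaults are assumptions for illustration only.

```python
import random


def morris_mu_star(model, num_features, num_trajectories=20, delta=0.1, seed=0):
    """Mean absolute elementary effect per feature (basic Morris screening)."""
    rng = random.Random(seed)
    totals = [0.0] * num_features
    for _ in range(num_trajectories):
        # Random base point chosen so that every +delta step stays inside [0, 1].
        point = [rng.uniform(0.0, 1.0 - delta) for _ in range(num_features)]
        order = list(range(num_features))
        rng.shuffle(order)
        previous = model(point)
        for i in order:
            # Move one feature at a time and record the scaled output change.
            point[i] += delta
            current = model(point)
            totals[i] += abs((current - previous) / delta)
            previous = current
    return [t / num_trajectories for t in totals]


def toy_model(x):
    """Hypothetical stand-in for a forecaster: linear, with one inert feature."""
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


mu_star = morris_mu_star(toy_model, num_features=3)
```

For this linear toy model the indices recover the coefficient magnitudes, so the first feature ranks as most influential and the third as inert.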
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
- Authors: Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03265
- Pdf link: https://arxiv.org/pdf/2210.03265
- Abstract Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard method in machine learning. Recently, parameter-efficient fine-tuning methods show promise in adapting a pretrained model to different tasks while training only a few parameters. Despite their success, most existing methods are proposed in Natural Language Processing tasks with language Transformers, and adaptation to Computer Vision tasks with Vision Transformers remains under-explored, especially for dense vision tasks. Further, in multi-task settings, individually fine-tuning and storing separate models for different tasks is inefficient. In this work, we provide an extensive multi-task parameter-efficient benchmark and examine existing parameter-efficient fine-tuning NLP methods for vision tasks. Our results on four different dense vision tasks showed that existing methods cannot be efficiently integrated due to the hierarchical nature of the Hierarchical Vision Transformers. To overcome this issue, we propose Polyhistor and Polyhistor-Lite, consisting of Decomposed HyperNetworks and Layer-wise Scaling Kernels, to share information across different tasks with a few trainable parameters. This leads to favorable performance improvements against existing parameter-efficient methods while using fewer trainable parameters. Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using ~10% of their trainable parameters. Furthermore, our methods show larger performance gains when large networks and more pretraining data are used.
Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning
- Authors: Andrew J. Nam, Mustafa Abdool, Trevor Maxfield, James L. McClelland
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03275
- Pdf link: https://arxiv.org/pdf/2210.03275
- Abstract Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks, and is quite apparent in tasks with well-defined variables and rules, where explicit use of the rules can solve problems independently of the particular values of the variables. Large transformer-based language models have pushed the boundaries on how well neural networks can generalize to novel inputs, but their complexity obfuscates how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in smaller scale transformers. Using a reasoning task based on the puzzle Sudoku, we show that OODG can occur on complex problems if the training set includes examples sampled from the whole distribution of simpler component tasks.
GMA3D: Local-Global Attention Learning to Estimate Occluded Motions of Scene Flow
- Authors: Zhiyang Lu, Ming Cheng
- Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2210.03296
- Pdf link: https://arxiv.org/pdf/2210.03296
- Abstract Scene flow is the collection of motion information for each point in a 3D point cloud. It is a vital tool applied to many tasks, such as autonomous driving and augmented reality. However, there are always occlusion points between two consecutive point clouds, whether from sparse data sampling or real-world occlusion. In this paper, we focus on addressing occlusion issues in scene flow by self-similarity and local consistency of moving objects. We propose a GMA3D module based on the transformer framework, which utilizes local and global similarity to infer the motion information of occluded points from the motion information of local and global non-occluded points respectively, and then uses an offset generator to aggregate them. Our module is the first to apply the transformer-based architecture to gauge the scene flow occlusion problem on point clouds. Experiments show that our GMA3D can solve the occlusion problem in the scene flow, especially in the real scene. We evaluate the proposed method on the occluded version datasets and get state-of-the-art results on the real scene KITTI. To verify that GMA3D is still beneficial for non-occluded scene flow, we also conducted experiments on non-occluded version datasets and achieved state-of-the-art results on FlyingThings3D and KITTI. The code is available at https://github.com/O-VIGIA/GMA3D.
Game-Theoretic Understanding of Misclassification
- Authors: Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03349
- Pdf link: https://arxiv.org/pdf/2210.03349
- Abstract This paper analyzes various types of image misclassification from a game-theoretic view. Particularly, we consider the misclassification of clean, adversarial, and corrupted images and characterize it through the distribution of multi-order interactions. We discover that the distribution of multi-order interactions varies across the types of misclassification. For example, misclassified adversarial images have a higher strength of high-order interactions than correctly classified clean images, which indicates that adversarial perturbations create spurious features that arise from complex cooperation between pixels. By contrast, misclassified corrupted images have a lower strength of low-order interactions than correctly classified clean images, which indicates that corruptions break the local cooperation between pixels. We also provide the first analysis of Vision Transformers using interactions. We found that Vision Transformers show a different tendency in the distribution of interactions from that in CNNs, and this implies that they exploit the features that CNNs do not use for the prediction. Our study demonstrates that the recent game-theoretic analysis of deep learning models can be broadened to analyze various malfunctions of deep learning models including Vision Transformers by using the distribution, order, and sign of interactions.
A Simple Plugin for Transforming Images to Arbitrary Scales
- Authors: Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03417
- Pdf link: https://arxiv.org/pdf/2210.03417
- Abstract Existing super-resolution models are often specialized for one scale, fundamentally limiting their use in practical scenarios. In this paper, we aim to develop a general plugin that can be inserted into existing super-resolution models, conveniently augmenting their ability towards Arbitrary Resolution Image Scaling, thus termed ARIS. We make the following contributions: (i) we propose a transformer-based plugin module, which uses spatial coordinates as queries, iteratively attends to the low-resolution image features through cross-attention, and outputs a visual feature for the queried spatial location, resembling an implicit representation for images; (ii) we introduce a novel self-supervised training scheme that exploits consistency constraints to effectively augment the model's ability for upsampling images towards unseen scales, i.e. ground-truth high-resolution images are not available; (iii) without loss of generality, we inject the proposed ARIS plugin module into several existing models, namely, IPT, SwinIR, and HAT, showing that the resulting models can not only maintain their original performance on fixed scale factor but also extrapolate to unseen scales, substantially outperforming existing any-scale super-resolution models on standard benchmarks, e.g. Urban100, DIV2K, etc.
DABERT: Dual Attention Enhanced BERT for Semantic Matching
- Authors: Sirui Wang, Di Liang, Jian Song, Yuntao Li, Wei Wu
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2210.03454
- Pdf link: https://arxiv.org/pdf/2210.03454
- Abstract Transformer-based pre-trained language models such as BERT have achieved remarkable results in Semantic Sentence Matching. However, existing models still suffer from insufficient ability to capture subtle differences. Minor noise like word addition, deletion, and modification of sentences may cause flipped predictions. To alleviate this problem, we propose a novel Dual Attention Enhanced BERT (DABERT) to enhance the ability of BERT to capture fine-grained differences in sentence pairs. DABERT comprises (1) a Dual Attention module, which measures soft word matches by introducing a new dual channel alignment mechanism to model affinity and difference attention, and (2) an Adaptive Fusion module, which uses attention to learn the aggregation of difference and affinity features and generates a vector describing the matching details of sentence pairs. We conduct extensive experiments on well-studied semantic matching and robustness test datasets, and the experimental results show the effectiveness of our proposed method.
Latent Neural ODEs with Sparse Bayesian Multiple Shooting
- Authors: Valerii Iakovlev, Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki
- Subjects: Machine Learning (cs.LG)
- Arxiv link: https://arxiv.org/abs/2210.03466
- Pdf link: https://arxiv.org/pdf/2210.03466
- Abstract Training dynamic models, such as neural ODEs, on long trajectories is a hard problem that requires using various tricks, such as trajectory splitting, to make model training work in practice. These methods are often heuristics with poor theoretical justifications, and require iterative manual tuning. We propose a principled multiple shooting technique for neural ODEs that splits the trajectories into manageable short segments, which are optimised in parallel, while ensuring probabilistic control on continuity over consecutive segments. We derive variational inference for our shooting-based latent neural ODE models and propose amortized encodings of irregularly sampled trajectories with a transformer-based recognition network with temporal attention and relative positional encoding. We demonstrate efficient and stable training, and state-of-the-art performance on multiple large-scale benchmark datasets.
Time-Space Transformers for Video Panoptic Segmentation
- Authors: Andra Petrovai, Sergiu Nedevschi
- Subjects: Computer Vision and Pattern Recognition (cs.CV)
- Arxiv link: https://arxiv.org/abs/2210.03546
- Pdf link: https://arxiv.org/pdf/2210.03546
- Abstract We propose a novel solution for the task of video panoptic segmentation, that simultaneously predicts pixel-level semantic and instance segmentation and generates clip-level instance tracks. Our network, named VPS-Transformer, with a hybrid architecture based on the state-of-the-art panoptic segmentation network Panoptic-DeepLab, combines a convolutional architecture for single-frame panoptic segmentation and a novel video module based on an instantiation of the pure Transformer block. The Transformer, equipped with attention mechanisms, models spatio-temporal relations between backbone output features of current and past frames for more accurate and consistent panoptic estimates. As the pure Transformer block introduces large computation overhead when processing high resolution images, we propose a few design changes for a more efficient compute. We study how to aggregate information more effectively over the space-time volume and we compare several variants of the Transformer block with different attention schemes. Extensive experiments on the Cityscapes-VPS dataset demonstrate that our best model improves the temporal consistency and video panoptic quality by a margin of 2.2%, with little extra computation.
How Large Language Models are Transforming Machine-Paraphrased Plagiarism
- Authors: Jan Philip Wahle, Terry Ruas, Frederic Kirstein, Bela Gipp
- Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- Arxiv link: https://arxiv.org/abs/2210.03568
- Pdf link: https://arxiv.org/pdf/2210.03568
- Abstract The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases indistinguishable from original work. However, the role of large autoregressive transformers in generating machine-paraphrased plagiarism and their detection is still developing in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software and perform a human study with 105 participants regarding their detection performance and the quality of generated examples. Our results suggest that large models can rewrite text humans have difficulty identifying as machine-paraphrased (53% mean acc.). Human experts rate the quality of paraphrases generated by GPT-3 as high as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves a 66% F1-score in detecting paraphrases.
Understanding Transformer Memorization Recall Through Idioms
- Authors: Adi Haviv, Ido Cohen, Jacob Gidron, Roei Schuster, Yoav Goldberg, Mor Geva
- Subjects: Computation and Language (cs.CL)
- Arxiv link: https://arxiv.org/abs/2210.03588
- Pdf link: https://arxiv.org/pdf/2210.03588
- Abstract To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to output a memorized phrase, and how is this phrase then retrieved from memory? In this work, we offer the first methodological framework for probing and characterizing recall of memorized sequences in transformer LMs. First, we lay out criteria for detecting model inputs that trigger memory recall, and propose idioms as inputs that fulfill these criteria. Next, we construct a dataset of English idioms and use it to compare model behavior on memorized vs. non-memorized inputs. Specifically, we analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution. We find that across different model sizes and architectures, memorized predictions are a two-step process: early layers promote the predicted token to the top of the output distribution, and upper layers increase model confidence. This suggests that memorized information is stored and retrieved in the early layers of the network. Last, we demonstrate the utility of our methodology beyond idioms in memorized factual statements. Overall, our work makes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.
Keyword: scene understanding
There is no result
Keyword: visual reasoning
There is no result