# bert-probe

BERT Probe: a Python package for probing attention-based robustness evaluation of BERT models.

It evaluates BERT models against character- and word-level adversarial attacks, and presents recipes for implicit and explicit defenses against character-level attacks.
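To illustrate what a character-level perturbation looks like (this sketch is not part of the package API; `perturb_chars` is a hypothetical helper), one common trick is swapping two adjacent inner characters of a word so the text stays human-readable while the model's tokenization changes:

```python
import random

def perturb_chars(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Swap two adjacent inner characters of randomly chosen words,
    simulating a character-level adversarial perturbation (illustrative only)."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # keep first and last char intact
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)
```

A real attack would search for the perturbation that actually flips the classifier's prediction; this sketch only shows the perturbation operator itself.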
## Attacks Schematic

## Explicit Defense Schematic
## Usage

Install the dependencies and download the Stanza German model:

```bash
# install dependencies
pip install -r requirements.txt
```

```python
import stanza

# download the stanza German model
stanza.download('de')
```
Fine-tune and evaluate a German BERT classifier:

```python
import torch

# project-local helpers: GermanDataLoader, BERTClassifier, BertOptimConfig,
# train_model, eval_model (import them from the package modules)

epochs = 10
num_labels = 2
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data_path = {
    "train": "./datasets/hasoc_dataset/hasoc_german_train.csv",
    "dev": "./datasets/hasoc_dataset/hasoc_german_validation.csv",
    "test": "./datasets/hasoc_dataset/hasoc_german_test.csv",
}

model_name = "deepset/gbert-base"

## create the data loaders
data_loaders = GermanDataLoader(
    data_path, model_name, do_cleansing=False, max_sequence_length=128, batch_size=8
)

## instantiate the classifier and its optimizer/scheduler configuration
model = BERTClassifier(num_labels=num_labels).get_model()
optim_config = BertOptimConfig(
    model=model, train_dataloader=data_loaders.train_dataloader, epochs=epochs
)

## execute the training routine
model = train_model(
    model=model,
    optimizer=optim_config.optimizer,
    scheduler=optim_config.scheduler,
    train_dataloader=data_loaders.train_dataloader,
    validation_dataloader=data_loaders.validation_dataloader,
    epochs=epochs,
    device=device,
    model_name=model_name,
)

## test model performance on the unseen test set
eval_model(model=model, test_dataloader=data_loaders.test_dataloader, device=device)
```
Run the black-box word-level and character-level attacks against the fine-tuned models:

```python
logs_path = "./attack_logs"

attack_config = [
    (
        "hasoc",
        "shahrukhx01/gbert-hasoc-german-2019",
        "data/hasoc_german_2019/hasoc_german_probing_set.csv",
    ),
    (
        "germeval",
        "shahrukhx01/gbert-germeval-2021",
        "data/hasoc_german_2019/germeval_probing_set.csv",
    ),
]

for attack_name, model_name_path, dataset_path in attack_config:
    ## load dataset
    dataset = GermanDataset(filepath=dataset_path).load_dataset()  ### sampling = False
    ## load model
    model_wrapper = GermanHateSpeechModelWrapper(model_name_path=model_name_path)
    ## define and build attacks
    blackbox_wordlevel_attack = BlackboxWordLevelAttack.build(model_wrapper)
    blackbox_charlevel_attack = BlackboxCharacterLevel.build(model_wrapper)
    attacks = [
        (f"{attack_name}_blackbox_wordlevel_attack", blackbox_wordlevel_attack),
        (f"{attack_name}_blackbox_charlevel_attack", blackbox_charlevel_attack),
    ]
    ## execute the attacks
    ExecuteAttack.execute(dataset, attacks=attacks, logs_path=logs_path)
```
## Defenses

- Explicit character-level defense
- Abstain-label training

## Datasets

- GermEval 2021 Task 1: Toxic Comment Classification
- HASOC (2019) German Language: Sub-Task 1, Hate Speech Classification
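As a minimal sketch of the explicit character-level defense idea (the functions below are illustrative, not the package's API): map likely-perturbed out-of-vocabulary tokens back to their closest in-vocabulary words before classification, e.g. by Levenshtein distance:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def restore(text: str, vocab, max_dist: int = 1) -> str:
    """Replace each out-of-vocabulary token with its nearest vocabulary
    word, provided the edit distance is small enough."""
    out = []
    for tok in text.split():
        if tok in vocab:
            out.append(tok)
            continue
        best = min(vocab, key=lambda w: edit_distance(tok, w))
        out.append(best if edit_distance(tok, best) <= max_dist else tok)
    return " ".join(out)
```

Abstain-label training takes the complementary route: instead of repairing the input, the classifier is trained with an extra label so it can abstain on inputs it detects as perturbed.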
## Citing & Authors

If you find this repository helpful, feel free to cite our publication:

```bibtex
@inproceedings{bertprobe,
  author    = {Shahrukh Khan and
               Mahnoor Shahid and
               Navdeeppal Singh},
  title     = {White-Box Attacks on Hate-speech BERT Classifiers in German with Explicit and Implicit Character Level Defense},
  booktitle = {BOHR International Journal of Intelligent Instrumentation and Computing, 2022},
  publisher = {BOHR Publishers},
  year      = {2022},
  url       = {https://bohrpub.com/journals/BIJIIAC/Vol1N1/BIJIIAC_20221104.html}
}
```