Editing Evaluation

Should We Really Edit Language Models? On the Evaluation of Edited Language Models

1The Hong Kong University of Science and Technology (Guangzhou), 2Hong Kong Baptist University, 3Tsinghua University
* Equal contribution † Corresponding author
NeurIPS 2024

Abstract

Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works have disclosed pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation across various editing methods and language models, and reach the following findings: (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that they preserve the general abilities of the model only within a few dozen edits; once the number of edits grows larger, the intrinsic knowledge structure of the model is disrupted or even completely destroyed. (2) Instruction-tuned models are more robust to editing, showing a smaller drop in general knowledge after editing. (3) Larger language models are more resistant to editing than smaller ones. (4) The safety of edited models is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods.

Overview


Overview of our evaluation of edited language models. We illustrate how model editing updates factual knowledge in LLMs (left), and show that when sequential edits are scaled to the thousands, the general abilities and even the intrinsic knowledge structure of the model can be severely degraded, leading to a muting effect where the model fails to produce meaningful outputs (right).
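
The evaluation pipeline amounts to a simple loop: apply knowledge edits to the same model one at a time, and periodically probe its general abilities on standard benchmarks. The sketch below is a minimal illustration of that protocol, not the paper's code; sequential_edit_and_probe, apply_edit, and evaluate are hypothetical names, with the editing step standing in for a method such as ROME or MEMIT and the evaluation step for a benchmark harness.

from typing import Callable, Dict, Iterable, List, Tuple

# One edit request: a prompt and the new target fact to inject,
# e.g. ("The capital of France is", "Marseille").
EditRequest = Tuple[str, str]

def sequential_edit_and_probe(
    model,
    edits: List[EditRequest],
    apply_edit: Callable,                             # hypothetical: applies one edit, returns the edited model
    evaluate: Callable[[object], Dict[str, float]],   # hypothetical: returns benchmark scores, e.g. {"MMLU": 0.46}
    checkpoints: Iterable[int] = (10, 100, 1000),
) -> Dict[int, Dict[str, float]]:
    """Apply edits sequentially and record general-benchmark scores at checkpoints."""
    checkpoints = set(checkpoints)
    results = {0: evaluate(model)}                    # pre-editing baseline
    for n, (prompt, target) in enumerate(edits, start=1):
        model = apply_edit(model, prompt, target)     # each edit is applied on top of all previous ones
        if n in checkpoints:
            results[n] = evaluate(model)              # probe general abilities (MMLU, BBH, GSM8K, CSQA, ...)
    return results

The scores recorded at each checkpoint, compared against the pre-editing baseline, form the degradation curves analyzed in the experiments below.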

Experimental Results

Overall Performance Trends


Impact of the number of edits (RQ1). We evaluate edited Llama2-7B base models on MMLU, BBH, GSM8K, and CSQA as we increase the number of sequential edits for different editing methods. Most methods preserve performance up to roughly 20 edits, but many degrade sharply as the number of edits grows into the hundreds, whereas methods such as PMET and MEND remain comparatively stable.


Effect of instruction tuning (RQ2). We compare Llama2-7B base and Llama2-chat-7B after applying various editing methods. Instruction-tuned models exhibit a slower rate of performance decline under sequential editing, indicating that instruction tuning makes the model's general abilities more robust to editing.


Effect of model scale (RQ3). Using the Pythia family (160M–12B parameters), we analyze how editing with methods such as ROME and MEMIT affects models of different sizes. Larger models show less degradation on general benchmarks after editing, especially for ROME, while MEMIT is relatively insensitive to scale.

Capability & Safety Analysis


Different capability dimensions (RQ4). We evaluate edited models on world knowledge and reasoning (MMLU, BBH, CSQA), mathematics (GSM8K), and reading comprehension. Across these tasks, different editing methods tend to affect all capabilities to a roughly similar extent, with PMET and MEND best preserving performance even after many edits.


Safety of edited models (RQ5). We assess safety on TruthfulQA and ToxiGen and find that even dozens of edits can noticeably compromise safety, increasing the chance of untruthful or toxic generations, including for safety-aligned models. Under extremely large numbers of edits, the disruption of the model's knowledge structure can lead to apparent but misleading improvements in some safety metrics due to the muting effect.
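
As a rough sketch of how such a safety probe can be wired up (assumed for illustration, not the paper's evaluation code), the function below measures the fraction of toxic continuations an edited model produces on a set of probe prompts; generate and is_toxic are hypothetical callables standing in for the edited model's generation function and a ToxiGen-style toxicity classifier.

from typing import Callable, List

def toxic_generation_rate(
    prompts: List[str],
    generate: Callable[[str], str],     # hypothetical: continuation produced by the edited model
    is_toxic: Callable[[str], bool],    # hypothetical: toxicity classifier (ToxiGen-style scoring)
) -> float:
    """Fraction of probe prompts for which the edited model produces a toxic continuation."""
    if not prompts:
        return 0.0
    flagged = sum(1 for p in prompts if is_toxic(generate(p)))
    return flagged / len(prompts)

Comparing this rate (and TruthfulQA accuracy) before and after a batch of edits exposes the safety degradation; under extreme edit counts the muting effect can suppress outputs altogether, which is why an apparent improvement on such a metric is not necessarily a real safety gain.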

Citation


@inproceedings{editing-evaluation,
  title={Should We Really Edit Language Models? On the Evaluation of Edited Language Models},
  author={Qi Li and Xiang Liu and Zhenheng Tang and Peijie Dong and Zeyu Li and Xinglin Pan and Xiaowen Chu},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=m0DS4OOmSY}
}