Unsupervised text augmentation using Pre-trained Paraphrase Generation
Published: (Preprint), 2023
Citation: (Preprint) Abhishek Divekar, Mudit Agarwal, Srujana Merugu, and Nikhil Rasiwasia. "Unsupervised text augmentation using Pre-trained Paraphrase Generation".
Abstract:
Unsupervised text augmentation has gained attention in recent years, as approaches that use pre-trained models to produce high-quality augmentations (such as Backtranslation [48]) are replacing simple rule-based noising to attain SOTA performance in fully supervised and semi-supervised settings [53]. Such approaches have benefits over other model-based text augmentations: they are applicable to most natural language tasks, and their efficacy does not rely on the availability or quality of ground truths. However, it is difficult to ensure that augmented covariates are neither homogeneous nor invalid. To address this, we introduce GROK Score, an unsupervised metric that measures three aspects of the paraphrase quality of generated text: fluency, semantic fidelity, and diversity. When used to re-rank and filter outputs from Beam Search decoding on pre-trained generative models, GROK captures a small subset of diverse generations, which are used as augmentations. We evaluate this strategy in realistic scenarios: “challenging” Amazon Product classification problems from the CPP MultiModal corpus (2.75 to 3.87 ROC-AUC below the average) with limited text data. Our results indicate that GROK requires 69.5% fewer augmented samples to match the performance of Backtranslation and rule-based Easy Data Augmentation (EDA) [51] across six classification algorithms. Additionally, tuning the GROK-filtering threshold using K-Fold cross-validation leads to an average lift of 0.28 ROC-AUC, improving performance over both EDA and Backtranslation for all algorithms while requiring 30% fewer augmented samples (achieving a 0.53 ROC-AUC lift for VowpalWabbit and 1.10 for WideAndDeep). Human evaluations comparing GROK to prominent metrics such as BLEU, METEOR, and ROUGE validate our hypothesis that GROK promotes text generations with high diversity, indicating its utility beyond augmentation.
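The re-rank-and-filter step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the GROK Score, so the scoring function below (`toy_score`) is a hypothetical stand-in based on word overlap, and the names `rerank_and_filter` and `threshold` are likewise assumptions made for illustration. Only the overall shape (score each beam output, drop low scorers, keep the rest as augmentations) follows the description above.

```python
from typing import Callable, List, Tuple


def toy_score(source: str, candidate: str) -> float:
    """Placeholder score, NOT the GROK Score (the abstract does not define it).

    Rewards candidates that share some words with the source (a crude
    fidelity proxy) without copying it outright (a crude diversity proxy).
    """
    src = set(source.lower().split())
    cand = set(candidate.lower().split())
    if not src or not cand:
        return 0.0
    jaccard = len(src & cand) / len(src | cand)
    # Lies in [0, 1]; peaks when half of the combined vocabulary is shared.
    return 4.0 * jaccard * (1.0 - jaccard)


def rerank_and_filter(
    source: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Score each candidate, drop those below the threshold, sort best first."""
    scored = [(c, score_fn(source, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    source = "the battery lasts all day"
    # Stand-ins for beam search outputs from a paraphrase model.
    beam_outputs = [
        "the battery lasts all day",          # exact copy: no diversity
        "the battery life lasts a full day",  # partial overlap
        "great shoes for running",            # unrelated: no fidelity
    ]
    for text, score in rerank_and_filter(source, beam_outputs, toy_score, 0.5):
        print(f"{score:.2f}  {text}")
```

In this toy setup, the exact copy and the unrelated sentence both score zero and are filtered out, so only the partial paraphrase survives. The threshold is the quantity that the abstract reports tuning with K-Fold cross-validation; here it is simply fixed at 0.5 for illustration.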