Automatic Deduction of Input Transformation Function for Metamorphic Testing

Welcome to MR-Adopt’s Site!

The paper “MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing” has been submitted to ASE’24. This page offers access to MR-Adopt and its experimental data. It also provides supplementary materials that were omitted from the paper due to space limitations.

Table of Contents

MR-Adopt
Experimental Data
- Dataset
- RQ1: Effectiveness of MR-Adopt
- RQ2: Effectiveness of Input Transformations
- RQ3: Ablation Study on MR-Adopt
- RQ4: Usefulness of Input Transformations
Supplementary Materials
- Prompt Templates and Examples

MR-Adopt

While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy.
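To make the distinction concrete, the following minimal Python sketch contrasts a test that hard-codes both inputs with one that derives the follow-up input through an explicit input transformation. The subject `sort_numbers`, the permutation relation, and the `transform` function are hypothetical illustrations chosen for brevity; they are not taken from the paper or its dataset.

```python
def sort_numbers(xs):
    """Hypothetical subject under test."""
    return sorted(xs)


def test_hard_coded():
    # Both inputs are literals: the relation between them is implicit,
    # so the test cannot be rerun with a different source input.
    source = [3, 1, 2]
    follow_up = [2, 1, 3]
    assert sort_numbers(source) == sort_numbers(follow_up)


def transform(source):
    # Explicit input transformation (assumed here to be "reverse the list").
    return list(reversed(source))


def check_mr(source):
    # Generalized MR: the follow-up input is computed from the source input,
    # so the same output relation can be checked for any new source input.
    follow_up = transform(source)
    return sort_numbers(source) == sort_numbers(follow_up)


test_hard_coded()
new_sources = [[], [5], [4, 4, 1], [9, -2, 7, 0]]
results = [check_mr(s) for s in new_sources]
```

In this sketch the transformation reproduces the hard-coded pair (`[3, 1, 2]` becomes `[2, 1, 3]`) and also applies to source inputs the original test never covered.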

In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, so that the encoded MRs can be reused with new source inputs. Because an MR-encoded test case typically offers only one pair of source and follow-up inputs as an example, we leverage LLMs to understand the intention of the test case and generate additional examples of source and follow-up input pairs. These examples guide the generation of input transformations that generalize to multiple source inputs. In addition, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR-irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations based on the encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance test adequacy, increasing the line coverage and mutation score by 10.62% and 18.91%, respectively.
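The final assessment step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not MR-Adopt's actual implementation: the scoring rule (fraction of source inputs for which the output relation holds), the subject `sort_numbers`, and the three candidate functions are all hypothetical, and the data-flow-based refinement step is not modeled.

```python
from typing import Callable, List, Sequence

Transformation = Callable[[List[int]], List[int]]


def sort_numbers(xs: List[int]) -> List[int]:
    """Hypothetical subject under test."""
    return sorted(xs)


def output_relation(source: List[int], follow_up: List[int]) -> bool:
    # Output relation encoded in the (hypothetical) test case.
    return sort_numbers(source) == sort_numbers(follow_up)


# Hypothetical candidate transformations, standing in for LLM output.
def overfitted(source: List[int]) -> List[int]:
    return [2, 1, 3]  # only reproduces the original hard-coded follow-up


def crashing(source: List[int]) -> List[int]:
    return [source[-1]] + source[:-1]  # raises IndexError on an empty list


def reversing(source: List[int]) -> List[int]:
    return list(reversed(source))


def score(candidate: Transformation, sources: Sequence[List[int]]) -> float:
    # Assumed scoring rule: share of source inputs where the candidate runs
    # without error and the output relation holds.
    passed = 0
    for s in sources:
        try:
            if output_relation(s, candidate(list(s))):
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails on this input
    return passed / len(sources)


def select_best(candidates: Sequence[Transformation],
                sources: Sequence[List[int]]) -> Transformation:
    return max(candidates, key=lambda c: score(c, sources))


sources = [[3, 1, 2], [], [5], [4, 4, 1]]
candidates = [overfitted, crashing, reversing]
scores = {c.__name__: score(c, sources) for c in candidates}
best = select_best(candidates, sources)
```

With these inputs, the overfitted candidate satisfies the relation only for the original source input, the crashing candidate fails on the empty list, and the reversing candidate is selected because it holds for every source input.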

The source code of MR-Adopt can be found here.

Experimental Data

Dataset and Ground Truth. To evaluate our proposed approach and the generated input transformations, we prepared a dataset of 100 tasks and corresponding ground truths. It can be found here.

RQ1: Effectiveness of MR-Adopt. To evaluate the soundness of MR-Adopt in discovering MR-encoded test cases (MTCs), we manually examined 164 samples and found that 97% of them are true positives. This indicates the high precision of MR-Adopt in discovering MTCs and the high quality of our MTC dataset.

The results show that MR-Adopt significantly outperforms the baseline LLMs across all metrics. Compared to directly prompting LLMs, MR-Adopt achieves an improvement of 17.3% to 33.33% in generating 100% generalizable input transformations.

The detailed experimental data can be found here.

RQ2: Effectiveness of Input Transformations. This RQ examined the quality of follow-up inputs produced by input transformations generated by MR-Adopt. We set LLMs as the baselines because they are off-the-shelf black-box transformations that can generate follow-up inputs given source inputs.

The results show that MR-Adopt’s refinement step can effectively enhance follow-up input generation, with up to 18.59% improvement for GPT-3.5. Additionally, MR-Adopt-generated transformations can effectively generate follow-up inputs for 91.21% of source inputs, surpassing GPT-3.5+ by 75.99%.

The detailed experimental data can be found here.

RQ3: Ablation Study on MR-Adopt. To demonstrate the contribution of each component in MR-Adopt, we created three variants (i.e., v1-MR-Adopt w/o additional input pairs, v2-MR-Adopt w/o refinement step, and v3-MR-Adopt w/o assessment step).
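One way to picture the three variants is as a configuration in which exactly one component is switched off at a time. The sketch below is purely illustrative: the `PipelineConfig` class, its field names, and the `enabled_steps` helper are assumptions made for this page and do not reflect how the MR-Adopt source code is organized.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PipelineConfig:
    # Each flag corresponds to one component named in the ablation study.
    additional_input_pairs: bool = True
    refinement: bool = True
    assessment: bool = True


# Full approach plus the three ablated variants, each disabling one component.
VARIANTS: Dict[str, PipelineConfig] = {
    "MR-Adopt": PipelineConfig(),
    "v1": PipelineConfig(additional_input_pairs=False),
    "v2": PipelineConfig(refinement=False),
    "v3": PipelineConfig(assessment=False),
}


def enabled_steps(config: PipelineConfig) -> List[str]:
    # List the components that remain active, in declaration order.
    return [name for name, enabled in asdict(config).items() if enabled]


for name, config in VARIANTS.items():
    print(f"{name}: {', '.join(enabled_steps(config))}")
```

Comparing each variant against the full configuration isolates the effect of the single component it leaves out.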

The results show that all three designs contribute to the effectiveness of MR-Adopt in generating generalizable transformations. The assessment procedure contributes the most, and additional example input pairs contribute similarly. The detailed experimental data can be found here.

RQ4: Usefulness of Input Transformations. To evaluate the practical usefulness of the generated input transformations, we integrated them into MTCs to construct generalized MRs and measured how well such MRs enhanced test adequacy. This revealed the practical usefulness of MR-Adopt’s transformations in enhancing test adequacy.

The results show that test cases constructed from generalized MRs could achieve 13.52% and 9.42% increases in the line coverage and mutation score, respectively. The detailed results can be found here.

Supplementary Materials

Prompt Templates and Examples.

MR-Adopt leveraged LLMs to generate additional input pairs and input transformations. The prompt templates and concrete examples can be found here.