Community Expectations for Research Artifacts and Evaluation Processes (Additional Material)

Call Collection

Using a web browser (Google Chrome), we collected the calls for artifacts from conferences in the programming languages (PL) and software engineering (SE) fields. Where the original website was no longer available, we used the Internet Archive's Wayback Machine to retrieve a stored version of it.

Related Artifact:

Data Raw Collected Calls for Artifacts (Collected Calls for Artifacts (CfA) of the inspected conferences.)
Data Format Description: Complete web pages (HTML plus graphics, scripts, and styles) and pure text of calls for artifacts.

Committee Extraction

For each call for artifacts, we manually extracted the artifact evaluation committee, including the respective chairs, and manually searched for each member's current e-mail address. Sources for these addresses were affiliation websites or recent publications in digital libraries. We used these e-mail addresses to invite the survey participants. In the related data artifact, the e-mail addresses have been omitted for privacy reasons.

Related Artifact:

Data Derived Artifact Evaluation Committees (collected from CfAs) (aec.xlsx) - e-mail addresses removed
Data Format Description: A file in Microsoft Excel format.

Column	Description
First name	First name of the committee member
Last name	Last name of the committee member
Full name	Full name of the committee member
Conference	Venue of the artifact evaluation track (e.g. CAV 2015)
E-Mail	anonymized

Analysis of Calls

We analyzed the text of the calls for artifacts for explicit statements of two types: (1) statements about the purpose of artifact evaluation as a process and (2) statements about the criteria that artifacts under evaluation are expected to meet. The analysis was performed manually by one researcher and independently confirmed by a second. To help recognize repeated passages, we used a plagiarism-checking tool (Sherlock) and a difference-visualization tool (git diff --no-index --color-words). We expect the stated criteria to follow from the stated purpose; analyzing both kinds of statements, however, lets us identify possible inconsistencies. Such inconsistencies would indicate possible misunderstandings of the terms used, either on our side or on the side of the calls’ authors.
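The idea of spotting passages that recur across calls can be illustrated with a small sketch. This is not Sherlock and not the procedure we followed; it is a minimal stand-in built on Python's standard difflib module, and the function name shared_passages and the min_words threshold are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def shared_passages(text_a, text_b, min_words=8):
    """Return word sequences of at least min_words words found in both texts.

    Hypothetical helper: a word-level comparison, far simpler than a real
    plagiarism checker.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps frequent words from being ignored in long texts.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            start = block.a
            passages.append(" ".join(words_a[start:start + block.size]))
    return passages
```

Called with the plain text of two calls, the function returns the longer runs of identical wording, which can then be inspected by hand.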

We recorded the results of this analysis in a CSV file along with the mention of badges, submission criteria, and explicit mentions of artifact types.

We used an R script inside RStudio to help us summarize the findings. The script does not produce an explicit output but sets several variables, which we used when discussing the findings and summarizing them for the paper.

Related Artifacts:

Data Derived Text analysis results from calls of artifacts (callAnalysisResults.csv)
Data Format Description: A CSV file (semicolon delimited)

Column	Description
conference	Inspected conference and year (e.g. ICSE2019)
purpose	Stated purpose of artifact evaluation (e.g. reuse)
badges	Comma-separated list of mentioned badges (e.g. functional, reusable)
evaluation_criteria	Mentioned evaluation criteria used (e.g. ACM)
submission_criteria	Comma-separated list of mentioned submission criteria (e.g. <= 30 min configuration & installation)
artifact_types	Comma-separated list of mentioned artifact types (e.g. software,data,frameworks,others)
per_type_criteria	Mentioned criteria per artifact type (e.g. submission criteria for tools/data)
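As a reading aid, the following sketch shows one way to load a file with this layout in Python. It assumes the column names listed above appear as the header row and that list-valued cells are separated by commas; the function name read_call_analysis and the sample row are made up for illustration and are not taken from the data set.

```python
import csv
import io

# Columns described above as comma-separated lists.
LIST_COLUMNS = ("badges", "submission_criteria", "artifact_types")


def read_call_analysis(handle):
    """Parse semicolon-delimited rows and split the list-valued columns."""
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for row in reader:
        for column in LIST_COLUMNS:
            raw = row.get(column) or ""
            row[column] = [item.strip() for item in raw.split(",") if item.strip()]
        rows.append(row)
    return rows


# Made-up sample row (not real data) to show the expected shape.
sample = io.StringIO(
    "conference;purpose;badges;evaluation_criteria;"
    "submission_criteria;artifact_types;per_type_criteria\n"
    "EXAMPLE2019;reuse;functional, reusable;ACM;;software,data;\n"
)
rows = read_call_analysis(sample)
print(rows[0]["badges"])  # ['functional', 'reusable']
```

For the actual file, a handle opened with open("callAnalysisResults.csv", newline="") could be passed instead of the in-memory sample; the file's encoding is not stated here and may need to be set explicitly.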

R Script Analysis script for CfA tags
Format: A script in the R language
Run it from within the analysis/calls folder with the command R < analysis.R --vanilla. It reads the callAnalysisResults.csv file described above and writes multiple data files in CSV format, one for each inspection dimension. The format of these files is as follows:
Data Format Description: A CSV file (semicolon delimited)

Column	Description
""	Counter
(submission criteria, evaluation criteria, artifact purpose, etc.) - differs between files	Specific label inspected
count	Number of times this label was observed
byNumberTags	Frequency of this label relative to the total number of tags (unused in the paper)
byAllCalls	Frequency of this label relative to the total number of calls (unused in the paper)
byCallsWithTags	Frequency of this label relative to the number of calls with tags (unused in the paper)
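To make the relation between these columns concrete, here is a minimal sketch of how such a table could be computed. It is not the analysis.R script; it assumes each label occurs at most once per call and that byCallsWithTags divides by the number of calls carrying at least one tag, as the column name suggests. The function name summarize_tags and the "label" key are hypothetical.

```python
from collections import Counter


def summarize_tags(tags_per_call):
    """Build one summary row per label.

    tags_per_call holds one list of labels per call; a call without any
    label contributes an empty list.
    """
    counts = Counter(tag for tags in tags_per_call for tag in tags)
    total_tags = sum(counts.values())
    all_calls = len(tags_per_call)
    calls_with_tags = sum(1 for tags in tags_per_call if tags)
    table = []
    # If there are no labels at all, the loop body never runs,
    # so no division by zero can occur.
    for label, count in counts.most_common():
        table.append({
            "label": label,
            "count": count,
            "byNumberTags": count / total_tags,
            "byAllCalls": count / all_calls,
            "byCallsWithTags": count / calls_with_tags,
        })
    return table
```

For four calls, of which three name "reuse" and one of those also names "repeatability", the row for "reuse" would carry count 3, byNumberTags 0.75, byAllCalls 0.75, and byCallsWithTags 1.0.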