Survey Distribution
Using LimeSurvey (version 3.17.7) installed on a university server at Paderborn University, we distributed a questionnaire to all AEC members collected in the analysis of the Calls for Artifacts.
Related Artifacts:
Survey Questionnaire
Data Format Description: HTML (printable form). The original questionnaire was a live form in LimeSurvey.
Survey Questions (survey-questions.xlsx) - Extracted for reference in the analysis scripts
Data Format Description: Excel
Column | Description |
---|---|
Number | Number of the question |
Code | Internal code used to relate answers to questions (e.g., g4) |
Text | Question text |
Type | Answer type (e.g. Yes/No, Text, Matrix) |
Data Raw Raw result data file (results-survey54231.xlsx)
Data Format Description: Excel (Exported from LimeSurvey)
Column | Description |
---|---|
id | Participant ID |
submitdate | Timestamp of final submission |
lastpage | The last page displayed for the participant (0-4). Values below 4 indicate an incomplete answer. |
startlanguage | The language the questionnaire was used in (only English (en) was available) |
seed | Random seed used by LimeSurvey to check if invitations have been used already |
startdate | Timestamp when the participant started answering the survey |
datestamp | Last timestamp when the participant was active |
[questionCode] (multiple columns) | Answers to the questions |
interviewtime | Time in seconds the participant needed to complete the questionnaire |
[questionCode]Time (multiple columns) | Time spent on the respective question or question group |
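For orientation, here is a minimal sketch of loading the raw export in R. It assumes the readxl package and uses only the columns described in the table above; it is not part of the artifact's own scripts:

```r
library(readxl)

# Load the raw LimeSurvey export described above.
results <- read_excel("results-survey54231.xlsx")

# Keep only complete responses: lastpage values below 4
# indicate that the participant did not reach the final page.
complete <- results[which(results$lastpage == 4), ]

# Average completion time in minutes over complete responses.
mean(complete$interviewtime, na.rm = TRUE) / 60
```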
Plots
From the raw results we created the plots for Figure 1 and Figure 2 using R scripts.
Related Artifacts:
R Script Committee sizes and responses (Figure 1) (conferencespread.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < conferencespread.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `aec.xlsx` file described in the call analysis process. It outputs the plot in PDF format to `output/ConferencePlot.pdf`.
R Script Histogram of individuals by number of AECs served in (Figure 2) (participant_stats.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < participant_stats.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `aec.xlsx` file described in the call analysis process. It outputs the plot in PDF format to `output/aec_histogram.pdf`. A generic sketch of the pattern both plot scripts follow appears after this list.
Output Figures (Figures derived from the collected data.)
Format: A webpage showing the created figures for convenience in this artifact.
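The following is not the artifact's `participant_stats.R`, but a minimal sketch of the read-aggregate-plot pattern both scripts follow. The `Member` column of `aec.xlsx` is a hypothetical placeholder, and the `output/` directory is assumed to exist:

```r
library(readxl)

# Hypothetical structure: one row per committee membership,
# with a "Member" column naming the person (placeholder name).
aec <- read_excel("aec.xlsx")

# Number of AECs each individual served in.
per_person <- table(aec$Member)

# Write the histogram as a PDF, as the artifact's scripts do.
pdf("output/aec_histogram.pdf")
hist(as.numeric(per_person),
     main = "Individuals by number of AECs served in",
     xlab = "Number of AECs",
     ylab = "Individuals")
dev.off()
```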
Open Card Sorting
Methodology
We follow Hudson's approach of open card sorting [Hudson13] to analyze the answers.
We assigned (at least) two authors per survey question to process the answers. One author assigned higher-order topics to each answer. As the process was open, there were no predetermined categories; instead, they were extracted while reading the answers. For instance, for the answer “Reproducibility to a certain exten[t]. Availability of the code.” to the question “[...] what is the purpose of artifact evaluation?”, the labels “reproducibility” and “availability” were extracted. The other author checked the labels. Difficult cases were marked and discussed with all authors until consensus was reached. In a second pass, we reviewed all assigned labels and simplified and harmonized the labeling (as different authors had used different labels referring to the same concept). A note on replication: other researchers might derive different labels if the dataset were labeled again from scratch.
Related Artifacts:
Data Derived Card sorting results
Data Format Description: Multiple Excel files - all follow the same structure
Files are named following the schema `AEC-Survey-[QuestionCode].xlsx`
Column | Description |
---|---|
Response ID | Participant ID (as in the raw results) |
[Question Text] | Answer from the participant |
(Multiple columns - no header) | Labels given to the answer |
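A minimal sketch of reading one of these files and collecting the unnamed label columns into a long (id, label) table, again assuming readxl (question code g4 serves only as an example):

```r
library(readxl)

# Card sorting results for one question, e.g. g4.
cards <- read_excel("AEC-Survey-g4.xlsx")

# Column 1 is the Response ID, column 2 the answer text;
# all remaining (header-less) columns hold the labels.
label_cols <- cards[, -(1:2)]

# Stack the label columns into (id, label) pairs, dropping NAs.
long <- data.frame(
  id    = rep(cards[[1]], times = ncol(label_cols)),
  label = unlist(label_cols, use.names = FALSE)
)
long <- long[!is.na(long$label), ]
```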
Label Analysis
We developed two scripts for the analysis and presentation of the numerical data and the data from the open card sorting process. We used the output of these scripts to create the report in the paper and to interpret the results.
Related Artifacts:
R Script Analysis of questions with numeric answers (numericdata.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < numericdata.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `survey-questions.xlsx` file described earlier. It outputs a textual summary of the answers to `output/numericresults.txt`; the file is re-created at each run. It furthermore produces a plot of the answers on artifact usage at `output/au-matrices.svg`, which we did not use in the paper.
Output Results from the analysis of numeric answers (numericresults.txt)
Format: Textual output for human interpretation
The output of `numericdata.R`.
Example:
```
g1: Are you familiar with the ACM Policy on Artifact Review and Badging?
We received 157 answers. 106 (67.52 %) were positive. 51 (32.48 %) were negative.
```
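A minimal sketch of how such a summary line can be computed, reusing the `results` table from the earlier sketch. The assumption that column `g1` holds literal "Yes"/"No" values is hypothetical; the export may code answers differently:

```r
# Hypothetical coding: column g1 holds "Yes"/"No" answers.
answers <- results$g1[!is.na(results$g1)]
n   <- length(answers)
pos <- sum(answers == "Yes")

cat(sprintf(
  "g1: ... We received %d answers. %d (%.2f %%) were positive. %d (%.2f %%) were negative.\n",
  n, pos, 100 * pos / n, n - pos, 100 * (n - pos) / n
))
```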
R Script Analysis of full-text answers using the tags from open card sorting (taganalysis.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < taganalysis.R --vanilla`. It uses the open card sorting Excel files described above as well as the `survey-questions.xlsx` file described earlier. It outputs a textual summary of the answers to `output/tagresults.txt`; the file is re-created at each run. It uses the helper script `respondent-profiling.R` to differentiate answers between communities.
Output Results from the analysis of open card sorting tags (tagresults.txt)
Format: Textual output for human interpretation
The output of `taganalysis.R`.
Example:
```
---- G4 ----
g4: In your words, what is the purpose of artifact evaluation?
For question code g4 we received 147 answers.
Top 1000 tags were:
# A tibble: 158 x 2
   tag                      usage
 1 "Foster reproducibility"    32
 2 "Foster reusability"        26
 3 "Verify results"            19
 4 "Verify claims"             18
 5 "Availability"              16
 6 "Artifact Quality"          13
 7 "Check claims"               6
...
```
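A minimal sketch of the counting step behind such a (tag, usage) tibble, assuming dplyr and the long (id, label) table from the card sorting sketch above; this is not the artifact's `taganalysis.R`:

```r
library(dplyr)

# Count how often each label was used, most frequent first.
tag_counts <- long %>%
  count(label, name = "usage", sort = TRUE) %>%
  rename(tag = label)

head(tag_counts, 10)
```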
A Note on Reusability
Even though the scripts currently output the data in the form of a text file, they can easily be modified to output the data in other formats, as the internal table structure of (tag, usage) is already present.
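For instance, exporting that (tag, usage) table as CSV would take a single extra line (a sketch using the hypothetical `tag_counts` table from above):

```r
# Write the tag counts as CSV instead of, or in addition to,
# the textual summary.
write.csv(tag_counts, "output/tagresults.csv", row.names = FALSE)
```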