Survey Distribution
Using LimeSurvey (version 3.17.7) installed on a university server at Paderborn University, we distributed a questionnaire to all AEC members collected in the analysis of the Calls for Artifacts.
Related Artifacts:
Survey Questionnaire
Data Format Description: HTML (printable form). The original questionnaire was a live form in LimeSurvey.
Survey Questions (survey-questions.xlsx) - Extracted for reference in the analysis scripts
Data Format Description: Excel
Column | Description |
---|---|
Number | Number of the question |
Code | Internal code used to relate answers to questions (e.g., g4) |
Text | Question text |
Type | Answer type (e.g. Yes/No, Text, Matrix) |
Data Raw Raw result data file (results-survey54231.xlsx)
Data Format Description: Excel (Exported from LimeSurvey)
Column | Description |
---|---|
id | Participant ID |
submitdate | Timestamp of final submission |
lastpage | The last page displayed for the participant (0-4). Values below 4 indicate an incomplete answer. |
startlanguage | The language the questionnaire was used in (only English (en) was available) |
seed | Random seed used by LimeSurvey to check if invitations have been used already |
startdate | Timestamp when the participant started answering the survey |
datestamp | Last timestamp when the participant was active |
[questionCode] (multiple columns) | Answers to the questions |
interviewtime | Time in seconds the participant needed to complete the questionnaire |
[questionCode]Time (multiple columns) | Time spent on the respective question or question group |
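For orientation, here is a minimal sketch of loading the raw export in R. It assumes the readxl package and uses only the columns described in the table above; it is not part of the artifact's own scripts:

```r
library(readxl)

# Load the raw LimeSurvey export described above.
results <- read_excel("results-survey54231.xlsx")

# Keep only complete responses: lastpage values below 4
# indicate that the participant did not reach the final page.
complete <- results[which(results$lastpage == 4), ]

# Average completion time in minutes over complete responses.
mean(complete$interviewtime, na.rm = TRUE) / 60
```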
Plots
From the raw results we created the plots for Figure 1 and Figure 2 using R scripts.
Related Artifacts:
R Script Committee sizes and responses (Figure 1) (conferencespread.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < conferencespread.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `aec.xlsx` file described in the call analysis process. It outputs the plot in PDF format to `output/ConferencePlot.pdf`.
R Script Histogram of individuals by number of AECs served in (Figure 2) (participant_stats.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < participant_stats.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `aec.xlsx` file described in the call analysis process. It outputs the plot in PDF format to `output/aec_histogram.pdf`. A generic sketch of the pattern both plot scripts follow appears after this list.
Output Figures (Figures derived from the collected data.)
Format: A webpage showing the created figures for convenience in this artifact.
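The following is not the artifact's `participant_stats.R`, but a minimal sketch of the read-aggregate-plot pattern both scripts follow. The `Member` column of `aec.xlsx` is a hypothetical placeholder, and the `output/` directory is assumed to exist:

```r
library(readxl)

# Hypothetical structure: one row per committee membership,
# with a "Member" column naming the person (placeholder name).
aec <- read_excel("aec.xlsx")

# Number of AECs each individual served in.
per_person <- table(aec$Member)

# Write the histogram as a PDF, as the artifact's scripts do.
pdf("output/aec_histogram.pdf")
hist(as.numeric(per_person),
     main = "Individuals by number of AECs served in",
     xlab = "Number of AECs",
     ylab = "Individuals")
dev.off()
```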
Open Card Sorting
Methodology
We follow Hudson's approach of open card sorting [Hudson13] to analyze the answers.
We assigned (at least) two authors per survey question to process the answers. One author assigned higher-order topics to each answer. As the process was open, there were no predetermined categories; instead, they were extracted while reading the answers. For instance, for the answer “Reproducibility to a certain exten[t]. Availability of the code.” to the question “[...] what is the purpose of artifact evaluation?”, the labels “reproducibility” and “availability” were extracted. The other author checked the labels. Difficult cases were marked and discussed with all authors until consensus was reached. In a second pass, we reviewed all assigned labels and simplified and harmonized the labeling (as different authors had used different labels referring to the same concept). A note on replication: other researchers might derive different labels if the dataset were labeled again from scratch.
Related Artifacts:
Data Derived Card sorting results
Data Format Description: Multiple Excel files - all follow the same structure
Files are named following the schema `AEC-Survey-[QuestionCode].xlsx`
Column | Description |
---|---|
Response ID | Participant ID (as in the raw results) |
[Question Text] | Answer from the participant |
(Multiple columns - no header) | Labels given to the answer |
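A minimal sketch of reading one of these files and collecting the unnamed label columns into a long (id, label) table, again assuming readxl (question code g4 serves only as an example):

```r
library(readxl)

# Card sorting results for one question, e.g. g4.
cards <- read_excel("AEC-Survey-g4.xlsx")

# Column 1 is the Response ID, column 2 the answer text;
# all remaining (header-less) columns hold the labels.
label_cols <- cards[, -(1:2)]

# Stack the label columns into (id, label) pairs, dropping NAs.
long <- data.frame(
  id    = rep(cards[[1]], times = ncol(label_cols)),
  label = unlist(label_cols, use.names = FALSE)
)
long <- long[!is.na(long$label), ]
```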
Label Analysis
We developed two scripts for the analysis and presentation of the numerical data and the data from the open card sorting process. We used the output of these scripts to create the report in the paper and to interpret the results.
Related Artifacts:
R Script Analysis of questions with numeric answers (numericdata.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < numericdata.R --vanilla`. It uses the `results-survey54231.xlsx` file described above as well as the `survey-questions.xlsx` file described earlier. It outputs a textual summary of the answers to `output/numericresults.txt`; the file is re-created at each run. It furthermore produces a plot of the answers on artifact usage at `output/au-matrices.svg`, which we did not use in the paper.
Output Results from the analysis of numeric answers (numericresults.txt)
Format: Textual output for human interpretation
The output of `numericdata.R`.
Example:
```
g1: Are you familiar with the ACM Policy on Artifact Review and Badging?
We received 157 answers. 106 (67.52 %) were positive. 51 (32.48 %) were negative.
```
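A minimal sketch of how such a summary line can be computed, reusing the `results` table from the earlier sketch. The assumption that column `g1` holds literal "Yes"/"No" values is hypothetical; the export may code answers differently:

```r
# Hypothetical coding: column g1 holds "Yes"/"No" answers.
answers <- results$g1[!is.na(results$g1)]
n   <- length(answers)
pos <- sum(answers == "Yes")

cat(sprintf(
  "g1: ... We received %d answers. %d (%.2f %%) were positive. %d (%.2f %%) were negative.\n",
  n, pos, 100 * pos / n, n - pos, 100 * (n - pos) / n
))
```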
R Script Analysis of full-text answers using the tags from open card sorting (taganalysis.R)
Format: A script in the R language
It can be run inside the `analysis/survey` folder with the command `R < taganalysis.R --vanilla`. It uses the open card sorting Excel files described above as well as the `survey-questions.xlsx` file described earlier. It outputs a textual summary of the answers to `output/tagresults.txt`; the file is re-created at each run. It uses the helper script `respondent-profiling.R` to differentiate answers between communities.
Output Results from the analysis of open card sorting tags (tagresults.txt)
Format: Textual output for human interpretation
The output of `taganalysis.R`.
Example:
```
---- G4 ----
g4: In your words, what is the purpose of artifact evaluation?
For question code g4 we received 147 answers.
Top 1000 tags were:
# A tibble: 158 x 2
   tag                      usage
 1 "Foster reproducibility"    32
 2 "Foster reusability"        26
 3 "Verify results"            19
 4 "Verify claims"             18
 5 "Availability"              16
 6 "Artifact Quality"          13
 7 "Check claims"               6
...
```
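A minimal sketch of the counting step behind such a (tag, usage) tibble, assuming dplyr and the long (id, label) table from the card sorting sketch above; this is not the artifact's `taganalysis.R`:

```r
library(dplyr)

# Count how often each label was used, most frequent first.
tag_counts <- long %>%
  count(label, name = "usage", sort = TRUE) %>%
  rename(tag = label)

head(tag_counts, 10)
```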
A Note on Reusability
Even though the scripts currently output the data in the form of a text file, they can easily be modified to output the data in other formats, as the internal table structure of (tag, usage) is already present.
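For instance, exporting that (tag, usage) table as CSV would take a single extra line (a sketch using the hypothetical `tag_counts` table from above):

```r
# Write the tag counts as CSV instead of, or in addition to,
# the textual summary.
write.csv(tag_counts, "output/tagresults.csv", row.names = FALSE)
```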