Data Currently in CROP
CROP currently contains data from two open source communities: Eclipse
and Couchbase. For each of these communities, we provide the data for the four most popular
projects in terms of the number of code reviews performed.
The following table reports statistics concerning the data collected for each of these 8 systems, where the Eclipse projects are presented in the upper
section of the table and the Couchbase projects in the lower section.
||Oct-09 to Nov-17
||Sep-09 to Nov-17
||Jun-12 to Nov-17
||Feb-13 to Nov-17
||Apr-10 to Nov-17
||Oct-10 to Apr-16
||Feb-11 to Nov-17
||Mar-14 to Nov-17
||Jan-12 to Nov-17
||Apr-14 to Nov-17
||May-10 to Jul-17
The CROP dataset is organised in three main directories: Metadata, Git Repos and Discussion. Details for each directory are given below.
For each software system in CROP, the Metadata directory provides a CSV file that contains the general metadata for each review and revision stored in CROP. The CSV is
primarily used to navigate through the data and to locate the code review discussion files and the versions of the code base for each revision in the
git repository. Each row in the CSV corresponds to a single revision submitted for code review and contains the following fields:
id: a unique id that identifies the revision within a specific community
review_number: the unique number of the review that the revision is part of
revision_number: the number of the revision in the specific review
author: the author of the revision
status: the status of the revision
change_id: the change id of this revision
before_commit_id: the commit id that represents the version of the system before the revision took place
after_commit_id: the commit id that represents the version of the system after the revision took place
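As an illustration of how these fields can be used to navigate the data, the following minimal sketch groups revisions by review. It assumes the CSV has a header row whose column names match the field names listed above. The sample rows, the status values and the function name load_revisions are hypothetical and only serve the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows: the column names follow the field list above,
# but every value here is made up for illustration.
SAMPLE_CSV = """id,review_number,revision_number,author,status,change_id,before_commit_id,after_commit_id
1,100,1,alice,MERGED,Iabc,c0ffee1,c0ffee2
2,100,2,alice,MERGED,Iabc,c0ffee3,c0ffee4
3,101,1,bob,ABANDONED,Idef,c0ffee5,c0ffee6
"""


def load_revisions(csv_file):
    """Group revision rows by review number, ordered by revision number."""
    reviews = defaultdict(list)
    for row in csv.DictReader(csv_file):
        reviews[int(row["review_number"])].append(row)
    for revisions in reviews.values():
        revisions.sort(key=lambda r: int(r["revision_number"]))
    return dict(reviews)


# With a real project file, pass open(path, newline="") instead of StringIO.
reviews = load_revisions(io.StringIO(SAMPLE_CSV))
for review_number, revisions in sorted(reviews.items()):
    print(review_number, [r["after_commit_id"] for r in revisions])
```

Keeping each row as a dictionary keeps the commit ids of a revision next to its review and revision numbers, which is what the other two directories are indexed by.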
CROP provides git repositories that recreate the projects’ reviewing history to include all the revisions submitted for code review. Each repository
has a single master branch, where the before and after versions of the source code for each revision were committed sequentially, based on the review and
revision numbers. Such versions are accessible through the commit ids provided in the projects’ CSV file, as discussed above.
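Given the two commit ids of a revision, the change under review can be recovered as an ordinary git diff. The sketch below is one possible way to do this from Python, assuming git is installed and that repo_path points to a local copy of one of the CROP repositories. The helper names diff_command and revision_diff are hypothetical.

```python
import subprocess


def diff_command(repo_path, before_commit_id, after_commit_id):
    """Build the git command that diffs the before and after versions."""
    return ["git", "-C", repo_path, "diff", before_commit_id, after_commit_id]


def revision_diff(repo_path, before_commit_id, after_commit_id):
    """Return the textual diff introduced by a single revision."""
    result = subprocess.run(
        diff_command(repo_path, before_commit_id, after_commit_id),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports an error
    )
    return result.stdout
```

The commit ids passed to revision_diff would come from the before_commit_id and after_commit_id columns of the project's CSV file.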
The Discussion directory stores the discussion files for each revision. The directory follows a tree structure, organised by review number,
in which the discussion files for each revision are contained in the directory of its respective review.
A discussion file presents the reviewing data in the following order. First comes the description of the revision, which is the commit message
of the revision and includes the revision’s change-id and author. The comments made by other developers during the review are presented next.
For each comment, CROP includes the author of the comment and the respective message.
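The tree structure described above suggests a simple lookup from a review number to its discussion files. The following sketch assumes that each review's subdirectory is named after its review number. The text above does not state the exact directory or file naming scheme, so that assumption and the helper name discussion_files are hypothetical.

```python
from pathlib import Path


def discussion_files(discussion_root, review_number):
    """List the discussion files stored for one review, sorted by path.

    Assumes the review's subdirectory is named after its review number.
    """
    review_dir = Path(discussion_root) / str(review_number)
    if not review_dir.is_dir():
        return []
    return sorted(p for p in review_dir.iterdir() if p.is_file())
```

The function returns an empty list for reviews with no matching directory, so it can be called for every review_number in the CSV without extra checks.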
CROP was first published and described in this research paper.
Since its first publication, CROP has changed in both content and structure. Although the paper is the official guideline for the CROP dataset, this
website will always describe the most up-to-date version of the dataset.