November 7, 2013 / milllss

Relationships between Repos – Pull Requests

In a previous post we looked at different approaches to pull requests and concluded by noting that a base/head repo pair was often connected by multiple pull requests. This post investigates these relationships further, and in particular considers the involvement of individual users. Pull requests allow any individual to compose and propose an amendment to any repository; when they are useful, they benefit the receiving (base) repository. By making useful pull requests, an individual with no prior affiliation to a project can demonstrate their ability and usefulness to it, and in doing so could potentially earn a place within the project. Is there any evidence that pull requests serve as a ‘recruitment’ mechanism for projects in this manner, acting as the first point of contact between a prospective employee and employer?

The data

The starting point for this post is the data-set of 200k base/head repo pairings with at least 4 pull requests between them; intra-repo pull requests (where base and head are the same repository) are excluded. This set contains 86,231 distinct base repos (so base repos often had a ‘relationship’ with multiple heads) and 194,235 distinct head repos (so head repos occasionally had a ‘relationship’ with multiple base repos).
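As a rough illustration of how a data-set like this could be assembled, the Python sketch below counts pull requests per base/head pair, drops intra-repo pull requests and keeps pairs with at least 4. It assumes pull requests have already been exported as hypothetical (base_repo, head_repo) tuples; the function names are illustrative and this is not the query used to build the data-set.

```python
from collections import Counter


def qualifying_pairs(pull_requests, min_prs=4):
    """Count pull requests per (base, head) pair and keep pairs with at least
    `min_prs` of them. Intra-repo pull requests (base == head) are excluded."""
    counts = Counter(
        (base, head) for base, head in pull_requests if base != head
    )
    return {pair: n for pair, n in counts.items() if n >= min_prs}


def distinct_repos(pairs):
    """Return (number of distinct base repos, number of distinct head repos)."""
    bases = {base for base, _ in pairs}
    heads = {head for _, head in pairs}
    return len(bases), len(heads)


# Toy input: one hypothetical (base_repo, head_repo) tuple per pull request.
prs = (
    [("org/lib", "alice/lib")] * 5
    + [("org/lib", "bob/lib")] * 4
    + [("org/lib", "org/lib")] * 10   # intra-repo: excluded
    + [("org/app", "carol/app")] * 2  # fewer than 4: excluded
)
pairs = qualifying_pairs(prs)
print(pairs)                  # {('org/lib', 'alice/lib'): 5, ('org/lib', 'bob/lib'): 4}
print(distinct_repos(pairs))  # (1, 2)
```

Here one base repo is paired with two heads, so there are fewer distinct base repos than pairs, the same pattern behind the 86,231 distinct bases among 200k pairings.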

For each base/head pair, a full list of users who made pushes to either repository was extracted from BigQuery, along with variables describing the timing and frequency of their pushes. The ids and associated variables of the users who made the pull requests connecting each pair were also extracted. In total there are 196k records representing 100k users who made pull requests (some users were involved in multiple base/head repo relationships).
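The push variables were extracted from BigQuery. The Python sketch below shows one way timing and frequency variables might be derived once push events are available locally: for every (user, repo) it keeps the first push, the last push and the push count. The event tuple shape and the PushSummary fields are hypothetical choices, not the variables actually extracted for this post.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PushSummary:
    first: datetime  # earliest push by this user to this repo
    last: datetime   # latest push by this user to this repo
    count: int       # number of pushes


def summarise_pushes(push_events):
    """Aggregate hypothetical (user, repo, timestamp) push events into one
    PushSummary per (user, repo) pair."""
    summary = {}
    for user, repo, ts in push_events:
        s = summary.get((user, repo))
        if s is None:
            summary[(user, repo)] = PushSummary(first=ts, last=ts, count=1)
        else:
            s.first = min(s.first, ts)
            s.last = max(s.last, ts)
            s.count += 1
    return summary


events = [
    ("alice", "alice/lib", datetime(2013, 1, 5)),
    ("alice", "alice/lib", datetime(2013, 1, 2)),
    ("alice", "org/lib", datetime(2013, 3, 1)),
]
pushes = summarise_pushes(events)
print(pushes[("alice", "alice/lib")].count)  # 2
print(pushes[("alice", "alice/lib")].first)  # 2013-01-02 00:00:00
```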

Are there users who pull request their way into a project?

The data-set contains 25,678 cases where a user recorded pushes to both the base and head repos, in addition to submitting at least one pull request between them. That does not necessarily mean they earned contributor rights through pull requests. What would it look like in the data if a user had done so?

The prototypical scenario might go as follows. The user (probably after forking first) makes pushes to what will become the head repo of the pair. They then submit pull requests to the base repo (given how this data-set was constructed, likely many of them). Some time later there should be a ‘MemberEvent’ recording that they have officially been made a contributor on the base repo, after which they may push to the base repo directly. There are 3,416 cases in the data which match this profile.

However, the limiting factor here is the requirement that a MemberEvent be observed after the user's pushes and pull requests. These events are sparse in the data: only 5,538 users have a recorded ‘added’ MemberEvent, whereas 25,678 users have at least one push to both the head and base repo of a pair, and therefore must have had contributor rights to the base repo. Either many MemberEvents are missing from the data, or they happened before the timeline data begins.
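Matching the prototypical profile amounts to an ordered-subsequence test over a user's time-stamped events within a pair. The Python sketch below is a greedy check under the assumption that each event has been labelled with a hypothetical tag (push_head, pull_request, member_added_base, push_base); it illustrates the matching logic rather than reproducing how the 3,416 cases were counted.

```python
from datetime import datetime

# Hypothetical event labels for one user within one base/head pair.
STRICT_PROFILE = ["push_head", "pull_request", "member_added_base", "push_base"]


def matches_profile(events, profile):
    """events: list of (timestamp, label) tuples. Returns True if the labels in
    `profile` occur in that order once events are sorted by time; other events
    may occur in between."""
    step = 0
    # Sort on timestamp only; the sort is stable, so ties keep input order.
    for _, label in sorted(events, key=lambda e: e[0]):
        if step < len(profile) and label == profile[step]:
            step += 1
    return step == len(profile)


alice = [
    (datetime(2013, 1, 2), "push_head"),
    (datetime(2013, 1, 10), "pull_request"),
    (datetime(2013, 2, 1), "member_added_base"),
    (datetime(2013, 2, 3), "push_base"),
]
# Bob was added as a member before any of his pull requests.
bob = [
    (datetime(2012, 12, 1), "member_added_base"),
    (datetime(2013, 1, 2), "push_head"),
    (datetime(2013, 1, 10), "pull_request"),
    (datetime(2013, 2, 3), "push_base"),
]
print(matches_profile(alice, STRICT_PROFILE))  # True
print(matches_profile(bob, STRICT_PROFILE))    # False
```

Greedily advancing through the profile is sufficient here: if the ordered sequence exists anywhere in the timeline, taking the earliest match at each step will find it.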

If the criteria are relaxed so that we only require the user to have first pushed to the head repo, then made a pull request, then pushed to the base repo, 14,348 users meet them. This sequence of events is consistent with a user who pull requested their way into a project, although many other forms of interaction outside GitHub could of course have shaped these events.
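Because the relaxed criterion only asks whether some head push, some pull request and some base push occur in that order, it can be checked from summary timestamps alone: a match exists exactly when a pull request falls between the user's earliest head push and their latest base push. The Python sketch below assumes hypothetical inputs of that form (datetime values, with None meaning no pushes); it is an illustration, not the author's code.

```python
from datetime import datetime


def relaxed_match(first_head_push, pr_times, last_base_push):
    """True if some pull request falls strictly after the user's first push to
    the head repo and strictly before their last push to the base repo, which
    is equivalent to a push(head) -> pull request -> push(base) ordering."""
    if first_head_push is None or last_base_push is None:
        return False  # no pushes to one of the two repos
    return any(first_head_push < t < last_base_push for t in pr_times)


pr_times = [datetime(2013, 1, 10)]
print(relaxed_match(datetime(2013, 1, 2), pr_times, datetime(2013, 2, 3)))  # True
print(relaxed_match(datetime(2013, 1, 2), pr_times, datetime(2013, 1, 5)))  # False
```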

These users made a median of 18 pushes to the head repo and 8 to the base repo, although the variance on both measures is very large. Once they could push to the base repo they no longer needed to make pull requests, technically at least. Of these users, 5,244 stopped making pull requests once they had made their first push to the base repo; for the remainder, there is an overlap between the period in which they were making pull requests and the period in which they began pushing to the base repo directly.
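The two summaries in this paragraph, median push counts and whether pull requests stopped once direct pushes began, could be computed along the lines of the Python sketch below. It assumes hypothetical per-user records holding push counts, pull request timestamps and the time of the first base push; the toy values are invented for illustration and do not come from the data-set.

```python
from datetime import datetime
from statistics import median


def stopped_after_first_base_push(pr_times, first_base_push):
    """True if none of the user's pull requests came after their first direct
    push to the base repo; False means the two activities overlapped."""
    return max(pr_times) <= first_base_push


# Toy per-user records (hypothetical values, not drawn from the data-set).
users = [
    {"head_pushes": 12, "base_pushes": 3,
     "pr_times": [datetime(2013, 1, 10)],
     "first_base_push": datetime(2013, 2, 1)},
    {"head_pushes": 40, "base_pushes": 2,
     "pr_times": [datetime(2013, 1, 10), datetime(2013, 3, 1)],
     "first_base_push": datetime(2013, 2, 1)},
    {"head_pushes": 5, "base_pushes": 60,
     "pr_times": [datetime(2013, 4, 2)],
     "first_base_push": datetime(2013, 4, 20)},
]

# The median is less sensitive than the mean to a few very active users.
print(median(u["head_pushes"] for u in users))  # 12
print(median(u["base_pushes"] for u in users))  # 3
stopped = sum(
    stopped_after_first_base_push(u["pr_times"], u["first_base_push"])
    for u in users
)
print(stopped, len(users) - stopped)  # 2 1
```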

