Thursday, March 20, 2014

Visualizing Changing Roles Over Time

So, my main research question is about how people move through various roles in a community over time.

In the end, I will be using social network analysis to explain why people move through roles, but I will start by identifying what the roles are and producing some summary statistics of how people move through them.

In order to identify roles, I created monthly activity snapshots for each user, and then used a clustering algorithm in R to automatically identify different "behavioral roles". There is some evidence that the data don't cluster cleanly, but clustering isn't a central component of my research, so I am moving forward anyway.

I decided to use the k-medoids (aka "partitioning around medoids" or "pam") algorithm in R. I used the silhouette function to identify the best k (which was 3 clusters).

The data I used to create the clusters only included those months where a user made at least 5 edits. So, I created a 4th cluster to represent months where a user made fewer than 5 edits, and used a Python program to add the cluster results to the original stats file.
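Roughly, that merge step could look like the sketch below. This is not the actual script: the function name, the (user, month, edits) row layout, and the choice of 0 as the label for the low-activity cluster are all assumptions made for illustration.

```python
LOW_ACTIVITY = 0  # assumed label for months with fewer than 5 edits
MIN_EDITS = 5     # threshold used when building the clustering input


def label_months(stats, clusters):
    """Attach a role label to every user-month.

    stats:    list of (user_id, month, edits) rows from the stats file
    clusters: dict mapping (user_id, month) to the cluster label from R
    """
    labeled = []
    for user_id, month, edits in stats:
        if edits < MIN_EDITS:
            # This month was never clustered, so it gets the extra label.
            role = LOW_ACTIVITY
        else:
            role = clusters[(user_id, month)]
        labeled.append((user_id, month, role))
    return labeled
```

A month with 12 edits keeps whatever label the clustering gave it, while a month with 3 edits gets the low-activity label.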

I then wrote another Python script to rearrange this data into the following format:

ID   Month1    Month2   Month3 ...
1    1         2        2
2    1         0        2
...

Here Month1 is the user's role in their first month, Month2 is the role in their 2nd month, and so on.
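A minimal sketch of that rearrangement, assuming the input is a list of (user_id, month, role) rows and that months are strings like "2013-01" that sort chronologically. The function name and the dictionary output are hypothetical, not the real script.

```python
from collections import defaultdict


def to_wide(labeled):
    """Turn (user_id, month, role) rows into one role sequence per user.

    Position 0 of each sequence is the user's first recorded month,
    position 1 the second, and so on.
    """
    by_user = defaultdict(list)
    for user_id, month, role in labeled:
        by_user[user_id].append((month, role))

    wide = {}
    for user_id, months in by_user.items():
        months.sort()  # chronological, given sortable month strings
        wide[user_id] = [role for _, role in months]
    return wide
```

Note that this only counts months that have a row in the input. If a user has calendar months with no row at all, those gaps would need to be filled in first for "Month N" to mean the Nth calendar month of tenure.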

For way, way too long today I've been trying to get R to display this data as a stacked area graph showing the share of users in each role by month. It's been a huge pain to figure out how to reshape the data for that.
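The reshaping a stacked area graph needs is one row per tenure month with the share of users in each role. Here is a sketch of that calculation in Python rather than R, assuming the wide data has been loaded as a dict of user ID to list of roles and that the role labels are 0 through 3 (both assumptions, as is the function name).

```python
from collections import Counter


def role_shares(wide, roles=(0, 1, 2, 3)):
    """Share of users in each role for every tenure month.

    wide: dict mapping user_id to a list of roles, one per tenure month.
    Users who have no entry for a given month are left out of that
    month's denominator.
    """
    n_months = max((len(seq) for seq in wide.values()), default=0)
    shares = []
    for i in range(n_months):
        counts = Counter(seq[i] for seq in wide.values() if len(seq) > i)
        total = sum(counts.values())
        shares.append({role: counts[role] / total for role in roles})
    return shares
```

Each month's shares sum to 1, so the per-role series can be stacked directly, for example with matplotlib's `stackplot`.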

Once I get that working, I want to compare that graph to the same graph for just the users who were in each role at least X times during their tenure.
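Picking out those users is a simple filter. A sketch, again assuming a dict of user ID to list of monthly roles; the function and parameter names are made up for illustration, and `min_months` plays the part of X.

```python
def users_in_role(wide, role, min_months):
    """Keep only users who held `role` in at least `min_months` months."""
    return {
        user_id: seq
        for user_id, seq in wide.items()
        if seq.count(role) >= min_months
    }
```

Running this once per role gives one subset per role, and each subset can be graphed the same way as the full population.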
