Saturday, March 29, 2014

Roles Visualization

So, thanks to some help from the very kind BrodieG on Stack Overflow, I was finally able to get some visualizations of the way that roles change over time. I am using the following code (as you can see, I tried to learn how to do melting and reshaping, then kind of gave up that hope - maybe another time).


library(reshape2)
library(ggplot2)


#clusters.mlt <- melt(clusters, id.vars="id")
#clusters.agg <- aggregate(. ~ id + variable, clusters.mlt, sum)

# The minimum number of times a user has to be in a given group in order to
# be shown in the graph for that group
minMonths = 2

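makeGraph <- function(clusters){
        # Drop the ID column (assumed to be column 1, as in the x[2:76] filter
        # below) so that it is not counted as if it were a month
        months <- clusters[, -1]
        # For each month (column), count how many users are in each cluster
        clus1 <- apply(months, 2, function(x) {sum(x=='1', na.rm=TRUE)})
        clus2 <- apply(months, 2, function(x) {sum(x=='2', na.rm=TRUE)})
        clus3 <- apply(months, 2, function(x) {sum(x=='3', na.rm=TRUE)})
        clus0 <- apply(months, 2, function(x) {sum(x=='0', na.rm=TRUE)})
        clusters2 <- data.frame(clus0, clus1, clus2, clus3)
        # Transpose so that each row is a cluster and each column is a month
        c3 <- as.data.frame(t(clusters2))
        c3$id <- c('Low Activity Cluster', 'Cluster 1', 'Cluster 2', 'Cluster 3')
        c3 <- c3[order(c3$id),]
        # position="fill" rescales each month so the areas show proportions
        return(ggplot(melt(c3, id.vars="id")) +
          geom_area(aes(x=variable, y=value, fill=id, group=id), position="fill"))
}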
#print(ggplot(clusters.mlt) +
#  stat_summary(aes(x=variable, y=value, fill=id, group=id), fun.y=sum, position="fill", geom="area"))

# Stats for just those who were in each group

clusterDF <- read.csv('clustersByID.csv')
ggsave(filename="../Results/allUsers.png", plot=makeGraph(clusterDF))
# Keep only the users who were in Role 1 for at least minMonths months
# (column 1 is the user ID; columns 2 to 76 hold the monthly roles)
cl1 <- clusterDF[apply(clusterDF, 1, function(x) {sum(x[2:76] == "1", na.rm=TRUE) >= minMonths}),]
ggsave("../Results/Role1_2+.png", makeGraph(cl1))

And this is what the code produces (some new colors would probably be a good thing to work on next!). There are some interesting things going on here, but no clear movement into the central-type role (Role 1).

Pasting text in Vim

So, I use Vim to do all of my programming, as well as for writing my thesis. Despite how often I use Vim, I'm not really a very expert user (as you will soon learn).

So, it's not uncommon that I will want to copy and paste a code snippet from a website (usually Stack Overflow), either to use it in my own code, or to figure out how it works.

When you paste it into Vim, it really looks terrible. Everything is indented like crazy, and it's almost unusable. When it's a small snippet, it doesn't take long to fix it, but for longer pieces of code, it's a huge pain.

I always thought that it was just some poor programming on Stack Overflow - that they put in a bunch of hidden tabs or something. But I came across this article today, and realized that the problem is actually how Vim handles the pasted text: it treats the paste as if I had typed it, so auto-indent kicks in on every line and the indentation piles up.

Basically, before you paste a hunk of indented text from somewhere else, run

:set paste

Then, when you are done pasting to your heart's content, run

:set nopaste

and life will be good.

Thursday, March 20, 2014

Visualizing Changing Roles over time

So, my main research question is about how people move through various roles in a community over time.

In the end, I will be using social network analysis as a driver for why people move through roles, but I will start with identifying what the roles are, and some summary statistics of how people move through them.

In order to identify roles, I created monthly activity snapshots for each user, and then used a clustering algorithm in R to automatically identify different "behavioral roles". There is some evidence that the data don't cluster cleanly, but clustering isn't a central component of my research, so I am moving forward anyway.

I decided to use the k-medoids (aka "partitioning around medoids" or "pam") algorithm in R. I used the silhouette function to identify the best k (which was 3 clusters).
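As a rough sketch of what picking k by silhouette can look like (this is not my actual script): the `activity` matrix below is made-up stand-in data, and the range of k values is an assumption. It uses `pam` from the `cluster` package and reads the average silhouette width that `pam` stores for each fit.

```r
library(cluster)

# Hypothetical stand-in for the monthly activity snapshots:
# 90 rows (user-months) and 2 made-up activity measures
set.seed(1)
activity <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)

# Candidate numbers of clusters (an assumption, not the range I really used)
ks <- 2:6

# Fit pam for each k and keep the average silhouette width of the fit
avgWidth <- sapply(ks, function(k) pam(activity, k)$silinfo$avg.width)

# The "best" k is the one with the largest average silhouette width
bestK <- ks[which.max(avgWidth)]
```

An average silhouette width close to 1 means tight, well-separated clusters, so low values across every k would be one sign that the data don't cluster cleanly.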

The data I used to create the clusters only included those months where a user made at least 5 edits. So, I created a 4th cluster to represent months where a user made fewer than 5 edits, and used a Python program to add the cluster results to the original stats file.
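I did this step in Python, but the idea can be sketched in R too. Everything here is hypothetical: the `stats` data frame, its column names, and the `clusterResults` vector (standing in for the cluster assignments of the active months, in row order).

```r
# Hypothetical stats file: one row per user per month
stats <- data.frame(id    = c(1, 1, 2, 2),
                    month = c(1, 2, 1, 2),
                    edits = c(12, 3, 7, 40))

# Only months with at least 5 edits went into the clustering
active <- stats$edits >= 5

# Hypothetical cluster assignments for the active rows, in row order
clusterResults <- c(1, 2, 3)

# Every month starts in cluster 0 (low activity); the active months
# are then overwritten with their clustering result
stats$cluster <- 0
stats$cluster[active] <- clusterResults
```

The one thing to be careful about is that the cluster results stay in the same order as the active rows, otherwise the roles get attached to the wrong months.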

I then wrote another Python script to rearrange this data, so that it is in the format

ID   Month1    Month2   Month3 ...
1    1         2        2
2    1         0        2
...

so that Month1 is the user's role in their first month, Month2 the role in that user's 2nd month, etc.
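Again, my real script was in Python, but here is a minimal R sketch of the same rearranging, assuming a hypothetical long-format table with one row per user per month. The column names and the calendar months are made up; the result reproduces the two example rows above.

```r
# Hypothetical long-format data: one row per user per calendar month
long <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2),
  month   = c("2013-01", "2013-02", "2013-03",
              "2013-05", "2013-06", "2013-07"),
  cluster = c(1, 2, 2, 1, 0, 2)
)

# Sort by user and calendar month, then number each user's months 1, 2, 3...
long <- long[order(long$id, long$month), ]
long$tenure <- ave(seq_along(long$id), long$id, FUN = seq_along)

# Spread to one row per user and one column per month of tenure
wide <- reshape(long[, c("id", "tenure", "cluster")],
                idvar = "id", timevar = "tenure",
                direction = "wide", sep = "")

# Rename cluster1, cluster2, ... to Month1, Month2, ... and id to ID
names(wide) <- sub("^cluster", "Month", names(wide))
names(wide)[1] <- "ID"
```

Users with shorter tenures would simply get NA in the later month columns, which is why the plotting code counts with na.rm=TRUE.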

For way, way too long today I've been trying to get R to display this data as a stacked area graph of the ratio of roles by month. It's been a huge pain to try to figure out how to reshape it, etc.
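The numbers behind that kind of graph are just the share of users in each role for each month. A small sketch of computing them directly, using a made-up `wide` table in the ID / Month1 / Month2 layout (the values are hypothetical):

```r
# Hypothetical wide-format data; NA means the user had no such month
wide <- data.frame(ID     = c(1, 2, 3, 4),
                   Month1 = c(1, 1, 2, 0),
                   Month2 = c(2, 0, 2, NA),
                   Month3 = c(2, 2, NA, NA))

roles <- 0:3

# Count users per role in each month; factor() keeps roles with zero users
counts <- sapply(wide[, -1], function(x) table(factor(x, levels = roles)))

# Turn the counts into proportions within each month (each column sums to 1)
shares <- prop.table(counts, margin = 2)
```

Each column of `shares` is one vertical slice of the stacked area graph.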

Once I can get that, I want to compare that graph to the graph of those who were in each role at least X times during their tenure.
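Picking out the users who were in a role at least X times could look something like this. It is only a sketch: the helper name `inRoleAtLeast` and the example data are made up, and it assumes the first column is the ID.

```r
# Hypothetical wide-format data in the ID / Month1 / Month2 / ... layout
wide <- data.frame(ID     = c(1, 2, 3),
                   Month1 = c(1, 1, 2),
                   Month2 = c(2, 0, 1),
                   Month3 = c(2, 2, 1))

# Keep the users who were in `role` for at least `minMonths` months
# (hypothetical helper; column 1 is assumed to be the ID)
inRoleAtLeast <- function(df, role, minMonths) {
  timesInRole <- rowSums(df[, -1] == role, na.rm = TRUE)
  df[timesInRole >= minMonths, ]
}

role1 <- inRoleAtLeast(wide, role = 1, minMonths = 2)
role2 <- inRoleAtLeast(wide, role = 2, minMonths = 2)
```

The filtered data frame keeps the same layout, so it can go through the same graphing steps as the full data.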

Friday, March 14, 2014

Sunbelt

I had a very good time at Sunbelt.

I finally got a model that would converge in RSiena. My plan was to start with a simple model, and then to add more of my hypotheses once I got the simple model working. It turned out to be tough enough to get the simple model to converge that I just went with that once I had it working.

At Sunbelt, I went to a great workshop by Tom Snijders, which was very helpful. He also came to the poster session, and gave me some great tips about how to work with my model.