Friday, September 5, 2014

Importing Edgelists into RSiena

So, importing data into RSiena is a bit of a pain. The GUI has some support for importing Pajek files, for example, but I've been working mostly from the command line with .R scripts, which is what the manual covers.

For my current project, I have CSV files in a very common edgelist format, with one row per tie and columns something like:

sourceID,receiverID,weight,wave 
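To make the format concrete, here is a minimal sketch that reads a toy edgelist with that header into a data frame. The rows are made up for illustration, and the CSV is supplied inline through `read.csv(text = ...)` rather than from a real file.

```r
# Toy edgelist in the column layout above; the values are invented.
csvText <- "sourceID,receiverID,weight,wave
1,2,1,1
2,3,3,1
1,3,1,2
3,1,2,2"

# read.csv() already returns a data frame
edges <- read.csv(text = csvText)
str(edges)

# Number of ties recorded in each wave
table(edges$wave)
```

Each row is one tie in one wave, so a single file can hold every wave of the network.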

I thought it would be simple to import these into RSiena, but it isn't.

RSiena accepts either adjacency matrices, which have a 0 or 1 in each cell for every pair of nodes, or sparse matrices. Sparse matrices are similar to edgelists in that they store only the ties that exist, but they have to be of class dgTMatrix (from the Matrix package). As you can tell from reading the documentation, it's not exactly obvious how to get data into that format.
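As a rough illustration of the two representations, the sketch below builds the same hypothetical 3-node network both ways. The node IDs and ties are assumptions for the example, not data from my project.

```r
library(Matrix)

# Hypothetical ties: 1->2, 2->3 and 3->1
from <- c(1, 2, 3)
to   <- c(2, 3, 1)

# Dense adjacency matrix: one cell for every pair of nodes
dense <- matrix(0, nrow = 3, ncol = 3)
dense[cbind(from, to)] <- 1

# Sparse triplet form: only the ties that exist are stored
sparse <- as(sparseMatrix(i = from, j = to, x = rep(1, 3), dims = c(3, 3)),
             "TsparseMatrix")

# The i and j slots are zero-based, so add 1 to get node IDs back
data.frame(row = sparse@i + 1, col = sparse@j + 1, value = sparse@x)
```

The triplet slots are essentially an edgelist with a value column, which is why the sparse route is the natural target for edgelist data.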

I started by trying the Matrix() function, then I found the sparseMatrix() function. I realized that weight didn't matter, so I simply left out the weight column. Called without values, though, sparseMatrix() creates a matrix of class "ngCMatrix", a "pattern matrix" that only records where the ties are, and I couldn't coerce that to a dgTMatrix.
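The class difference is easy to see on a toy example. This is a small sketch with made-up ties; it uses the `TsparseMatrix` virtual class for the final conversion, which gives a dgTMatrix when the input is a numeric general matrix.

```r
library(Matrix)

# Hypothetical ties: 1->2, 2->3 and 3->1
from <- c(1, 2, 3)
to   <- c(2, 3, 1)

# No x argument: a pattern matrix that stores positions only
pattern <- sparseMatrix(i = from, j = to, dims = c(3, 3))
class(pattern)

# Numeric x argument: a numeric sparse matrix in column-compressed form
weighted <- sparseMatrix(i = from, j = to, x = rep(1, 3), dims = c(3, 3))
class(weighted)

# Converting the numeric matrix to triplet form gives the class RSiena wants
triplet <- as(weighted, "TsparseMatrix")
class(triplet)
```

So the presence of a numeric `x` decides whether you start from a pattern matrix or from something that converts cleanly.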

So, eventually, I ended up creating a new weight column with everything set to 1. sparseMatrix() adds up the values of duplicate entries, so any cell that ends up greater than 1 gets reset to 1.
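Here is a minimal sketch of why the reset is needed, using a made-up edgelist in which the tie 1->2 is listed twice.

```r
library(Matrix)

# Hypothetical edgelist with a duplicated tie (1->2 appears twice)
from <- c(1, 1, 2)
to   <- c(2, 2, 3)

m <- sparseMatrix(i = from, j = to, x = rep(1, 3), dims = c(3, 3))
m[1, 2]   # 2, because duplicate entries are summed

# Re-binarize: anything above 1 goes back to 1
m[m > 1] <- 1
m[1, 2]   # 1
```

Without this step a repeated row in the CSV would show up as a value of 2 or more, which is not a valid entry for a binary network.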

My current code is below:

 library(Matrix)
 library(RSiena)

 # Expects a data frame with columns RespondentID, NomineeID and Wave.
 # nodeCount is the number of actors in the network.
 edgeListToAdj <- function(x, waveID, nodeCount){
     # Remove entries that are not connected to anyone (NomineeID == 0), that
     # point outside the node set, or that are not in the current wave
     tempNet <- x[x$NomineeID > 0 & x$NomineeID <= nodeCount & x$Wave == waveID,]
     # Create a binary column for weights (since RSiena doesn't use weights).
     tempNet$Weight <- 1
     # Convert the edgelist to a sparse adjacency matrix.
     # Rows are NomineeID and columns are RespondentID here. RSiena reads rows
     # as senders and columns as receivers, so swap i and j if your data needs it.
     adjacencyMat <- sparseMatrix(i=tempNet$NomineeID, j=tempNet$RespondentID,
                                  x=tempNet$Weight, dims=c(nodeCount,nodeCount))
     # sparseMatrix() sums duplicate entries, so re-binarize them.
     # Yes, binarize is a real word.
     adjacencyMat[adjacencyMat > 1] <- 1
     # Convert to triplet form (a dgTMatrix), since this is what RSiena expects
     return(as(adjacencyMat, "TsparseMatrix"))
 }

 createNetwork <- function(fileName, numWaves, nodeCount) {
     print(fileName)
     # Read the CSV file into a data frame
     netDF <- read.csv(fileName)
     # Create a list of adjacency matrices, one per wave
     net <- lapply(seq_len(numWaves), function(w) edgeListToAdj(netDF, w, nodeCount))
     # Change this into an RSiena network
     RSienaObj <- sienaDependent(net)
     return(RSienaObj)
 }