Friday, January 31, 2014

Rethinking Some Measures

So, I've been reading more about how RSiena (and stochastic actor-oriented models) work, and I've been rethinking a few things.

The Problem

Basically, they are expecting both networks and behavior to be fairly similar between observations. For example, this paper says that
A dynamic network consists of ties between actors that change
over time. A foundational assumption of the models discussed in
this paper is that the network ties are not brief events, but can be
regarded as states with a tendency to endure over time. Many rela-
tions commonly studied in network analysis naturally satisfy this
requirement of gradual change, such as friendship, trust, and coop-
eration. Other networks more strongly resemble ‘event data’, e.g.,
the set of all telephone calls among a group of actors at any given
time point, or the set of all e-mails being sent at any given time
point. While it is meaningful to interpret these networks as indi-
cators of communication, it is not plausible to treat their ties as
enduring states, although it often is possible to aggregate event
intensity over a certain period and then view these aggregates as
indicators of states.
While the RSiena manual says that:
In the models, behavioral variables can be binary or ordinal discrete (the extension for con-
tinuous behavioral data is currently being developed). The number of categories should
be small (mostly 2 to 5; larger ranges are possible). In the case of behaviors, Stochastic
Actor-Oriented Models express how actors increase, decrease, or maintain the level of their
behavior.

Some Ideas 

My plan is to create 4 different types of networks - observation, local communication, global communication, and collaboration - and to use these to try to predict how many edits a user would make.

I'm realizing that the way I'm measuring these networks, as well as measuring activity, do not match with the way that SAOM works. For example, someone who observes the work of another user only one time in a period (defined as being the next editor of a page) is quite unlikely to observe the same user again. That is, this definitely represents an "event" instead of a "state".

I haven't decided for sure what to do. My tentative plan is to change my observation network into a dynamic dyadic covariate. As I understand it, this would still test whether or not observing someone else's page makes you more likely to be active.

I think that my other networks can be aggregated, and be made to represent states. That is, users who have communicated via talk pages X times in a period can be considered as having a "communicates with each other" link.

Behavior has to change, too

I definitely need to change my behavior variable, too. Like in most "peer production" environments, activity level follows a power-law curve, with a few users making lots of edits, and most users making a few. This makes this variable tough to model. I tried using quartiles, but even that is messy with this sort of data.

For now, I'm thinking that I will use the number of days they were active (i.e., made an edit) in the period as a measure of activity. This has a clear max, and should have a much less skewed distribution. In addition, my research is situated in the theory of communities of practice, which says that as people join a community, it becomes part of not only what they do, but who they are. In this sense, I think that the days active measure represents how engaged someone is in the community, even if they aren't making lots of edits.


P.S. For now, I decided to just keep my data as adjacency matrices, and not worry about converting it to edge lists. Maybe after I get my analysis done for the Sunbelt conference!

No comments:

Post a Comment