Beyond the Bibliography

Posted by: Michael Paskevicius on August 23, 2011

Categories: Learning and Knowledge Analytics, Research, South Africa

I am currently in the process of completing my master’s dissertation. It’s a large document which I have spent the last year or so writing, and I am pretty happy with how it has all turned out. I’ll be sharing more about the contents and findings of the thesis in the next little while.

As I was editing the thesis I thought it might be interesting to try to visualize how referencing occurred within the document. I had to double-check all of the references anyhow, so I thought I would make the process more interesting by: programmatically extracting all of the in-text references; creating a list of references and where they occur in the document (by chapter); and then trying to visualize the connections between references and chapters throughout the entire document.

After mentioning this to a number of my colleagues, I decided to document the process as it generated some interest.  Also I find it useful to document the process in case I need to do it again later.  Credits to my colleague Andrew Deacon for helping me formulate this process.

We are going to extract all of the references in the document. Start by copying all of the text from your document into an advanced text editor such as Notepad++. You can then use the find and replace function within Notepad++ to identify the parentheses which surround each reference. We want to get each reference on its own line so we can generate a list. Open the ‘Find’ utility and turn on the ‘Regular expression’ search mode in the bottom right of the window.

The regular expression search will enable you to search and replace with line breaks. Start by searching for ‘\(’ to identify the opening parenthesis. The backslash is required as an escape character because, in regular expression mode, a bare parenthesis has a special meaning. You can now find and count the number of opening parentheses in your document. We want to have each opening parenthesis on a new line, so replace ‘\(’ with ‘\r\n(’.

Now we can do the same for the closing parenthesis. Do another find and replace, this time replacing ‘\)’ with ‘)\r\n’. Now each in-text reference should be on its own line.

Now search again, this time in normal search mode, for ‘(’ and hit the ‘Find All in Current Document’ button. You should be presented with a list of search results, one per in-text reference, that can easily be copied to Excel.
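For anyone who would rather script this step, here is a minimal Python sketch of the same idea: find every parenthesised span and keep the line it sits on. It is not the Notepad++ procedure itself, and the sample text is made up for illustration. Note that the line numbers it reports are those of the text you feed in, whereas the Notepad++ route numbers the lines after the replacements have split them.

```python
import re

# Text between "(" and ")" on a single line; nested parentheses
# are not handled, only the innermost pair is captured.
PAREN = re.compile(r"\(([^()]*)\)")

def extract_parentheticals(text):
    """Return (line_number, contents) for each parenthesised span, 1-indexed."""
    found = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PAREN.finditer(line):
            found.append((line_number, match.group(1).strip()))
    return found

# Made-up sample text, not taken from the thesis.
sample = (
    "Reuse of open content is uneven (Hatakka, 2009).\n"
    "The research design (Cohen, Manion & Morrison, 2007) is described next."
)
for line_number, contents in extract_parentheticals(sample):
    print(line_number, contents)
```

The result still needs the manual clean-up described in this post, since acronyms and other non-reference parentheses are captured too.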

You will still have to sort out the combined references and et al.’s, and remove the acronyms and other non-reference parentheses. I also stripped the parentheses from the text. Keep the line number so you can determine which chapter each reference occurred within. If you go back to Notepad++ you can grab the line numbers where each section begins and ends; mine was quite simple as I only had five chapters.

CH1: lines 1-132
CH2: lines 133-638
CH3: lines 639-881
CH4: lines 882-1394
CH5: lines 1395-1479

Once you have cleansed the data you should have a clean list of references.  You can use a VLOOKUP to bring in the chapter numbers in Excel.  This process was also very useful for verifying my references and ensuring I used the full reference in a multi author (+3) paper when it first occurred in text.
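The chapter lookup can also be done outside Excel. The sketch below is a rough Python stand-in for an approximate-match VLOOKUP, using the chapter line ranges listed above; the function name is my own invention for illustration.

```python
from bisect import bisect_right

# First line of each chapter, from the ranges listed in this post.
CHAPTER_STARTS = [1, 133, 639, 882, 1395]
CHAPTER_NAMES = ["CH1", "CH2", "CH3", "CH4", "CH5"]
LAST_LINE = 1479

def chapter_for_line(line_number):
    """Return the chapter whose line range contains line_number."""
    if not 1 <= line_number <= LAST_LINE:
        raise ValueError(f"line {line_number} is outside the document")
    # bisect_right counts how many chapter starts are <= line_number.
    return CHAPTER_NAMES[bisect_right(CHAPTER_STARTS, line_number) - 1]

print(chapter_for_line(132))   # last line of chapter 1
print(chapter_for_line(133))   # first line of chapter 2
```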

You can then create a pivot table which shows the references per chapter as well as how many times each paper was referenced.
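As a rough equivalent of the pivot table, the counting can be sketched in Python as well. The occurrences below are invented sample rows, not my real data.

```python
from collections import Counter, defaultdict

# Invented sample rows: one (chapter, reference) pair per in-text citation.
occurrences = [
    ("CH1", "Hatakka, 2009"),
    ("CH2", "Hatakka, 2009"),
    ("CH2", "Hatakka, 2009"),
    ("CH2", "Cohen, Manion & Morrison, 2007"),
    ("CH3", "Cohen, Manion & Morrison, 2007"),
]

def pivot(occurrences):
    """Return {reference: Counter({chapter: times cited})}."""
    table = defaultdict(Counter)
    for chapter, reference in occurrences:
        table[reference][chapter] += 1
    return table

for reference, per_chapter in sorted(pivot(occurrences).items()):
    total = sum(per_chapter.values())
    print(reference, dict(per_chapter), "total:", total)
```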

Now you can take columns A and B shown above and drop them into something like NodeXL to visualize the connections between your reference occurrences (you will have to copy down the ‘CH1’ text).   You can use the ‘Totals’ column to make the connecting lines bigger where a reference was used multiple times.   I use the Total field as the Edge Width connecting the reference to the chapter.   You will also want to make the chapter nodes a larger more distinct object, I have used coloured discs. I have also run a grouping algorithm on the dataset which identifies the references which best group together and applies a colour for each node.
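If you are scripting the steps, the edge list can be written out as CSV with the chapter name already repeated on every row. This is only a sketch: the totals are invented, and the column headings are placeholders that you would rename to whatever your visualization tool expects.

```python
import csv
import io

# Invented pivot totals: (chapter, reference) -> times cited.
totals = {
    ("CH1", "Hatakka, 2009"): 1,
    ("CH2", "Hatakka, 2009"): 2,
    ("CH2", "Cohen, Manion & Morrison, 2007"): 1,
}

def edge_list_csv(totals):
    """Return CSV text with one chapter-to-reference edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Placeholder headings; the third column carries the edge width.
    writer.writerow(["Chapter", "Reference", "Total"])
    for (chapter, reference), total in sorted(totals.items()):
        writer.writerow([chapter, reference, total])
    return buffer.getvalue()

print(edge_list_csv(totals))
```

The csv module quotes any reference that contains a comma, so "Hatakka, 2009" stays in a single column.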


The visual shows the references used in my thesis as they occurred within chapters, and were reused within other chapters.  The central literature which I use to support my thesis ends up in the middle of the image (the central red text).

Looking back, this is quite a useful visual of how I constructed my thesis and how certain resources were woven into the various chapters of my research. Chapter 2 naturally contains the most references as it forms my literature review. The literature which ultimately becomes useful in my study is used again in Chapter 5, my conclusion, as well as being mentioned in Chapter 1 as I introduce the study. Two key resources, Engeström, 1987 (theoretical framework) and Cohen, Manion & Morrison, 2007 (research design), are used within Chapters 2 and 3, as I explain and then localise their application in the study.

I am toying with the idea of including this in front of the bibliography in my thesis submission.  Think it might be useful or actually annoy an external examiner?

The most central literature is referenced below:

Conole, G., McAndrew, P. & Dimitriadis, Y. (2010). The role of CSCL pedagogical patterns as mediating artefacts for repurposing Open Educational Resources. In: Pozzi, Francesca and Persico, Donatella eds. Techniques for Fostering Collaboration in Online Learning Communities: Theoretical and Practical Perspectives. Hershey, USA: IGI Global.

Cohen, L., Manion, L. & Morrison, K. (2007). Research methods in education. 6th edition. London: Routledge.

Engeström, Y. (1987). Learning by expanding: An activity-theoretical approach to developmental research. Helsinki: Orienta-Konsultit.

Harley, D., Henke, J., Lawrence, S., Miller, I., Perciali, I. & Nasatir, D. (2006). Use and Users of Digital Resources: A Focus on Undergraduate Education in the Humanities and Social Sciences. Center for Studies in Higher Education (CSHE), University of California, Berkeley.

Hatakka, M. (2009). Build it and They Will Come? – Inhibiting Factors for Reuse of Open Content in Developing Countries.  The Electronic Journal of Information Systems in Developing Countries. EJISDC (2009) 37, 5, 1-16.


Anatomy of an incident: Helicopter crash at UCT

Posted by: Michael Paskevicius on August 4, 2011

Categories: Learning and Knowledge Analytics, Research, South Africa

This work was inspired by the “Anatomy of the Osama Tweet”, which traced how the news of Osama Bin Laden’s death spread just before it was officially announced by U.S. President Obama. Naturally the original speculative tweet spread virally through Twitter. My colleague Andrew Deacon and I have undertaken a similar analysis of information flows during an incident which occurred at the University of Cape Town (UCT) yesterday. With the proliferation and adoption of social networks, information tends to spread through networks faster and more virally. Because it is happening electronically, we can now retrospectively run an analysis of how it all plays out. But it takes the right kind of incident to actually warrant the viral spread of information. As this article highlights, most of the information posted to popular social networks actually goes nowhere in terms of amplification or repetition (93% of tweets go nowhere, for example).

So when I heard about the crash landing of a helicopter at UCT I knew we would have a viral situation on our hands.  In fact on the way in to the office, just following the incident, I was already hearing students chatting about it on the city campus.  People chat in corridors, sms messages are sent, phones ring, bbm’s buzz, whatsapp’s beep, Facebook statuses are updated, and Tweets are posted to Twitter.  The latter, Twitter, can be said to be the most open and accessible of networks for analysis.  We collect tweets from Twitter on an ongoing basis whenever someone tweets about #UCT and our dataset is slowly growing.  This incident prompted many people to tweet about UCT and the helicopter crash from all over South Africa and the world.  So whereas only a few years ago one might have only heard about this incident on the six o’clock news, people all over the world were already discovering the story by getting the news from social networks.

We harvested 1168 nodes and 1681 edges worth of data from Twitter at about 3pm on August 3rd, 2011. Tweets had to contain the words “UCT” and “helicopter” to be included. That means our analysis includes 1168 distinct people who tweeted about the incident, with 1681 connections between them. A connection may be a retweet (amplification) or a mention (reference). The lines show these connections between the nodes (people) tweeting about the incident. Zoom in with your mouse or the + button in the image to explore.
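To make the node and edge idea concrete, here is a small Python sketch of how tweets could be turned into such a network. It is not what NodeXL does internally; the tweets and user names are invented, the keyword filter is case-insensitive by my own choice, and treating the first @name after a leading "RT" as the retweeted user is a simplifying assumption.

```python
import re

MENTION = re.compile(r"@(\w+)")

def build_network(tweets):
    """tweets: iterable of (author, text). Returns (nodes, edges)."""
    nodes, edges = set(), []
    for author, text in tweets:
        lowered = text.lower()
        # Keep only tweets containing both keywords.
        if "uct" not in lowered or "helicopter" not in lowered:
            continue
        nodes.add(author)
        is_retweet = text.startswith("RT @")
        for position, target in enumerate(MENTION.findall(text)):
            # Assumption: the first name after "RT" is the retweeted user.
            kind = "retweet" if is_retweet and position == 0 else "mention"
            nodes.add(target)
            edges.append((author, target, kind))
    return nodes, edges

# Invented tweets with hypothetical user names.
tweets = [
    ("alice", "Helicopter down at UCT"),
    ("bob", "RT @alice: Helicopter down at UCT"),
    ("carol", "@bob is this true? UCT helicopter"),
    ("dave", "lunch time"),
]
nodes, edges = build_network(tweets)
print(sorted(nodes))
print(edges)
```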

One can quite clearly see in this visual the tweeters who were most responsible for the spread of information via Twitter. Mr_capeTown and GarethCliff are shown as very central nodes in the diagram, as are the media outlets Radio702, 945Kfm, MyNews24 and UctRadio. There are also a host of satellite networks around the outside of the diagram which are not connected to the central network.

If you want to explore this social network graph in higher detail download this PDF file.  Within the PDF you can zoom in ultra close and even search to find your Twitter user name.

The most frequently retweeted or mentioned users are shown in the graph below.

The incident was said to have happened around 9:30 am and the tweets started pouring in fast. The image below shows the number of tweets per five minute interval following the incident.
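Counting tweets per five-minute interval amounts to rounding each timestamp down to the start of its interval and tallying. A minimal Python sketch, with invented timestamps:

```python
from collections import Counter
from datetime import datetime, timedelta

def tweets_per_interval(timestamps, minutes=5):
    """Count timestamps per interval; minutes should divide 60 evenly."""
    counts = Counter()
    for stamp in timestamps:
        # Round down to the start of the interval.
        start = stamp - timedelta(
            minutes=stamp.minute % minutes,
            seconds=stamp.second,
            microseconds=stamp.microsecond,
        )
        counts[start] += 1
    return sorted(counts.items())

# Invented timestamps, not the real tweet times.
stamps = [
    datetime(2011, 8, 3, 9, 31, 10),
    datetime(2011, 8, 3, 9, 34, 59),
    datetime(2011, 8, 3, 9, 35, 0),
]
for start, count in tweets_per_interval(stamps):
    print(start.strftime("%H:%M"), count)
```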

The first photos were posted by folks who captured the scene on their cell phone cameras. The two most popular images are still available here and here. Gradually, news teams arrived and the following video by official media outlet News24 was posted to YouTube.

Only a portion of users’ tweets included location data. There is an option in Twitter to have your coordinates included with each tweet. Based on the data we had available we came up with the following view of tweets originating in South Africa. The lines again represent retweets and mentions between locations.

Circles representing tweets in the bottom left are from Cape Town, then along the Southern Coast to Port Elizabeth, East London and Durban.  The large cluster diagonally up from Cape Town would be Johannesburg/Pretoria with Bloemfontein in between.

Globally the available location data is shown below.

We can infer that tweets went out from parts of Europe, Asia, North America and other African countries. In many cases these foreign tweeters were in some way connected to a South African tweeter, indicating a viral global spread.

Discussion

There are a number of limitations to this small-scale real-time study. In the first place, we have only captured the conversations which were happening within Twitter. Furthermore, we only captured tweets which include the words “UCT” and “helicopter”, thus missing tweets which use abbreviations such as “heli”. We are also missing the conversations which exist within the other social networks. I am sure that the conversations which happened in Facebook, a much more widely adopted social network in South Africa, would also be incredibly interesting to analyze. We have additionally noticed activity streams on the event happening in LinkedIn.

One of the toughest current hurdles in doing such an analysis is actually getting at the data you want. In this case we used NodeXL to extract the data from Twitter using the Twitter API. We then built the network graphs in Gephi while using Excel to summarize the data. One of the reasons we were able to get access to this data was that the event happened suddenly and quickly and was not too massive. Larger events which unfold over time seem to be more difficult to gather data for at this point.

I think it’s remarkable that this event happened yesterday and we are already able to analyze the social nature of some of the information flows around the event. The next phase of this analysis is to try and replay the flow of tweets based on their timestamp. We want to examine exactly how the information flowed through the network over time. I am surprised that the current toolset we have does not allow this. Gephi does have a ‘Graph Streaming’ tool, but it has to be hooked up to a live feed of information using JSON. We are looking for help or suggestions as to how to use this dataset to reconstruct the social spread of information. Leave us a comment if you have any tips!
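One possible starting point, offered only as a sketch and not something we have tried on the real dataset, is to sort the connections by timestamp and step through them, recording how the network has grown after each one. The events and user names below are invented, and this does not produce anything Gephi can stream directly.

```python
from datetime import datetime

def replay(events):
    """events: iterable of (timestamp, source, target).

    Yields (timestamp, nodes_so_far, edges_so_far) in time order.
    """
    seen = set()
    edge_count = 0
    for stamp, source, target in sorted(events):
        seen.update((source, target))
        edge_count += 1
        yield stamp, len(seen), edge_count

# Invented events with hypothetical user names, deliberately out of order.
events = [
    (datetime(2011, 8, 3, 9, 40), "carol", "bob"),
    (datetime(2011, 8, 3, 9, 32), "bob", "alice"),
]
for stamp, node_total, edge_total in replay(events):
    print(stamp.strftime("%H:%M"), node_total, edge_total)
```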

 

 


I love Google. Their tools have really changed my life, in many ways making information management a pleasure. Now they’ve launched a new social network meant to help us organize our social life by categorizing our contacts into ‘circles’. The circles are meant to let you control who you share content with. Just as you might traditionally share a link, thought, picture, or video on Facebook or Twitter, now you can designate which circle of friends gets to see that content. Makes sense, right? Some content you might want to share could be useful only to your work contacts, or even a select group within your work contacts. So Google has given us the tools to do this, and I, like many, started building this social fabric around circles of friends.

Then I got to thinking about what I was doing and about other recent experiments with Facebook. After exporting my social data from Facebook a few weeks back I was able to computationally calculate groups of friends based on the number of connections they had within my network. I then applied a size metric to the people who were most central to my network. The technique worked quite well, allowing me to confirm many of the clusters that the diagram presented: my network was well defined both geographically and by the type of social circle people belonged to (e.g. work, friends, acquaintances). Below is a diagram of my social graph showing the well clustered groups of people in my network.

So I am beginning to wonder why I am deliberately now creating a new social graph in Google+.  I do find the ability to organize contacts into groups very useful, but if I can calculate these groups using a few simple social network analysis (SNA) techniques, I would rather do it that way.
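To give a flavour of how simple such a calculation can be, here is a Python sketch that counts each person's connections (a basic size metric) and splits the network into groups that have no friendships between them. This is a much cruder stand-in for the clustering Gephi performs, and the names are invented.

```python
from collections import defaultdict

def friend_groups(friendships):
    """friendships: iterable of (person, person) pairs.

    Returns (degree, groups): connection counts per person and a list
    of sets, one per group with no friendships leading outside it.
    """
    neighbours = defaultdict(set)
    for a, b in friendships:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {person: len(friends) for person, friends in neighbours.items()}

    groups, seen = [], set()
    for person in sorted(neighbours):
        if person in seen:
            continue
        # Walk outwards from this person to collect the whole group.
        group, stack = set(), [person]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(group)
    return degree, groups

# Invented friendships.
friendships = [("ann", "ben"), ("ben", "cat"), ("ann", "cat"), ("dee", "eli")]
degree, groups = friend_groups(friendships)
print(degree)
print(groups)
```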

In a perfect world, we would own our own social graphs. It could work something like this: I export my social network from Facebook, Twitter, or whatever social network I have invested in; I calculate the identifiable social groups using an SNA tool like Gephi; and I then apply that social graph to any other social network I am interested in. For example, I could take my social graph and apply it to an image sharing site like Flickr to see which of my friends, contacts, colleagues or family have public photo collections online.

Rather than building a social network in many places (eg. Facebook, Twitter, LinkedIn, MySpace, Friendster, WAYN… and the list goes on and will continue to grow:) wouldn’t it make more sense to open up the social graph standard so that it could be applied wherever we desire?

Just thinking out loud really; I’ll still be playing with Google+ in the future. I have not yet found it overly valuable though; my stream feels very heavy. It’s lacking the simplicity that Twitter brought us by limiting updates to 140 characters.

 


Visualizing Facebook Groups

Posted by: Michael Paskevicius on June 15, 2011

Categories: Learning and Knowledge Analytics, Research

I have continued exploring ways in which one can visualize Facebook networks. Here I have used the Netvizz application to extract networks of people who have joined a particular Facebook group. You can get access to group network data for any group that you are a member of. I decided to examine the ‘UCT – University of Cape Town’ group, of which I am a member. At the time of extraction the group contained 2248 people. People in the group are not friends by default, but there were many instances where people were friends; in total 5960 relationships existed in the network beyond people having joined the group. When you first load the network data into Gephi it looks something like this.

You have to run some statistics and some algorithms to make things interesting. I ran the Modularity Report which I believe attempts to identify communities within the network. Once you run the report you can then partition the nodes by colour according to the communities discovered. I then ran the Radial Axis Layout and grouped the nodes by modularity to generate this view.

The network turned out to be quite diverse with many disparate and small groups of friends represented in this group.  The clusters of groups of people are stacked around the circle and the lines between connect individuals within different groups.
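For the curious, modularity is a score for how well a given split into communities fits a network: it compares the share of edges that fall inside communities with the share you would expect if edges were placed at random. The Python sketch below only scores a split that you supply; it does not search for the best one, which is the hard part a tool like Gephi handles. The toy network is invented.

```python
from collections import Counter

def modularity(edges, community):
    """Score a partition of an undirected, unweighted network.

    edges: list of (node, node) pairs; community: {node: label}.
    """
    m = len(edges)
    internal = Counter()    # edges with both ends in the community
    degree_sum = Counter()  # total degree of the community's nodes
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            internal[community[a]] += 1
    return sum(
        internal[label] / m - (degree_sum[label] / (2 * m)) ** 2
        for label in degree_sum
    )

# Toy network: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
lumped = {node: "all" for node in split}
print(modularity(edges, split))
print(modularity(edges, lumped))
```

Splitting the toy network into its two triangles scores higher than lumping everyone together, which is the behaviour you would hope for.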

We then ran the OpenOrd layout on the data and came up with this interesting visual.

We don’t know exactly what this particular layout is showing, but it is very interesting the way in which the nodes swell and settle when you run the program. Will continue exploring.

 


Visualizing my Facebook Social Network

Posted by: Michael Paskevicius on May 28, 2011

Categories: Learning and Knowledge Analytics, Research

I should be working on my thesis right now.

Ok, now that I have gotten that out of the way.  I wanted to share a small project I have been working on in visualizing my social network on Facebook.  Your own personal social graph can quite easily be exported from Facebook using the Netvizz application. Once you have installed the app in Facebook and exported the graph you will get a text file which can be fed into a social network visualization tool such as Gephi.  You might remember I was using Gephi for the learning and knowledge analytics course earlier this year.
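If you want to peek inside the exported file before loading it into Gephi, a few lines of Python will do. The sketch below assumes the file uses the GDF layout (a "nodedef>" header row followed by node rows, then an "edgedef>" header row followed by edge rows); check your own export first, as that is my assumption here, and the sample content is invented.

```python
import csv

def parse_gdf(text):
    """Return (nodes, edges) as lists of dicts keyed by column name."""
    nodes, edges = [], []
    section, columns = None, []
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        if row[0].startswith(("nodedef>", "edgedef>")):
            section = row[0][:7]    # "nodedef" or "edgedef"
            row[0] = row[0][8:]     # drop the "nodedef>" prefix
            # Headers look like "name VARCHAR"; keep the name only.
            columns = [cell.split()[0] for cell in row]
            continue
        if section is None:
            continue  # ignore anything before the first header
        record = dict(zip(columns, row))
        (nodes if section == "nodedef" else edges).append(record)
    return nodes, edges

# Invented sample in the assumed layout.
sample_gdf = (
    "nodedef>name VARCHAR,label VARCHAR\n"
    "1,Ann\n"
    "2,Ben\n"
    "edgedef>node1 VARCHAR,node2 VARCHAR\n"
    "1,2\n"
)
gdf_nodes, gdf_edges = parse_gdf(sample_gdf)
print(len(gdf_nodes), "people,", len(gdf_edges), "friendships")
```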

After feeding my social graph into Gephi and following the techniques in this slideshare presentation by sociomantic labs I came up with the following visual of my network.

Upon closer inspection and investigating which nodes were who in my social network I discovered that this image is a pretty interesting view of my Facebook friends. I was quickly able to identify major clusters of people in my life grouped together and segmented by colour.

So the entire left side is basically people I have met and worked with in Namibia and South Africa.  They are quite neatly organized together.

On the right side is my social network from Toronto and around Canada. I believe the larger circles represent people with lots of wall posts – but I am still investigating.

Those lost nodes in the middle are interesting as well.  Contacts I have met who are not closely related to my larger social circle. Also my family appears as a cluster here as they are not linked to my social circle – probably a good thing!

 
