My initial idea for the title of this piece was “The Accidental Archive”. This came from a notion I had, received from headlines of articles I usually didn’t actually read, that “what we post online is forever”. I was going to suggest that while this fact is scary from a privacy perspective, as well as from a “Facebook is controlling us through their market research and advertising” perspective, it is also a wonderful opportunity for media scholars, anthropologists, linguists and future historians. If everything posted on the Internet lasts forever, that is a veritable goldmine for researchers using Internet posts as data, not only for insidious product selling purposes, but also for genuine social science and archiving of the past. However, looking past the first page of a Google search showed me that like many sensationalist headlines (as a linguist interested in digital communication, the likes of “how text messaging is wrecking our language” come to mind), this is not only an unnecessarily negative perception, it is also false. It turns out, there is a job for archivists on the Internet after all, and it is a monumental one. Far from everything posted on the Internet becoming part of “an endless record” that will be “very hard to erase”, it is in fact very hard to save web pages before they expire. And the sheer volume of what could make up an Internet Archive is unprecedented.

While it may be relatively easy for a technologically skilled person to find old posts and even trace deleted ones on a busy site like Facebook, the average web page on the Internet only lives for 92 days. A 2013 study found that 49% of hyperlinks in Supreme Court cases in the USA were broken, a phenomenon referred to as “link rot”. According to historian Abby Smith Rumsey, the idea of the Internet as permanent comes from its omnipresence in our lives. However, this omnipresence is more like a river flowing past, perhaps carving out impressions here and there, than a dam filling up with permanently accessible water (ha). Moreover, even if the content itself is preserved, this does not mean it is accessible, as the source code and hardware required to display it are constantly evolving.

As far back as 1996, Brewster Kahle had the foresight to begin preserving web pages, when he founded the Internet Archive. This is based on software called the “Wayback Machine”, a reference to a time-travelling device from a 1960s-1980s cartoon series. Software “spiders” go to websites, follow the links on those websites and so on, archiving as they go. However, although the ultimate aim is to preserve everything, this is currently impossible, so the decision of what to save is based on popularity, determined by the number of references and links to the page. It is also monitored by approximately 1000 librarians and experts around the world. Some of the more newsworthy tasks of these experts are to do with saving pages that are considered significant based on things other than popularity, for example fake news before it is deleted, and leaked information before it is altered or censored. Thus, despite the ostensibly indiscriminate nature of the preserving done by the Wayback Machine, deciding what to preserve is still one of the biggest decisions for the human archivists to make. Hence, the project of curation begins even as the archive is being created.

The Internet Archive has been spoken of in almost religious terms, hailed by Stewart Brand as a savior from the “severe amnesia” that besets “civilization” due to the rapid turnover and resulting loss of content on the Internet today. Brand writes that the Internet Archive is “the beginning of a cure” for this, and is such a significant project that it will be looked back on “with the fondness and respect that people now have for the public libraries seeded by Andrew Carnegie a century ago”. The fact that he uses the term “civilization” is perhaps telling, as the notion that the whole memory of the human race is now made up of the Internet is not only bizarre, it is also very biased towards the global north. Brand also claims that the Archive is the start of a “searchable memory for society” that would be useful and accessible not just to scholars, but “everyone”.

This misconception that what is openly available on the Internet is accessible to “everyone”, whether coming from innocent ignorance and lazy thinking, or a cognitive bias based on a racist and imperialist mindset, is a damaging one. Internet access costs money, as do the devices required to use it. Even for those individuals with money, potential access is not the same in all countries, as many are limited by slow Internet connections. Critical Internet scholars have a term for this: “the digital divide”. In addition to such economic gaps, there are other aspects that affect accessibility, such as the languages that make up the digital landscape. The Internet Archive itself is presented in English, the majority of the Internet as a whole is in English, and 91% of it is made up of only ten languages. Considering that conservative estimates put the number of languages on earth at over 6000, and that despite the prevalence of English not everyone can speak it, this represents another failure of access. The way the Internet is curated limits its ability to be the accessible open source haven it is often hailed as.

If this article has become a rant about the failings of the Internet, I am myself guilty of causing misperceptions. The Internet does fall into the failings and biases that continue to perpetuate the dominance of the global north, the west, and the Anglophone world over “the rest”. However, it nevertheless represents a space where innovative perspectives can develop in a bottom-up manner, and where knowledge production occurs that can be much more refreshing, pioneering and relevant than that which is developed in elite universities. This is in addition to its mobilizing power for revolutionary movements, and my personal interest, the forms of communication that are being developed through creative use of new communication technologies. It is this personal and creative use of technologies and spaces, often in ways not intended by their creators, that interests me about the Internet. An attempt to create and curate an archive of the entire Internet is a fascinating project, but what really intrigues me is how individuals create and curate their own archives using the Internet.

The notion of social media sites like Facebook as a place to exhibit and curate one’s public identity is not an unfamiliar one. We exhibit statuses, photographs, links and other forms of media, and we curate our timelines to showcase the kind of selves we wish our ‘friends’ to see. There are many academic articles linking social media use to theories of performance, particularly those of Erving Goffman. Lay perspectives often emphasize the curated aspect of social media sites, linking this to the idea of falseness. This is yet another feature of social media and technology that can be railed against – it encourages us to portray ourselves only in the best light, sacrificing honesty and leading to fake impressions of ourselves etc. Apparently it is vital that we show our raw and vulnerable, fully honest selves to approximately 350 acquaintances. Of course we should problematize the binaries of “fake” and “true” selves (hello performativity), as well as “private” and “public” selves. Social media sites are by definition considered to be fully social spaces where we curate our public personas. However, what often gets overlooked in the discourse on social media are the personal functions it can fulfill. Although exhibiting and curating is clearly a significant aspect of Facebook, an article by Zhao et al. (2013) brings to light the fact that archiving, and particularly the creation of a deeply personal archive, is also an important way that users interact with Facebook.

This illustrates that as consumers we are not controlled by digital determinism – we take the affordances that technology provides and often use them in ways not intended by its creators. Zhao et al. took the theoretical notions of performance, curation and archiving, and performed some empirical research. They found that users of Facebook did engage in performance activities through their posts and statuses, and curatorial activities through the active managing of their timelines. Moreover, they also used Facebook as a way to archive personal memories for themselves. Zhao et al. describe these three purposes in spatial terms, as a performance region, an exhibition region, and a personal region. There is a sense among Facebook users that from the top of one’s timeline and a certain length down is a space for publicity. Users curate this space from the perspective that others may see it, and actively manage it by hiding certain posts from view and keeping others near the top, not to mention deciding on one’s profile picture, cover photo and ‘featured photos’.

However, there is an assumption that people looking at one’s timeline will only look a certain length of the way down, and past this point is a space that is more personal. People use this space to archive photographs and other posts as their own personal memories. Although this is not what Facebook was designed for, it should not be surprising considering that much of the way we document experiences these days takes digital rather than physical form, with photos on our phones instead of on reels of film. Facebook presents a handy way to archive these documentations. Of course it is possible to save photographs and other kinds of media on computers and hard drives, but it is not merely the photograph that we archive on Facebook, but the unique way it was exhibited and reacted to, for example in the form of descriptions, “reactions” and comments. Moreover, to use a hackneyed phrase, our devices may change, but the cloud is forever. Even working within the dualisms of “personal” and “public”, we can see that social media can have a personal archiving function as well as an exhibition function.

[1] I would like, ultimately, to get back to problematizing these dualisms. In a world where so many people are not seen by others the way they see themselves, or are forced to hide the way they see themselves, the Internet can be a place to make their personal, “hidden” selves public. We already know that “the personal is political”, but perhaps the personal made public can also be political. For example, for LGBTIQ+ youth who are not seen and heard for who they are in their physical environments, or who are forced to hide who they are because of oppressive environments and very real threats to their physical safety, the Internet offers safe(r) and freer spaces. They can create supportive networks, gain access to resources and curate the image of themselves so they are perceived in a way that aligns with what they know about themselves, or explore their identities in spaces that allow for fluidity. This is not to say that “the Internet” is a safe space. Bullying that occurs online can be some of the most damaging. However, it offers the opportunity to create one’s own, carefully curated spaces and supportive communities that can be crucial for those who do not find freedom or support in their physical communities. These carefully curated presentations of the self are not “fake”, but a way to perform identity in a manner that has much more agency than the way they are forced to perform it in the physical world, or in the way it is projected onto them by others.

Not to minimize the value the Internet holds as a way of creating spaces for marginalized identities to thrive, I do feel the need to point out again that it has its own exclusions. Even in the most progressive spaces, the loudest voices are often the most privileged, for example those that are white, cisgender, and able-bodied. Not everyone has access to the Internet, smartphones or computers, and not everyone has the same level of access. Most of the communities on the Internet use English as a mode of communication and come from specific cultural frames of reference, particularly the USA but more generally the global North. This is not to say that young people in Africa are not using the access they have to digital resources in highly creative ways. However, there has been little work done on any non-English speakers online, let alone those from Africa. And yet, a quick search on Twitter for #Africantwitter will show that while these voices may be marginalized, they emphatically do exist. I would argue that when it comes to an Internet archive, it is not enough to save items based solely on popularity, or on newsworthiness. If the aim is indeed to create an archived “memory for society”, it must include those whose voices do not make it into the mainstream. Any historian worth their salt will tell you that the physical archives they use for research are entirely untrustworthy and must be read against the grain. This is because they mostly come from The People In Charge, who more often than not were oppressing large numbers of people whose voices were not explicit in the archive. We should not make the same mistake of excluding the most marginalized voices in an Internet archive. The result would not only be a highly skewed view of society, but would also be missing some of its most interesting content.

[1] This paragraph is about LGBTIQ+ youth and the spaces they have created for themselves online. As a cis-het woman, I am not speaking about this from a position of experience, but from what I have seen and heard from queer people online. I have included it in this article because I did not want to leave it out of a discussion about curating oneself on the Internet. I made the decision that it was better to speak about it than keep it invisible in this particular article, but I am fully aware that I am not the person who deserves credit for speaking about it. Young LGBTIQ+ people have been doing amazing work (the “refreshing, pioneering and relevant” knowledge production I mentioned earlier on) that is far more nuanced and deeply analytical than what I can say. Thus, I have included some links below to some of their writings. Please use these as your source of information on this topic rather than what I have written. In addition, please do not hesitate to let me know if anything I have said is misrepresentative or problematic.