The Internet Archive is a San Francisco-based nonprofit digital library with the stated mission of "universal access to all knowledge". It provides free public access to collections of digitized materials, including websites, software applications and games, music, movies and videos, moving images, and nearly three million public domain books. In October 2016, its collection reached 15 petabytes. In addition to its archiving function, the Archive is an activist organization that advocates for a free and open Internet.
The Internet Archive allows the public to upload and download digital material to its data cluster, but most of its data is collected automatically by its web crawlers, which work to preserve as many public web pages as possible. Its web archive, the Wayback Machine, contains over 308 billion web captures. The Archive also oversees one of the world's largest book digitization projects.
Founded by Brewster Kahle in May 1996, the Archive is a 501(c)(3) nonprofit operating in the United States. It has an annual budget of $10 million, derived from a variety of sources: revenue from its web crawling services, partnerships, grants, donations, and the Kahle-Austin Foundation.
Its headquarters are in San Francisco, California. Most of its staff work in its book-scanning centers. The Archive has data centers in three California cities: San Francisco, Redwood City, and Richmond. To prevent data loss in the event of, for example, a natural disaster, the Archive attempts to keep copies of (parts of) its collections at more distant locations, currently including the Bibliotheca Alexandrina in Egypt and a facility in Amsterdam. The Archive is a member of the International Internet Preservation Consortium and was officially designated as a library by the State of California in 2007.
History
Brewster Kahle founded the Archive in 1996, at around the same time that he started the for-profit web crawling company Alexa Internet. In October 1996, the Internet Archive began archiving and preserving the World Wide Web in large quantities, although it had saved its earliest pages in May 1996. The archived content was not available to the general public until 2001, when the Archive developed the Wayback Machine. In late 1999, the Archive expanded its collections beyond the web archive, beginning with the Prelinger Archives. Now the Internet Archive includes texts, audio, moving images, and software. It hosts a number of other projects: the NASA Images Archive, the contract crawling service Archive-It, and the wiki-editable library catalog and book information site Open Library. Soon after that, the Archive began working to provide specialized services relating to the information access needs of people with print disabilities; publicly accessible books were made available in a protected Digital Accessible Information System (DAISY) format.
According to its website:
Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive's mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars.
In August 2012, the Archive announced that it had added BitTorrent to its file download options for over 1.3 million existing files and all newly uploaded files. This method is the fastest way to download media from the Archive, since files are served from two Archive data centers in addition to other torrent clients that have downloaded and continue to serve the files. On November 6, 2013, the Internet Archive's headquarters in San Francisco's Richmond District caught fire, destroying equipment and damaging some nearby apartments. According to the Archive, it lost a side building housing one of its 30 scanning centers; cameras, lights, and scanning equipment worth hundreds of thousands of dollars; and "maybe 20 boxes of books and film, some irreplaceable, most already digitized, and some replaceable". The nonprofit Archive sought donations to cover the estimated $600,000 in damage.
In November 2016, Kahle announced that the Internet Archive was building the Internet Archive of Canada, a copy of the archive to be based somewhere in Canada. The announcement received widespread coverage because of the implication that the decision to build a backup archive in a foreign country was due to the upcoming presidency of Donald Trump. Kahle was quoted as saying:
On November 9th in America, we woke up to a new administration promising radical change. It was a firm reminder that institutions like ours, built for the long term, need to design for change. For us, it means keeping our cultural materials safe, private and perpetually accessible. It means preparing for a Web that may face greater restrictions. It means serving patrons in a world in which government surveillance is not going away; indeed it looks like it will increase. Throughout history, libraries have fought against terrible violations of privacy - where people have been rounded up simply for what they read. At the Internet Archive, we are fighting to protect our readers' privacy in the digital world.
Web archiving
The Wayback Machine
The Internet Archive capitalized on the popular use of the term "WABAC Machine" from a segment of the cartoon The Rocky and Bullwinkle Show, and uses the name "Wayback Machine" for its service that allows archives of the World Wide Web to be searched and accessed. This service allows users to view archived web pages. The Wayback Machine was created as a joint effort between Alexa Internet and the Internet Archive when a three-dimensional index was built to allow browsing of archived web content. Millions of websites and their associated data (images, source code, documents, etc.) are stored in a database. The service can be used to see what previous versions of a website looked like, to retrieve original source code from websites that may no longer be directly available, or to visit websites that no longer exist at all. Not all websites are available, because many website owners choose to exclude their sites. As with all sites based on data from web crawlers, the Internet Archive also misses large areas of the web for a variety of other reasons. A 2004 paper found an international bias in its coverage, but deemed it "unintentional".
The use of the term "Wayback Machine" in the context of the Internet Archive has become common in popular culture; for example, in the television show Law & Order: Criminal Intent ("Legacy", first run August 3, 2008), a computer technician uses the "Wayback Machine" to find an archive of a student's Facebook-style website.
Snapshots used to take at least 6-18 months to be added, but sites could eventually be added in real time on request. A "Save Page Now" archiving feature was made available in October 2013, accessible at the lower right of the Wayback Machine's main page. Once a target URL is entered and saved, the web page becomes part of the Wayback Machine.
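As a rough illustration of how archived pages and the "Save Page Now" feature are addressed, the following Python sketch builds the corresponding URLs. It assumes the publicly visible URL pattern of web.archive.org (a 14-digit YYYYMMDDhhmmss timestamp followed by the target address); the function names are hypothetical helpers, not part of any official client, and the code only constructs strings without contacting the service.

```python
from datetime import datetime

# Assumed public base address of the Wayback Machine.
WAYBACK_BASE = "https://web.archive.org"


def snapshot_url(target_url: str, when: datetime) -> str:
    """Build the address of the capture closest to `when` (hypothetical helper).

    The Wayback Machine redirects to the nearest capture it actually holds,
    so the timestamp does not need to match a capture exactly.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")  # e.g. 20131001000000
    return f"{WAYBACK_BASE}/web/{timestamp}/{target_url}"


def save_page_url(target_url: str) -> str:
    """Build the address that asks "Save Page Now" to capture a page (hypothetical helper)."""
    return f"{WAYBACK_BASE}/save/{target_url}"


if __name__ == "__main__":
    print(snapshot_url("http://example.com/", datetime(2013, 10, 1)))
    print(save_page_url("http://example.com/"))
```

Opening the first address in a browser would show the capture nearest to the requested date, if one exists; the second requests a new capture of the page.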
Archive-It
Created in early 2006, Archive-It is a web archiving subscription service that allows institutions and individuals to build and preserve collections of digital content and create digital archives. Archive-It allows users to customize the capture or exclusion of web content they want to preserve for cultural heritage reasons. Through a web application, Archive-It partners can harvest, catalog, manage, browse, search, and view their archived collections.
In terms of accessibility, archived websites are full-text searchable within seven days of capture. Content collected through Archive-It is captured and stored as WARC files. A primary and a backup copy are stored at the Internet Archive data centers. A copy of the WARC files can be given to subscribing partner institutions for geo-redundant preservation and storage, in line with their best-practice standards. The data captured through Archive-It is periodically indexed into the Internet Archive's general archive.
As of March 2014, Archive-It had more than 275 partner institutions in 46 U.S. states and 16 countries, which had captured more than 7.4 billion URLs for more than 2,444 public collections. Archive-It partners are university and college libraries, state archives, federal institutions, museums, law libraries, and cultural organizations, including the Electronic Literature Organization, the Archives and State Library of North Carolina, Stanford University, Columbia University, the American University in Cairo, the Georgetown Law Library, and many others.
Book collection
Text collection
The Internet Archive Text Archive collection includes digitized books and special collections from libraries and cultural heritage institutions around the world. The Internet Archive operates 33 scanning centers in five countries, digitizing about 1,000 books a day for a total of over 2 million books, financially supported by libraries and foundations. As of July 2013, the collection included 4.4 million books with over 15 million downloads per month. In November 2008, when there were about 1 million texts, the entire collection was over 0.5 petabytes, which included raw camera images, cropped and skewed images, PDFs, and raw OCR data. Between 2006 and 2008, Microsoft had a special relationship with Internet Archive texts through its Live Search Books project, scanning over 300,000 books that were contributed to the collection, as well as providing financial support and scanning equipment. On May 23, 2008, Microsoft announced that it would end the Live Book Search project and no longer scan books. Microsoft made its scanned books available without contractual restriction and donated its scanning equipment to its former partners.
Around October 2007, Archive users began uploading public domain books from Google Book Search. As of November 2013, there were over 900,000 Google-digitized books in the Archive's collection; they are identical to the copies found on Google, except without the Google watermarks, and are available for unrestricted use and download. Brewster Kahle revealed in 2013 that this archival effort was coordinated by Aaron Swartz, who with "a group of friends" downloaded the public domain books from Google slowly enough and from enough computers to stay within Google's limits. They did this to ensure public access to the public domain. The Archive ensured that the items were attributed and linked back to Google, which never complained, while libraries "grumbled". According to Kahle, this is an example of Swartz's "genius" for working on what could give the most to the public good for millions of people.
In addition to books, the Archive offers free and anonymous public access to over four million court opinions, legal briefs, and exhibits uploaded from the United States federal courts' PACER electronic document system via the RECAP web browser plugin. These documents had been kept behind a federal court paywall. On the Archive, they had been accessed by over 6 million people by 2013.
Open Library
Open Library is another project of the Internet Archive. The wiki seeks to include a web page for every book ever published: it holds 25 million catalog records of editions. It also seeks to be a web-accessible public library: it contains the full texts of about 1.6 million public domain books (out of the more than five million in the main texts collection), which are fully readable, downloadable, and full-text searchable; it also offers two-week e-book loans, under its Books to Borrow lending program, of over 647,784 books not in the public domain, in partnership with over 1,000 library partners from 6 countries, after free registration on the website. Open Library is a free and open-source software project, with its source code freely available on GitHub.
Media collection
In addition to the web archive, the Internet Archive maintains extensive collections of digital media that are attested by the uploader to be in the public domain in the United States or licensed under a license that allows redistribution, such as a Creative Commons license. Media are organized into collections by media type (moving images, audio, text, etc.), and into sub-collections by various criteria. Each of the main collections includes a "Community" sub-collection (formerly "Open Source") where general contributions by the public are stored.
Audio Collection
The Audio Archive includes music, audiobooks, news broadcasts, old-time radio shows, and a wide variety of other audio files. There are over 200,000 free digital recordings in the collection. The sub-collections include audiobooks and poetry, podcasts, non-English audio, and many others. The sound collections are curated by B. George, director of the ARChive of Contemporary Music.
The Live Music Archive sub-collection includes over 170,000 concert recordings from independent musicians, as well as from more established artists and musical ensembles with permissive rules about recording their concerts, such as the Grateful Dead and, more recently, The Smashing Pumpkins. Also, Jordan Zevon has allowed the Internet Archive to host a definitive collection of concert recordings by his father, Warren Zevon. The Zevon collection ranges from 1976 to 2001 and contains 126 concerts including 1,137 songs.
The Great 78 Project aims to digitize 250,000 78 rpm singles (500,000 songs) from the period between 1880 and 1960, donated by various collectors and institutions. It has been developed in collaboration with the ARChive of Contemporary Music and George Blood Audio, which is responsible for the audio digitization.
Brooklyn Museum
This collection contains about 3,000 items from the Brooklyn Museum.
Images collection
This collection contains over 880,000 items. The Cover Art Archive, Metropolitan Museum of Art - Gallery Images, NASA Images, Occupy Wall Street Flickr Archive, and USGS Maps are some of its sub-collections.
Cover Art Archive
The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz, whose goal is to make cover art images available on the Internet. This collection contains over 330,000 items.
Metropolitan Museum of Art images
The images in this collection come from the Metropolitan Museum of Art. This collection contains more than 140,000 items.
NASA Images
The NASA Images archive was created through a Space Act Agreement between the Internet Archive and NASA to bring public access to NASA's image, video, and audio collections in a single, searchable resource. The IA NASA Images team worked closely with all of the NASA centers to keep adding to the ever-growing collection. The nasaimages.org site launched in July 2008 and had more than 100,000 items online at the end of its hosting in 2012.
Occupy Wall Street Flickr Archive
This collection contains Creative Commons-licensed photographs from Flickr related to the Occupy Wall Street movement. This collection contains more than 15,000 items.
USGS Maps
This collection contains over 59,000 items from the Libre Maps Project.
Machinima Archive
One of the sub-collections of the Internet Archive's video archive is the Machinima Archive. This small section hosts many Machinima videos. Machinima is a digital art form in which computer games, game engines, or software engines are used in a sandbox-like mode to create motion pictures, recreate plays, or even publish presentations or keynotes. The archive collects a range of Machinima films from Internet publishers such as Rooster Teeth and Machinima.com as well as independent producers. The sub-collection is a collaborative effort among the Internet Archive, the How They Got Game research project at Stanford University, the Academy of Machinima Arts and Sciences, and Machinima.com.
Mathematics - Hamid Naderi Yeganeh
This collection contains mathematical images created by mathematical artist Hamid Naderi Yeganeh.
Microfilm collection
The collection contains about 160,000 items from various libraries including the University of Chicago Library, the University of Illinois at Urbana-Champaign, the University of Alberta, the Allen County Public Library, and the National Technical Information Service.
Moving image collection
The Internet Archive holds a collection of approximately 3,863 feature films. In addition, the Internet Archive's Moving Image collection includes newsreels, classic cartoons, pro- and anti-war propaganda, The Video Cellar Collection, Skip Elsheimer's "A.V. Geeks" collection, early television, and ephemeral material from the Prelinger Archives, such as advertising, educational, and industrial films, as well as amateur and home movie collections.
Subcategories of this collection include:
- IA's Brick Films collection, which contains stop-motion animations filmed with Lego bricks, some of which are "remakes" of feature films.
- IA's Election 2004 collection, a nonpartisan public resource for sharing video materials related to the 2004 United States presidential election.
- IA's FedFlix collection, Joint Venture NTIS-1832 between the National Technical Information Service and Public.Resource.Org, which features "the best movies of the United States Government, from training films to history, from our national parks to the U.S. Fire Academy and the Postal Inspectors".
- IA's Independent News collection, which includes sub-collections such as the Internet Archive's World At War competition from 2001, in which contestants created short films demonstrating "why access to history matters". Among its most-downloaded video files are eyewitness recordings of the devastating 2004 Indian Ocean earthquake.
- IA's September 11 Television Archive, which contains archival footage from the world's major television networks of the terrorist attacks of September 11, 2001, as they unfolded on live television.
Netlabels
The Archive has a collection of freely distributable music that can be streamed and downloaded via its Netlabels service. The music in this collection generally comes from the Creative Commons-licensed catalogs of virtual record labels.
Open Educational Resources
Open Educational Resources is a digital collection at archive.org. This collection contains hundreds of free courses, video lectures, and supplemental materials from universities in the United States and China. The contributors to this collection are ArsDigita University, the Hewlett Foundation, MIT, the Monterey Institute, and Naropa University.
TV News Search & Borrow
In September 2012, the Internet Archive launched the TV News Search & Borrow service for searching U.S. national news programs. The service is built on closed caption transcripts and allows users to search and stream 30-second video clips. Upon launch, the service contained "350,000 news programs collected over 3 years from national U.S. networks and stations in San Francisco and Washington D.C." According to Kahle, the service was inspired by the Vanderbilt Television News Archive, a similar library of televised network news programs. In contrast to Vanderbilt, which limits access to streaming video to individuals associated with subscribing colleges and universities, TV News Search & Borrow allows open access to its streaming video clips. In 2013, the Archive received an additional donation of "approximately 40,000 well-organized tapes" from the estate of a Philadelphia woman, Marion Stokes. Stokes "had recorded more than 35 years of TV news in Philadelphia and Boston with her VHS and Betamax machines."
Other services and efforts
Physical media
Voicing a strong reaction to the idea of books simply being thrown away, and inspired by the Svalbard Global Seed Vault, Kahle now envisions collecting one copy of every book ever published. "We're not going to get there, but that's our goal," he said. Alongside the books, Kahle plans to store the Internet Archive's old servers, which were replaced in 2010.
Software
The Internet Archive has "the largest collection of historical software online in the world," spanning 50 years of computer history across computer magazines and journals, books, shareware discs, FTP sites, video games, and so on. The Internet Archive has created an archive of what it describes as "vintage software", as a way to preserve it. The project advocated for an exemption from the United States Digital Millennium Copyright Act to permit it to bypass copy protection, which was approved in 2003 for a period of three years. The Archive does not offer the software for download, as the exemption is solely "for the purpose of preservation or archival reproduction of published digital works by a library or archive." The exemption was renewed in 2006, and in 2009 was extended indefinitely pending further rulemaking. The Library of Congress reiterated the exemption as a "Final Rule" with no expiration date in 2010. In 2013, the Internet Archive began to provide abandonware video games playable in the browser via MESS, for instance the Atari 2600 game E.T. the Extra-Terrestrial. Since December 23, 2014, the Internet Archive has presented, via a browser-based DOSBox emulation, thousands of DOS/PC games for "scholarship and research purposes only".
Controversies and legal disputes
Grateful Dead
In November 2005, free downloads of Grateful Dead concerts were removed from the site. John Perry Barlow identified Bob Weir, Mickey Hart, and Bill Kreutzmann as the instigators of the change, according to an article in The New York Times. Phil Lesh commented on the change in a November 30, 2005, posting to his personal website:
It was brought to my attention that all of the Grateful Dead shows were taken down from Archive.org right before Thanksgiving. I was not part of this decision-making process and was not notified that the shows were to be pulled. I do feel that the music is the Grateful Dead's legacy and I hope that one way or another all of it is available for those who want it.
A November 30 forum post from Brewster Kahle summarized what appeared to be the compromise reached among the band members. Audience recordings could be downloaded or streamed, but soundboard recordings were to be available for streaming only. The concerts have since been re-added.
National security letter
On May 8, 2008, it was revealed that the Internet Archive had successfully challenged an FBI national security letter asking for logs on an undisclosed user.
On November 28, 2016, it was revealed that a second FBI national security letter, asking for logs on another undisclosed user, had been successfully challenged.
Uncensored hosting
On August 17, 2011, the Middle East Media Research Institute published "Al-Qaeda, Jihadis Infest the San Francisco, California-Based 'Internet Archive' Library," detailing how members of such groups can post anonymously and enjoy free, uncensored hosting.
Omni magazine
In a story on his website titled "What's going on in the Internet Archive?", author Steven Saylor noted: "Sometime in 2012, the entire run of Omni magazine was uploaded (and made available for download) at the Internet Archive... Since those old issues must contain hundreds of works still under copyright by numerous contributors, how is this legal?" At least one contributor to the magazine, author Steve Perry, has publicly complained that he never gave permission for his work to be uploaded ("they didn't say a word to me"), and it has been noted that all issues containing the work of Harlan Ellison have apparently been taken down. Glenn Fleishman, investigating the question "Who Owns Omni?", writes that "Nearly all of the authors, photographers, and artists whose work appeared in the magazine signed contracts that granted only short-term rights.... [No one] can simply reprint or post the content of older issues."
Opposition to SOPA and PIPA bills
The Internet Archive blacked out its website for 12 hours on January 18, 2012, in protest of the Stop Online Piracy Act and the PROTECT IP Act, two bills in the United States Congress that it claimed would "negatively affect the ecosystem of web publishing that led to the emergence of the Internet Archive". This occurred in conjunction with the English Wikipedia blackout, as well as numerous other protests across the Internet.
Opposition to Google Books settlement
The Internet Archive is a member of the Open Book Alliance, which has been among the most outspoken critics of the Google Book Settlement. The Archive advocates an alternative digital library project.
Nintendo Power magazine
In February 2016, the Internet Archive began archiving digital copies of Nintendo Power, Nintendo's official magazine for its games and products, which ran from 1988 to 2012. The first 140 issues had been collected before Nintendo had the archive removed on August 8, 2016. In response to the takedown, Nintendo told the gaming website Polygon, "[Nintendo] must protect our own characters, trademarks and other content... The unapproved use of Nintendo's intellectual property can weaken our ability to protect and preserve it, or to possibly use it for new projects".
Government of India
In August 2017, the Government of India blocked the Internet Archive along with other file-sharing websites, citing piracy concerns after copies of two Bollywood films were allegedly shared via the service.
Turkey
On October 9, 2016, the Internet Archive was blocked in Turkey.
Ceramic archivists collection
The Great Room of the Internet Archive features a collection of over 100 ceramic figures representing employees of the Internet Archive. This collection, inspired by the statues of the Xian warriors in China, was commissioned by Brewster Kahle, sculpted by Nuala Creed, and is ongoing.
List of digitizing sponsors for ebooks
This is a list of some of the digitizing sponsors for ebooks in the Internet Archive.
See also
- List of Internet Archive collections
- Public domain music
- Web archiving
Further reading
- Kahle, Brewster (November 1996). "Archiving the Internet". Scientific American.
- Kahle, Brewster (November 6, 2013). "Scanning Center Fire - Please Help Rebuild". Internet Archive Blogs.
- Lepore, Jill (January 26, 2015). "The Cobweb". The New Yorker.
- Ringmar, Erik (April 10, 2008). "Liberate and Disseminate". Times Higher Education Supplement.
External links
- Official website
- Internet Archive Mirror at Bibliotheca Alexandrina, Egypt
- Web Archiving in archive.org, Internet Archive operation details
- Internet Archive at the Wayback Machine (archived October 11, 1997)
- The earliest known website in the Archive at the Wayback Machine (archived May 12, 1996)
- Internet archive (recursive archive)
- The original website from 1996
Source of the article: Wikipedia