Title: | R Wrapper for Wikipedia Data |
---|---|
Description: | A simple wrapper for 'Wikipedia' data. Specifically, this package looks to fill a gap in retrieving text data in a tidy format that can be used for Natural Language Processing. |
Authors: | Corydon Baylor [aut, cre] |
Maintainer: | Corydon Baylor <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.0 |
Built: | 2025-02-22 03:39:15 UTC |
Source: | https://github.com/cran/getwiki |
This function accepts a string and uses regular expressions to remove newlines, indentation, and HTML tags.
clean_wiki(string)
string |
The string to be cleaned |
clean_wiki("<p>some text</p>")
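The kind of regex-based cleaning described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: the function name strip_html_sketch and the specific patterns are assumptions made for this example.

```r
# Hypothetical stand-in for clean_wiki(); the patterns are illustrative only.
strip_html_sketch <- function(string) {
  no_tags <- gsub("<[^>]+>", "", string)         # drop anything that looks like an HTML tag
  no_breaks <- gsub("[\r\n\t]+", " ", no_tags)   # turn line breaks and tabs into a single space
  trimws(gsub(" {2,}", " ", no_breaks))          # collapse repeated spaces and trim both ends
}

cleaned_single <- strip_html_sketch("<p>some text</p>")
cleaned_multi <- strip_html_sketch("<p>line one</p>\n\t<p>line two</p>")
```

A simple pattern such as "<[^>]+>" is enough for well-formed tags but will not handle every edge case in real HTML, so the result should be spot-checked on real article text.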
Get the text of a Wikipedia article by searching for its title. For example, the search term "France" returns the text of the Wikipedia page for France.
get_wiki(title, clean = TRUE)
title |
The title (or titles) of the Wikipedia page to be searched. To query multiple articles, pass the titles as a character vector. At most 50 titles can be queried at one time. |
clean |
Should getwiki remove HTML tags from the returned text? |
A single title returns the matched Wikipedia article as a string. A vector of titles returns a data frame with one column containing the searched titles and one column containing the matched article content.
get_wiki("United States")
get_wiki(c("United States", "France"))
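Because at most 50 titles can be queried at one time, a longer list has to be split into batches first. The sketch below shows one way to do that in base R. The helper batch_titles and the placeholder titles are hypothetical and not part of the package.

```r
# Split a character vector of titles into chunks of at most `size` titles.
# batch_titles() is a hypothetical helper, not a getwiki function.
batch_titles <- function(titles, size = 50) {
  groups <- ceiling(seq_along(titles) / size)   # 1,1,...,1,2,2,... one group id per title
  unname(split(titles, groups))
}

titles <- paste("Article", 1:120)   # placeholder titles, for illustration only
batches <- batch_titles(titles)
lengths(batches)                     # 50 50 20

# With the package installed and network access available, each batch could
# then be passed to get_wiki() and the results combined, for example:
# results <- do.call(rbind, lapply(batches, get_wiki))
```

Note that a batch containing a single title would come back as a string rather than a data frame, so the combining step would need to handle that case separately.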
Get the text of a random Wikipedia article.
random_wiki(clean = TRUE)
clean |
Should getwiki remove HTML tags from the returned text? |
random_wiki will return a single named character value whose value is the text of the Wikipedia page.
random_wiki()
Search for the top twenty Wikipedia pages that match a given query. This function returns a data frame with the names of the matched articles and the first paragraph of their content.
search_wiki(search_term, clean = TRUE)
search_term |
The search term you would like to use. |
clean |
Should getwiki remove HTML tags from the returned text? |
search_wiki will return a data frame of the top twenty search results. The "title" column holds the titles of the articles and the "content" column holds the first paragraph of each article.
search_wiki("Belgrade")
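Since the result is an ordinary data frame with "title" and "content" columns, it can be filtered with base R. The sketch below uses a small mock data frame in place of a live search_wiki() call, so it runs without network access. The rows are invented placeholders, not real search results.

```r
# Mock result shaped like the documented return value (columns "title" and "content").
# The rows are placeholders for illustration, not real search output.
results <- data.frame(
  title = c("Belgrade", "Belgrade, Montana", "Siege of Belgrade"),
  content = c(
    "Belgrade is the capital of Serbia.",
    "Placeholder paragraph about a city in Montana.",
    "Placeholder paragraph about a siege."
  ),
  stringsAsFactors = FALSE
)

# Keep only the rows whose first paragraph mentions a keyword.
matches <- results[grepl("Serbia", results$content, fixed = TRUE), ]
matches$title
```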
Find the page views for an article for the past sixty days.
trend_wiki(title)
title |
The title of the Wikipedia article you would like trends for. |
trend_wiki will return a data frame of the past sixty days of page views for the requested title.
trend_wiki("Belgrade")
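A sixty-day data frame of page views can be summarised with base R. The sketch below builds a mock data frame instead of calling trend_wiki(), so it runs offline. The column names date and views, and the synthetic numbers, are assumptions made for this example. Check names() on the real result before adapting it.

```r
# Mock page-view data: 60 daily rows with assumed column names "date" and "views".
# The values are synthetic and only serve to make the example reproducible.
views <- data.frame(
  date = seq(as.Date("2025-01-01"), by = "day", length.out = 60),
  views = 1000 + (1:60 %% 7) * 10
)

mean_views <- mean(views$views)             # average daily page views
peak <- views[which.max(views$views), ]     # first day with the highest count
peak$date
```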