charlatan
Create fake data in R
charlatan makes fake data, inspired by and borrowing some code from Python's faker.
Make fake data for:
- person names
- jobs
- phone numbers
- colors: names, hex, rgb
- credit cards
- DOIs
- numbers in range and from distributions
- gene sequences
- geographic coordinates
- emails
- URIs, URLs, and their parts
- IP addresses
- more coming ...
Possible use cases for charlatan
- Students in a classroom setting learning any task that needs a dataset.
- People doing simulations/modeling who need some fake data
- Generate fake dataset of users for a database before actual users exist
- Complete missing spots in a dataset
- Generate fake data to replace sensitive real data before public release
- Create a random set of colors for visualization
- Generate random coordinates for a map
- Get a set of randomly generated DOIs (Digital Object Identifiers) to assign to fake scholarly artifacts
- Generate fake taxonomic names for a biological dataset
- Get a set of fake sequences to use to test code/software that uses sequence data
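As a rough illustration of the "replace sensitive real data" use case above, the sketch below overwrites the identifying columns of a small data frame with generated values. The `real` data frame and its column names are hypothetical and made up for this example; only `ch_name()` and `ch_phone_number()` come from charlatan.

```r
library(charlatan)

# Hypothetical data frame standing in for sensitive records (invented here)
real <- data.frame(
  name  = c("Real Person A", "Real Person B", "Real Person C"),
  phone = c("555-0100", "555-0101", "555-0102"),
  score = c(3.2, 4.8, 1.5),
  stringsAsFactors = FALSE
)

# Copy the data, then overwrite only the identifying columns with fake values
anonymised <- real
anonymised$name  <- ch_name(n = nrow(real))
anonymised$phone <- ch_phone_number(n = nrow(real))

# Non-identifying columns such as `score` are left untouched
anonymised
```

Generated names are random and not guaranteed to be unique, so add your own identifier column if rows need to stay distinguishable.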
Reasons to use charlatan
- Lightweight, few dependencies
- Relatively comprehensive types of data, and more being added
- Comprehensive set of languages supported, more being added
- Useful R features such as creating entire fake data.frames
cran version
install.packages("charlatan")
dev version
remotes::install_github("ropensci/charlatan")
high level function
... for all fake data operations
library("charlatan")
x <- fraudster()
x$job()
#> [1] "Building control surveyor"
x$name()
#> [1] "Dr. Elissa Kassulke"
x$color_name()
#> [1] "MediumBlue"
locale support
More locales are being added over time, e.g.,
Locale support for job data
ch_job(locale = "en_US", n = 3)
#> [1] "Purchasing manager" "Clinical embryologist" "Product manager"
ch_job(locale = "fr_FR", n = 3)
#> [1] "Conducteur d'engins de travaux publics"
#> [2] "Vitrailliste"
#> [3] "Installateur en télécoms"
ch_job(locale = "hr_HR", n = 3)
#> [1] "Arhivski tehničar"
#> [2] "Član kabinske posade zrakoplova"
#> [3] "Diplomirana medicinska sestra/medicinski tehničar"
ch_job(locale = "uk_UA", n = 3)
#> [1] "Зоолог" "Фермер" "Модель"
ch_job(locale = "zh_TW", n = 3)
#> [1] "發包人員" "運輸交通專業人員" "導遊"
For colors:
ch_color_name(locale = "en_US", n = 3)
#> [1] "NavajoWhite" "HoneyDew" "Aquamarine"
ch_color_name(locale = "uk_UA", n = 3)
#> [1] "Абрикосовий" "Міжнародний помаранчевий"
#> [3] "Умбра"
More coming soon ...
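To compare several locales side by side, one option is to loop over the locale codes shown above. This is only a sketch: the `locales` vector and the `jobs_by_locale` name are chosen for this example, and the generated values will differ on every run.

```r
library(charlatan)

# Locale codes taken from the job examples above
locales <- c("en_US", "fr_FR", "hr_HR", "uk_UA", "zh_TW")

# Generate two job titles per locale and keep them in a named list
jobs_by_locale <- lapply(locales, function(loc) ch_job(n = 2, locale = loc))
names(jobs_by_locale) <- locales

jobs_by_locale
```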
generate a dataset
ch_generate()
#> # A tibble: 10 × 3
#> name job phone_number
#> <chr> <chr> <chr>
#> 1 Mr. Danial Rau Insurance broker 322.454.0638x452
#> 2 Ms. Augusta Flatley DDS Air traffic controller 1-412-252-8256x816
#> 3 Ahmed Tromp Dealer 738.618.3766
#> 4 Elle Parker-Pagac Engineer, petroleum 1-563-823-9417
#> 5 Nelie Bogisich Audiological scientist (657)263-8451x928
#> 6 Dr. Venita Bartoletti Gaffer (701)117-8665x092
#> 7 Clarke Halvorson Futures trader (587)244-0897x4646
#> 8 Keith Marvin Cytogeneticist (505)188-7137
#> 9 Kellan Swift Primary school teacher 642-015-6852x72341
#> 10 Dr. Shanell Braun Warden/ranger 586-673-4593x4166
ch_generate('job', 'phone_number', n = 30)
#> # A tibble: 30 × 2
#> job phone_number
#> <chr> <chr>
#> 1 Geographical information systems officer 00023575617
#> 2 Industrial/product designer 251.018.7002
#> 3 Special effects artist 04931219014
#> 4 Solicitor 097.433.7373x183
#> 5 Research scientist (maths) 1-540-787-9748x7124
#> 6 Retail buyer 878.896.3368x58978
#> 7 Engineer, technical sales 09564477842
#> 8 Volunteer coordinator 722-926-5502
#> 9 Museum/gallery exhibitions officer 708.958.6259x3348
#> 10 Probation officer +10(4)9172449874
#> # … with 20 more rows
#> # ℹ Use `print(n = ...)` to see more rows
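A generated table can be extended like any other data frame, for example to mock up a users table before real users exist. The sketch below assumes that "name" is accepted as a column choice (it appears in the default output above); the `users` name and the `id` column are additions made for this example.

```r
library(charlatan)

# Five fake users with a name and a job
users <- ch_generate("name", "job", n = 5)

# Add a simple sequential identifier, as a database table might have
users$id <- seq_len(nrow(users))

users
```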
person name
ch_name()
#> [1] "Fenton Ryan"
ch_name(10)
#> [1] "Sylva Klein" "Kanye Muller PhD"
#> [3] "Phoebe Altenwerth" "Alvie McClure"
#> [5] "Nils Mann" "Santiago Koepp"
#> [7] "Jeanmarie Graham-Larkin" "Mr. Humberto Davis PhD"
#> [9] "Georgine Zulauf" "Pascal Schaefer-Feest"
phone number
ch_phone_number()
#> [1] "+57(2)2951130202"
ch_phone_number(10)
#> [1] "946.325.0782" "1-121-631-0553" "669.979.2952x566"
#> [4] "(145)481-9199x487" "594.225.2171x504" "(910)235-3893x289"
#> [7] "1-660-490-0565x59870" "1-340-087-1768x51605" "554-891-7210x6337"
#> [10] "750.606.3428"
job
ch_job()
#> [1] "Restaurant manager"
ch_job(10)
#> [1] "Lighting technician, broadcasting/film/video"
#> [2] "Chiropodist"
#> [3] "Wellsite geologist"
#> [4] "Animal nutritionist"
#> [5] "Biomedical scientist"
#> [6] "Risk analyst"
#> [7] "Historic buildings inspector/conservation officer"
#> [8] "Intelligence analyst"
#> [9] "Advertising account planner"
#> [10] "Engineer, chemical"
credit cards
ch_credit_card_provider()
#> [1] "JCB 16 digit"
ch_credit_card_provider(n = 4)
#> [1] "VISA 13 digit" "Voyager" "JCB 15 digit" "VISA 13 digit"
ch_credit_card_number()
#> [1] "4888106530181587"
ch_credit_card_number(n = 10)
#> [1] "3528539455946294754" "4206980387974" "561220125494227"
#> [4] "4746311035020536" "4054993945433911" "869982827809211136"
#> [7] "4280701571800733" "4068247476037565" "4085741624331754"
#> [10] "4714129292203"
ch_credit_card_security_code()
#> [1] "599"
ch_credit_card_security_code(10)
#> [1] "694" "083" "7532" "074" "245" "354" "683" "763" "998" "410"
Usage in the wild
- eacton/R-Utility-Belt-ggplot2
- Scott Chamberlain
- Kyle Voytovich
- Martin Pedersen
similar art
- wakefield
- ids
- rcorpora
- synthpop
- Please report any issues or bugs.
- License: MIT
- Get citation information for charlatan in R by running citation(package = 'charlatan')
- Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.