simstudy icon indicating copy to clipboard operation
simstudy copied to clipboard

Create a rejection-sampling function to sample from non-standard distributions

Open kgoldfeld opened this issue 5 years ago • 2 comments
trafficstars

Maybe nice to have an internal function that provides the ability to generate data from a density function specified as a function.

kgoldfeld avatar Nov 19 '20 20:11 kgoldfeld

#75 #71

assignUser avatar Jul 10 '21 21:07 assignUser

Here is some code that will allow users to specify the shape of a function (by inputting data), and then sampling from the density estimated from those data:

data <- c(1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 10, 10, 10, 10)

# Generate the density estimate
density_estimate <- density(data, n = 10000, from = min(data), to = max(data))
plot(density_estimate, main="Customized Kernel Density Estimation", xlab="Data", ylab="Density")

# Create a function to sample from the density estimate
sample_from_density <- function(density_estimate, n) {
  x <- density_estimate$x
  y <- density_estimate$y
  
  # Normalize the density values to create a probability distribution
  probabilities <- y / sum(y)
  
  # Sample from the x values according to the probabilities
  sample(x, size=n, replace=TRUE, prob=probabilities)
}

# Number of samples you want to draw
n_samples <- 100000

# Sample from the density estimate
samples <- sample_from_density(density_estimate, n_samples)

# Plot the histogram of the samples and the density estimate
hist(samples, breaks=60, probability=TRUE, main="Sampled Data and Density Estimate", xlab="Value", col="lightblue")
lines(density_estimate, col="red", lwd=2)

kgoldfeld avatar May 24 '24 21:05 kgoldfeld

I've created genDataDensity and addDataDensity so that it is now possible to generate data from an arbitrary distribution specified by a vector of integers:

base_data <- c(1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 7, 7, 8, 9, 10, 10, 10, 10, 10)

dd <- genDataDensity(10000, dataDist = base_data, varname = "x1")

Plotting the generated data with the original density:

de <- density(data, n = 100000)
de <- data.table(x = de$x, y = de$y)

ggplot(data = dd, aes(x=x1)) +
  geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 1, fill = "lightblue", color = "black") +
  geom_line(data = de, aes(x = x, y = y), color = "red") +
  theme(panel.grid = element_blank()) +
  scale_y_continuous(name = "density\n")

kgoldfeld avatar May 29 '24 17:05 kgoldfeld