publ-cg
Getting statistics on EPUB content
It would be super helpful to have access to a large and representative data set of real-world EPUB publications, in order to get statistics on the real-world usage of specific EPUB features. That would help us make informed standardization decisions; we could also use it to decide which EPUBCheck features to prioritize.
Does that sound remotely feasible? Where could we start?
I think Google (and Kobo?) have occasionally run searches across their content for us. That might be the only way to get a reasonably realistic look at real-world content?
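
For whoever does have access to such a corpus, here is a minimal sketch of what a feature count could look like. It is only an illustration: the function names `epub_features` and `corpus_stats` are hypothetical, and the choice of "features" (the package `version` attribute and the manifest item `properties` values) is an assumption about what we would want to measure. It uses only the Python standard library and reads the package document located through `META-INF/container.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def epub_features(path):
    """Return the package version and manifest item properties of one EPUB.

    Hypothetical helper: the set of features collected here is illustrative.
    """
    with zipfile.ZipFile(path) as zf:
        # container.xml points at the package document (OPF file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(
            f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile"
        )
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    features = [f"version={package.get('version')}"]
    for item in package.iter(f"{OPF_NS}item"):
        # "properties" is a space-separated list, e.g. "nav scripted".
        for prop in item.get("properties", "").split():
            features.append(f"property={prop}")
    return features


def corpus_stats(directory):
    """Count in how many EPUBs under `directory` each feature appears."""
    counts = Counter()
    for path in sorted(Path(directory).rglob("*.epub")):
        try:
            # set() so that each publication counts at most once per feature.
            counts.update(set(epub_features(path)))
        except (zipfile.BadZipFile, KeyError, ET.ParseError, AttributeError):
            # Not a ZIP, missing container/package file, or malformed XML.
            counts["unreadable"] += 1
    return counts
```

Calling `corpus_stats("some/directory")` returns a `Counter` keyed by strings such as `version=3.0` or `property=scripted`. A script along these lines is easy to hand to a retailer to run on their side; the numbers are only as representative as the corpus it is run against.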