Getting statistics on EPUB content

Open rdeltour opened this issue 7 years ago • 1 comment

It would be super helpful to have access to a large and representative data set of real-world EPUB publications, in order to get statistics on the real-world usage of specific EPUB features. That would help us make informed standardization decisions; we could also use it to make decisions about EPUBCheck features.

Does that sound remotely feasible? Where could we start?

Jan 11 '19 09:01 rdeltour

I think Google (and Kobo?) have occasionally run searches across their content for us. That might be the only way to get a relatively realistic look at real-world content?

Jan 11 '19 15:01 dauwhe