
Data variation/mismatch?

junkdeck opened this issue 6 years ago • 9 comments

The data returned by google-trends-api does not fully align with the same queries performed on Google Trends. Is this an inherent issue with scraping data from GTrends, or is the data modified in some way?

junkdeck avatar Sep 04 '18 12:09 junkdeck

The data isn't modified in any way; perhaps the URL this library is hitting is outdated now?

pat310 avatar Dec 02 '18 17:12 pat310

How are you using the API? If you are using a custom timespan, like the past 30 days, you have to set the `startTime` to 31 days earlier, because Google Trends measures "the past 30 days" as the 30 days before today, not including today.
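A minimal sketch of that date arithmetic, assuming the library's documented `startTime`/`endTime` options (Date objects); the actual API call is commented out since it needs network access:

```javascript
// "Past 30 days" excludes today, so go back 31 days for the start.
const DAY_MS = 24 * 60 * 60 * 1000;

const endTime = new Date(Date.now() - DAY_MS);       // yesterday
const startTime = new Date(Date.now() - 31 * DAY_MS); // 31 days back

// 30 full days between startTime and endTime
const spanDays = Math.round((endTime - startTime) / DAY_MS);
console.log(spanDays);

// const googleTrends = require('google-trends-api');
// googleTrends.interestOverTime({ keyword: 'marketing', startTime, endTime })
//   .then(res => console.log(res));
```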

merkshroom avatar Dec 02 '18 17:12 merkshroom

I don't think that's it - the same dates have different data.

junkdeck avatar Dec 03 '18 16:12 junkdeck

Can you post the code you're using?

merkshroom avatar Dec 05 '18 20:12 merkshroom

I think the data is different because the API doesn't let you choose the result type for a keyword, such as search term, topic, or "System software".

As an example, if you type in Unity on Google Trends, you can choose between several interpretations of the keyword. Is there any chance the API could do the same?

sutefan1 avatar Dec 17 '18 14:12 sutefan1

I am having the same issue.

Tried in the API:

```javascript
googleTrendsApi.interestOverTime({ keyword: 'marketing', geo: 'US' })
```

It returns (last few rows shown):

```json
{"time":"1525132800","formattedTime":"May 2018","formattedAxisTime":"May 1, 2018","value":[45],"hasData":[true],"formattedValue":["45"]},
{"time":"1527811200","formattedTime":"Jun 2018","formattedAxisTime":"Jun 1, 2018","value":[42],"hasData":[true],"formattedValue":["42"]},
{"time":"1530403200","formattedTime":"Jul 2018","formattedAxisTime":"Jul 1, 2018","value":[40],"hasData":[true],"formattedValue":["40"]},
{"time":"1533081600","formattedTime":"Aug 2018","formattedAxisTime":"Aug 1, 2018","value":[42],"hasData":[true],"formattedValue":["42"]},
{"time":"1535760000","formattedTime":"Sep 2018","formattedAxisTime":"Sep 1, 2018","value":[44],"hasData":[true],"formattedValue":["44"]},
{"time":"1538352000","formattedTime":"Oct 2018","formattedAxisTime":"Oct 1, 2018","value":[45],"hasData":[true],"formattedValue":["45"]},
{"time":"1541030400","formattedTime":"Nov 2018","formattedAxisTime":"Nov 1, 2018","value":[43],"hasData":[true],"formattedValue":["43"]},
{"time":"1543622400","formattedTime":"Dec 2018","formattedAxisTime":"Dec 1, 2018","value":[36],"hasData":[true],"formattedValue":["36"]},
{"time":"1546300800","formattedTime":"Jan 2019","formattedAxisTime":"Jan 1, 2019","value":[42],"hasData":[true],"formattedValue":["42"],"isPartial":true}
],"averages":[]}}
```

All values are under 50.

However, a manual query in the Trends UI shows much higher values (one as high as 70); see the screen capture:

[Screenshot: Google Trends chart for the same query, Jan 19 2019]

Partial embed of the above query: `"exploreQuery":"geo=US&q=Marketing&date=today 12-m"`

thabblegit avatar Jan 19 '19 15:01 thabblegit

Figured out the issue. Google seems to normalize the data based on the period supplied, and the API's default start date is 2004. If you supply a start date of today minus 12 months, the data will match.
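A hedged sketch of that fix: pass an explicit `startTime` of 12 months ago instead of the library's default (2004), so the normalization window matches the UI's "Past 12 months". The API call itself is commented out (it needs network access), and treating the 12-month window as exactly 365 days is a simplification:

```javascript
// Compute a start date roughly 12 months back.
const YEAR_MS = 365 * 24 * 60 * 60 * 1000;
const startTime = new Date(Date.now() - YEAR_MS);

console.log(startTime.toISOString());

// const googleTrends = require('google-trends-api');
// googleTrends.interestOverTime({ keyword: 'marketing', geo: 'US', startTime })
//   .then(res => console.log(res));
```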

thabblegit avatar Jan 19 '19 15:01 thabblegit

Thanks @thabblegit! Maybe there should be a comment about this in the README

pat310 avatar Jan 19 '19 16:01 pat310

I am facing the same issue. I collected data for the same keyword (PROFITABLE BUSINESS) over two time intervals, 2007-01-01 to 2007-08-01 and 2007-07-01 to 2008-02-01. Both calls return daily search volume data, with one month (July 2007) of overlap. Comparing the overlapping data side by side:

[Screenshot: side-by-side comparison of the two result sets, Jan 30 2019]

The left column comes from the 2007-01-01 to 2007-08-01 interval, while the right column comes from the 2007-07-01 to 2008-02-01 interval. The two queries yield very different values on the same days. I expected the non-zero days to coincide, but there are days where one value is large and the other is zero, and vice versa. This makes rescaling the data to a common baseline a non-trivial problem.

Any ideas on how to deal with this?
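One generic approach (a sketch, not part of google-trends-api) is to chain the two windows together by the ratio of their averages over the overlapping dates. The `rescaleByOverlap` helper below is hypothetical, and the zero-vs-nonzero mismatches shown above will still add noise to any ratio-based rescaling:

```javascript
// Rescale series `b` onto series `a`'s scale using the ratio of their
// mean values over the overlapping dates. Series are plain objects
// mapping date strings to values.
function rescaleByOverlap(a, b, overlapDates) {
  const mean = (series, dates) =>
    dates.reduce((sum, d) => sum + series[d], 0) / dates.length;
  const factor = mean(a, overlapDates) / mean(b, overlapDates);
  const scaled = {};
  for (const [date, value] of Object.entries(b)) {
    scaled[date] = value * factor;
  }
  return scaled;
}

// Toy example: over the overlap, b runs at half a's scale, so factor = 2.
const a = { '2007-07-01': 40, '2007-07-02': 60 };
const b = { '2007-07-01': 20, '2007-07-02': 30, '2007-08-01': 10 };
const scaled = rescaleByOverlap(a, b, ['2007-07-01', '2007-07-02']);
console.log(scaled['2007-08-01']); // 20
```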

guilherme-salome avatar Jan 30 '19 21:01 guilherme-salome