
Build and test a larger Finna-based corpus

Open · osma opened this issue 8 years ago · 0 comments

The current YSO/Finna corpus generation script performs three queries per concept, but it retrieves only 100 results for each query, because 100 is the maximum number of records a single search call to the Finna API can return.

However, when the response indicates that more than 100 results are available, the script could request one or more further pages of results and so gather a larger corpus. We could then test whether the larger corpus improves results.
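A minimal sketch of the paging idea, assuming the Finna search endpoint accepts `lookfor`, `limit` and `page` parameters and returns JSON with `resultCount` and `records` fields (check these names against the Finna API documentation; the endpoint URL is also an assumption). `fetch_finna_page` and `collect_records` are hypothetical helpers, not part of the existing script. The fetch function is injectable so the logic can be tested without network access, and `max_records` keeps the corpus size bounded per query.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

# Assumption: the Finna API caps "limit" at 100 records per search call.
MAX_LIMIT = 100
# Assumption: search endpoint URL; verify against the Finna API docs.
FINNA_SEARCH_URL = "https://api.finna.fi/v1/search"


def fetch_finna_page(query: str, page: int, limit: int = MAX_LIMIT) -> Dict:
    """Fetch one page of search results (assumed params: lookfor, limit, page)."""
    params = urllib.parse.urlencode({"lookfor": query, "limit": limit, "page": page})
    with urllib.request.urlopen(f"{FINNA_SEARCH_URL}?{params}") as resp:
        return json.load(resp)


def collect_records(
    query: str,
    fetch_page: Callable[[str, int, int], Dict] = fetch_finna_page,
    max_records: Optional[int] = None,
    limit: int = MAX_LIMIT,
) -> List[Dict]:
    """Request further pages until all results (or max_records) are gathered."""
    first = fetch_page(query, 1, limit)
    # Assumed field: total number of hits reported by the API.
    total = first.get("resultCount", 0)
    if max_records is not None:
        total = min(total, max_records)
    records = list(first.get("records", []))
    pages = math.ceil(total / limit)
    for page in range(2, pages + 1):
        batch = fetch_page(query, page, limit).get("records", [])
        if not batch:  # stop early if the API returns fewer pages than advertised
            break
        records.extend(batch)
    # Trim any overshoot from the last page when max_records is set.
    return records[:total]
```

For example, `collect_records("kissat", max_records=500)` would issue at most five requests for that query instead of one.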

osma · Apr 13 '17 12:04