
Include integration tests

Open lizgzil opened this issue 5 years ago • 3 comments

It would be nice to have some tests of refparse where we could run it locally with smaller data and have a known output. This comes up because when I want to debug refparse, I run

python -m policytool.refparse.refparse \
    --scraper-file "s3://datalabs-data/scraper-results/msf/20190117.json" \
    --references-file "s3://datalabs-data/wellcome_publications/uber_api_publications.csv" \
    --model-file "s3://datalabs-data/reference_parser_models/reference_parser_pipeline.pkl" \
    --output-url "file://./tmp/parser-output/output_folder_name"

which takes a long time and I'm not sure how many matches should be found anyway.

It'd be nice to have a JSON file and a CSV file that are small enough not to take too long to run, with an output we know has 10 matches (for example). That way we can dig into the code and results if we see something unusual or want to understand the inputs/outputs in more detail.
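As a rough illustration of that idea, here is a minimal sketch of a small-fixture test with a known match count. Everything in it is an assumption for illustration: the fixture contents are made up, the `title` field is not necessarily what the real scraper JSON or publications CSV contain, and `toy_match` is a hypothetical stand-in (case-insensitive exact title match), not refparse's actual matching logic. A real test would call the refparse entry point in its place.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_fixtures(directory):
    """Write a tiny scraper JSON and publications CSV with a known overlap."""
    # Made-up fixture data: exactly two titles appear in both files.
    scraped = [
        {"title": "Title One"},
        {"title": "title two "},
        {"title": "Title Three"},
    ]
    publications = [
        {"title": "title one"},
        {"title": "Title Two"},
        {"title": "Title Four"},
    ]
    scraper_path = Path(directory) / "scraper.json"
    references_path = Path(directory) / "publications.csv"
    scraper_path.write_text(json.dumps(scraped))
    with open(references_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title"])
        writer.writeheader()
        writer.writerows(publications)
    return scraper_path, references_path


def toy_match(scraper_path, references_path):
    """Hypothetical stand-in for the pipeline: exact title match, ignoring case."""
    with open(scraper_path) as f:
        scraped = json.load(f)
    with open(references_path, newline="") as f:
        known = {row["title"].strip().lower() for row in csv.DictReader(f)}
    return [doc for doc in scraped if doc["title"].strip().lower() in known]


def test_known_match_count():
    """The fixtures are built so that exactly two matches are expected."""
    with tempfile.TemporaryDirectory() as tmp:
        scraper_path, references_path = write_fixtures(tmp)
        matches = toy_match(scraper_path, references_path)
    assert len(matches) == 2
```

Because the expected count is fixed by the fixtures, a change in the number of matches points at a change in the code rather than in the data, and the test runs in well under a second with `pytest`.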

I spoke about this with @hblanks who may have more details of the technical needs for integration tests. I am purely thinking of the use for it in terms of debugging.

lizgzil avatar Jul 22 '19 10:07 lizgzil

Is this issue still relevant, @lizgzil? Or is it being addressed somewhere else in the product (i.e. Matt's work)?

dd207 avatar Mar 10 '20 11:03 dd207

@dd207 I'm not sure, I'm not aware of it being addressed. Although @jdu - did you mention that Argo could do checks/at least provide a nice user interface to see changes in the data? Or am I just making that up!?

lizgzil avatar Mar 16 '20 10:03 lizgzil

I think what I was talking about was more about separating the individual tasks, which would allow for easier testing of an individual component. That's not really to do with Argo, but more to do with being able to set up testing of that single unit of work more easily.

jdu avatar Mar 16 '20 13:03 jdu