reach
Include integration tests
It would be nice to have some integration tests for refparse that we could run locally with smaller data and a known expected output. This comes up because when I want to debug refparse, I run:
```shell
python -m policytool.refparse.refparse \
    --scraper-file "s3://datalabs-data/scraper-results/msf/20190117.json" \
    --references-file "s3://datalabs-data/wellcome_publications/uber_api_publications.csv" \
    --model-file "s3://datalabs-data/reference_parser_models/reference_parser_pipeline.pkl" \
    --output-url "file://./tmp/parser-output/output_folder_name"
```
which takes a long time and I'm not sure how many matches should be found anyway.
It'd be nice to have a JSON file and a CSV that are small enough to run quickly, along with output we know should contain (for example) 10 matches. That way we could dig into the code and results if we see something unusual, or if we want to understand the inputs/outputs in more detail.
I spoke about this with @hblanks who may have more details of the technical needs for integration tests. I am purely thinking of the use for it in terms of debugging.
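A fixture-based check along the lines described above could be sketched as follows. Note that `match_references` here is a hypothetical stand-in for refparse's actual matching step (the real pipeline uses a trained model); the point is only to illustrate the shape of an integration test with tiny inputs and a known, fixed number of matches.

```python
import csv
import io
import json

# Hypothetical stand-in for refparse's matching step: "match" a scraped
# reference to a known publication when the titles are equal after
# lower-casing. The real refparse model is more sophisticated; this only
# illustrates the shape of a fixture-based integration test.
def match_references(scraped, publications):
    known_titles = {p["title"].lower() for p in publications}
    return [r for r in scraped if r["title"].lower() in known_titles]

# Tiny in-memory fixtures standing in for the S3 scraper-file (JSON)
# and references-file (CSV) used in the command above.
scraper_json = json.dumps([
    {"title": "Malaria in Pregnancy"},
    {"title": "Unrelated Report"},
])
references_csv = "title\nmalaria in pregnancy\ncholera outbreaks\n"

scraped = json.loads(scraper_json)
publications = list(csv.DictReader(io.StringIO(references_csv)))

matches = match_references(scraped, publications)
# A known, fixed output we can check in seconds instead of a long S3 run.
assert len(matches) == 1
print(len(matches))
```

With real (but small) fixture files checked into the repo, the same test could invoke the actual pipeline and assert on the expected match count.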
Is this issue still relevant @lizgzil? Or is it being addressed somewhere else in the product (e.g. Matt's work)?
@dd207 I'm not sure, I'm not aware of it being addressed. Although @jdu, did you mention that Argo could run checks, or at least provide a nice user interface to see changes in the data? Or am I just making that up!?
I think what I was talking about was more about separating the individual tasks, which would allow for easier testing of an individual component. That's not really to do with Argo, but more to do with being able to set up testing of that single unit of work more easily.
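The separation described above might look something like the sketch below: if parsing a raw reference and matching it against known publications were independent steps, each could be unit-tested on its own with tiny inputs. Both helpers here are hypothetical illustrations, not the actual policytool API.

```python
# Hypothetical split of the pipeline into two independently testable steps.

def split_reference(raw):
    """Split a raw reference string into a title and a year (toy parser)."""
    title, _, year = raw.rpartition(",")
    return {"title": title.strip(), "year": year.strip()}

def is_match(parsed, publication):
    """Decide a match purely on normalised title (toy matcher)."""
    return parsed["title"].lower() == publication["title"].lower()

# Each step can now be checked in isolation, without S3 data or the
# full model pipeline.
ref = split_reference("Malaria in Pregnancy, 2018")
assert ref == {"title": "Malaria in Pregnancy", "year": "2018"}
assert is_match(ref, {"title": "malaria in pregnancy"})
```

The design point is that small, pure functions with explicit inputs and outputs are cheap to test; the orchestration layer (Argo or otherwise) then only wires them together.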