PACTA_analysis icon indicating copy to clipboard operation
PACTA_analysis copied to clipboard

add more robust validation and fixing of portfolio data input

Open Clare2D opened this issue 4 years ago • 3 comments

One report wasn't generating because of the following errors:

-- web_tool_script_1.R --------------------------------------------------------- There were 14 warnings (use warnings() to see them) Error: Problem with mutate() input value_usd. x non-numeric argument to binary operator i Input value_usd is if_else(...). Backtrace: x

  1. +-global::process_raw_portfolio(...)
  2. | -global::calculate_value_usd_with_fin_data(portfolio)
  3. | -%>%(...)
  4. | +-base::withVisible(eval(quote(_fseq(_lhs)), env, env))
  5. | -base::eval(quote(_fseq(_lhs)), env, env)
  6. | -base::eval(quote(_fseq(_lhs)), env, env)
  7. | -_fseq(_lhs)
  8. | -magrittr::freduce(value, _function_list)
  9. | +-base::withVisible(function_list[k])
  10. | -function_list[k]
  11. | +-dplyr::mutate(...)
  12. | -dplyr:::mutate.data.frame(...)
  13. | -dplyr:::mutate_cols(.data, ...)
  14. | +-base::withCallingHandlers(...)
  15. | -mask$eval_all_mutate(dots[[i]])
  16. +-dplyr::if_else(...)
  17. -base::.handleSimpleError(...)
  18. -dplyr:::h(simpleError(msg, call)) Execution halted

This seems to me like an issue in the number formatting of the portfolio file (Swiss). Something to test with the new input reader/audit functionality, hopefully meaning this error doesn't appear again. Just wanted to flag. I can provide the portfolio input file that created this.

FYI @cjyetman

Clare2D avatar Jan 13 '21 13:01 Clare2D

pretty sure this happens when a portfolio has weird number formatting in either number_of_shares or unit_share_price

it’s triggered here https://github.com/2DegreesInvesting/PACTA_analysis/blob/e7f8ca4e182282425f2d4bc2a8617682c4ecc1c3/0_portfolio_input_check_functions.R#L542-L548

I'm wondering if maybe we should add in a forced coercion of the columns to numeric when the portfolio is read in here? It won't "fix" the problem (because much of the portfolio's data will be coerced to NA), but at least it won't trigger a difficult to debug error later on. I'm not 100% sure if that's a good idea. https://github.com/2DegreesInvesting/PACTA_analysis/blob/496f7936b5cec7c29ff505cd0e4948e50d15d159/0_web_functions.R#L194-L230

cjyetman avatar Feb 15 '21 17:02 cjyetman

Alternatively, a fancier pre-processing of the portfolio CSVs needs to be done... either earlier on in our code, or ideally on the server when the portfolio is uploaded.

cjyetman avatar Feb 15 '21 17:02 cjyetman

changed issue name and this issue will be used to track a general task of improving validation and fixing of portfolio data when it is first read in

some things that should likely be checked and dealt with:

  • file encoding
  • file type (CSV, XLSX, etc.)
  • compliance with file type specification (e.g. a TSV as a CSV, or a CSV with a ; delimiter)
  • number formatting weirdness (e.g. 1 000.32, 1.000,32)
  • ISIN validation

cjyetman avatar Dec 04 '21 19:12 cjyetman