forest-confidence-interval

Random error in fci.random_forest_error for classification

Status: Open. miranov25 opened this issue 7 years ago (0 comments).

Hello,

I have started using forest-confidence-interval. Thank you for implementing the package. After several iterations, I converged on the following usage:

import forestci as fci
errors = fci.random_forest_error(clf, k0_training, k0_test, memory_constrained=1, memory_limit=100, calibrate=0)
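For context on what this call computes: forestci is built on the infinitesimal jackknife (IJ) variance estimate for random forests, V_IJ(x) = Σ_i Cov_b(N_bi, t_b(x))², where N_bi counts how often training row i appears in the bootstrap sample of tree b and t_b(x) is that tree's prediction for test point x. Below is a minimal NumPy sketch of the uncorrected estimate under those definitions. It is not forestci's code, it omits the bias correction and calibration steps, and the function and argument names are hypothetical. Note the dense n_train × n_test intermediate, which grows with both the training and the test size.

```python
import numpy as np


def ij_variance(inbag, tree_preds):
    """Uncorrected infinitesimal-jackknife variance for each test point.

    inbag:      (n_train, n_trees) bootstrap counts N_bi (hypothetical input)
    tree_preds: (n_test, n_trees)  per-tree predictions t_b(x)
    Returns:    (n_test,) array of sum_i Cov_b(N_bi, t_b(x))**2
    """
    n_trees = inbag.shape[1]
    # Center both quantities across trees (axis 1)
    inbag_c = inbag - inbag.mean(axis=1, keepdims=True)
    preds_c = tree_preds - tree_preds.mean(axis=1, keepdims=True)
    # Dense (n_train, n_test) covariance matrix: the memory hot spot
    cov = inbag_c @ preds_c.T / n_trees
    # Sum squared covariances over training rows
    return np.sum(cov ** 2, axis=0)


# Toy usage with random bootstrap counts and random per-tree predictions
rng = np.random.default_rng(0)
n_train, n_test, n_trees = 200, 5, 100
# Each tree draws a bootstrap sample of size n_train (with replacement)
inbag = np.stack(
    [np.bincount(rng.integers(0, n_train, n_train), minlength=n_train)
     for _ in range(n_trees)],
    axis=1,
)
tree_preds = rng.random((n_test, n_trees))
print(ij_variance(inbag, tree_preds).shape)  # (5,)
```

A memory-bounded variant of this sketch could apply the same product to slices of `tree_preds` (blocks of test points) and concatenate the per-block results, which caps the intermediate at n_train × block-size values.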

I have the following comments, suggestions, and questions:

Memory

  • Could you use a default upper limit on memory for the evaluation, one that stays below the available memory?
    • In my example with 50,000 rows × 6 columns, memory usage exceeded 10 GB.
    • I had to stop the process.
  • The memory_limit flag did not work with the default pip install (forestci 0.3).
    • After installing from source, the flag worked properly.
    • Could you update the package on PyPI to a version with a working memory_limit?
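A back-of-the-envelope check of the >10 GB figure, assuming the estimate materializes one dense float64 matrix of shape n_train × n_test and that memory_limit is a budget in megabytes for each block of that matrix when it is split over test points. Both are assumptions about the implementation, not statements from the forestci documentation, and the helper names below are hypothetical. Under these assumptions the number of feature columns (6 here) does not enter the estimate.

```python
def dense_matrix_gb(n_train, n_test, bytes_per_value=8):
    """Size in GB of one dense n_train x n_test float64 matrix."""
    return n_train * n_test * bytes_per_value / 1e9


def chunk_size_for_limit(n_train, memory_limit_mb, bytes_per_value=8):
    """Test points per block so one n_train x block matrix fits in memory_limit_mb."""
    size = int(memory_limit_mb * 1e6 // (bytes_per_value * n_train))
    if size == 0:
        # Even a single test point needs a full column of n_train values
        min_mb = bytes_per_value * n_train / 1e6
        raise ValueError(f"memory_limit must be at least {min_mb:.2f} MB")
    return size


# Assumes 50,000 rows in both the training and the test set; the issue
# does not say how the rows were split
print(dense_matrix_gb(50_000, 50_000))    # 20.0
print(chunk_size_for_limit(50_000, 100))  # 250
```

If both sets really had 50,000 rows, the full matrix would need about 20 GB, consistent with the >10 GB observed, while a 100 MB budget would allow blocks of 250 test points.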

Errors

  • With calibrate=1, I got errors O(1000) times larger than without calibration (~1.6 ± 0.3 instead of ~0.001).
  • With calibrate=0, the errors look more realistic (for classification, I assumed errors should be < 1).
  • With calibrate=1, the error values also vary widely from run to run:
for i in range(0, 5):
    # Repeat the identical call five times
    errors = fci.random_forest_error(clf, k0_training, k0_test, memory_constrained=1, memory_limit=100, calibrate=1)
    # Print every 200th of the first 1000 error estimates
    print(i, errors[0:1000:200])
===> 
(0, array([1.77080289, 1.77080289, 1.77080289, 1.77080289, 1.77080289]))
(1, array([1.60437205, 1.60437205, 1.60437205, 1.60437205, 1.60437205]))
(2, array([1.00765122, 1.00765122, 1.00765122, 1.00765122, 1.00765122]))
(3, array([1.55302694, 1.55302694, 1.55302694, 1.55302694, 1.55302694]))
(4, array([1.36027949, 1.36027949, 1.36027949, 1.36027949, 1.36027949]))
  • I suspect that the calibrated error estimate is too large. I will check whether the estimates with calibrate=0 are realistic.
  • Is the problem my expectation (errors < 1 for classification), or is there a problem with the calibrate method?
  • Have you tested the error estimates for classification before?
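One way to frame the expectation in the questions above: forestci documents the returned array as a sampling variance, not a standard deviation. If the per-tree outputs of the classifier lie in [0, 1] (class labels 0/1 or probabilities; which one is used is an assumption here), the forest's averaged prediction also lies in [0, 1], and by Popoviciu's inequality the variance of any quantity bounded in [a, b] is at most (b − a)²/4, i.e. 0.25. A short check of the calibrated values printed above against that bound:

```python
from statistics import mean, stdev

# Calibrated values from the five runs above (one value per run, since
# all sampled test points gave the same value within a run)
calibrated = [1.77080289, 1.60437205, 1.00765122, 1.55302694, 1.36027949]

# Popoviciu's inequality: Var(Y) <= (b - a)**2 / 4 for a <= Y <= b
lo, hi = 0.0, 1.0  # assumed range of the per-tree outputs
max_variance = (hi - lo) ** 2 / 4

print(f"bound: {max_variance}")
print(f"calibrated: mean {mean(calibrated):.2f}, stdev {stdev(calibrated):.2f}")
print("all above bound:", all(v > max_variance for v in calibrated))
```

These five printed values average about 1.46 with a sample standard deviation of about 0.29, all well above 0.25, while the uncalibrated ~0.001 is within the bound. If the assumed [0, 1] range holds, the calibrated values cannot be valid variances of the prediction.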

Regards,
Marian

miranov25, Oct 25 '18 13:10