handson-ml

Splitting with hashlib

Open • Dess1996 opened this issue 4 years ago • 1 comment

Good day, ageron. You have a very nice book.

Anyway, I have a question about Chapter 2. Which of the approaches given in the book is faster:

1. `def split_train_test(data, test_ratio)`
2. `def test_set_check(identifier, test_ratio, id_column, hash=hashlib.md5)`
3. `train_test_split` from sklearn

Which one uses less memory?

I would be grateful for any comments on this. Thanks for your attention.

Dess1996 • May 14 '21 06:05

I haven't run any benchmarks, so I don't know. However, you can try using `%timeit` to measure speed. To measure RAM usage, you could use the `memory_profiler` package. If you find the answer, it would be great to share it here. 👍 Thanks!
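Outside a notebook, a rough equivalent of this advice can be sketched with the standard library alone. Here `timeit` plays the role of `%timeit`, and `tracemalloc` is swapped in for `memory_profiler`. They are not the same tool: tracemalloc reports peak bytes allocated through Python's memory allocator, so memory used inside C extensions may be missed or only partly counted. `random_split` and `hash_split` are hypothetical stand-ins; in practice you would pass the book's functions and your own data instead.

```python
import hashlib
import random
import timeit
import tracemalloc


def time_per_call(func, *args, number=5, repeat=3):
    """Best average wall-clock seconds per call, like a bare-bones %timeit."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def peak_memory(func, *args):
    """Peak bytes traced by tracemalloc during a single call."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def random_split(ids, test_ratio, seed=42):
    # Hypothetical stand-in for a shuffle-based split.
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * test_ratio)
    return shuffled[cut:], shuffled[:cut]


def hash_split(ids, test_ratio):
    # Hypothetical stand-in for an MD5-based split by identifier.
    train, test = [], []
    for i in ids:
        last_byte = hashlib.md5(str(i).encode("utf-8")).digest()[-1]
        (test if last_byte < 256 * test_ratio else train).append(i)
    return train, test


if __name__ == "__main__":
    ids = list(range(20_000))
    for name, func in [("random_split", random_split), ("hash_split", hash_split)]:
        secs = time_per_call(func, ids, 0.2)
        peak = peak_memory(func, ids, 0.2)
        print(f"{name}: {secs * 1e3:.2f} ms/call, peak {peak / 1e6:.2f} MB")
```

In IPython the same measurements are one-liners: `%timeit split_train_test(housing, 0.2)` and, after `%load_ext memory_profiler`, `%memit split_train_test(housing, 0.2)`, with `housing` standing for your DataFrame. Timings and peaks vary by machine and dataset size, so no figures are given here.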

ageron • May 27 '21 01:05