handson-ml
Splitting with hashlib
Good day ageron, you have written a very nice book.
I have one question about Chapter 2. Which of these approaches from the book is the fastest?
- `def split_train_test(data, test_ratio)`
- `def test_set_check(identifier, test_ratio, id_column, hash=hashlib.md5)`
- `train_test_split` from sklearn

Which of them uses the least memory?
I would be grateful for any comments on this. Thanks for your attention.
I haven't run any benchmarks, so I don't know. However, you can try using %timeit to measure speed. To measure RAM usage, you could use the memory_profiler package. If you find the answer, it would be great to share it here. 👍 Thanks!
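For anyone who wants to try this outside a notebook, here is a minimal sketch of such a measurement. It is not a benchmark of the book's code: it uses the standard-library `timeit` and `tracemalloc` modules in place of `%timeit` and `memory_profiler`, and `in_test_set`, `split_by_hash`, `split_by_shuffle`, and `measure` are hypothetical stand-ins written with plain Python lists rather than NumPy or pandas. The same `measure` helper could be pointed at the real functions, assuming they are imported first.

```python
import hashlib
import random
import timeit
import tracemalloc


def in_test_set(identifier, test_ratio, hash_fn=hashlib.md5):
    # Assumption: hash the id's decimal string; the book hashes np.int64 bytes.
    digest = hash_fn(str(identifier).encode()).digest()
    return digest[-1] < 256 * test_ratio


def split_by_hash(ids, test_ratio):
    # Hash-based split: each id always lands in the same set.
    train, test = [], []
    for identifier in ids:
        (test if in_test_set(identifier, test_ratio) else train).append(identifier)
    return train, test


def split_by_shuffle(ids, test_ratio, seed=42):
    # Shuffle-based split: copy, shuffle, then cut off the first test_ratio share.
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


def measure(func, *args, repeat=3):
    # Best wall-clock time of `repeat` single runs, in seconds.
    seconds = min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))
    # Peak memory traced by tracemalloc during one extra run, in bytes.
    tracemalloc.start()
    func(*args)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return seconds, peak_bytes


if __name__ == "__main__":
    ids = list(range(100_000))
    for func in (split_by_hash, split_by_shuffle):
        seconds, peak_bytes = measure(func, ids, 0.2)
        print(f"{func.__name__}: {seconds:.4f} s, peak {peak_bytes / 1024:.0f} KiB")
```

Note that `tracemalloc` only reports allocations made through Python's memory allocators, so its figures may differ from what `memory_profiler` shows for the whole process. Timings also depend heavily on dataset size and hardware, so it is worth running this on data of a realistic size.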