Automated generation of splits for training can put rare species in wrong group

Open pjbull opened this issue 3 years ago • 4 comments

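The automated split generation has one edge case that is likely rare but worth filing an issue for. A video labeled with more than one species can be assigned to one split by the first species' grouping and then reassigned to a different split by a later species' grouping: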

v0.mp4, antelope
v1.mp4, antelope  # v1 assigned test by antelope grouping
v1.mp4, cow       # subsequently v1 assigned train by cow grouping
v2.mp4, antelope
v4.mp4, cow
v5.mp4, cow

# test set is now missing antelope
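
A minimal sketch of how this can happen, assuming splits are assigned one species at a time and later assignments overwrite earlier ones. This is not zamba's actual split code; `per_species_split`, `merge_splits`, and `species_by_split` are hypothetical names, and the per-species assignments are chosen by hand to mirror the example above.

```python
# Hypothetical illustration only; not zamba's implementation.
labels = [
    ("v0.mp4", "antelope"),
    ("v1.mp4", "antelope"),
    ("v1.mp4", "cow"),
    ("v2.mp4", "antelope"),
    ("v4.mp4", "cow"),
    ("v5.mp4", "cow"),
]

# Assumed outcome of assigning splits within each species group separately.
per_species_split = {
    "antelope": {"v0.mp4": "train", "v1.mp4": "test", "v2.mp4": "train"},
    "cow": {"v1.mp4": "train", "v4.mp4": "train", "v5.mp4": "test"},
}


def merge_splits(per_species_split):
    """Merge per-species assignments; a later species overwrites an earlier one."""
    video_split = {}
    for assignment in per_species_split.values():
        video_split.update(assignment)
    return video_split


def species_by_split(labels, video_split):
    """Collect the set of species that ends up in each split."""
    coverage = {}
    for video, species in labels:
        coverage.setdefault(video_split[video], set()).add(species)
    return coverage


video_split = merge_splits(per_species_split)
coverage = species_by_split(labels, video_split)
print(video_split["v1.mp4"])  # v1 was "test" for antelope, then overwritten by cow
print(coverage)
```

Because v1.mp4 was the only antelope video placed in test, the overwrite by the cow grouping leaves the test split with no antelope.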

Originally posted by @pjbull in https://github.com/drivendataorg/zamba/pull/169#r768276575

pjbull avatar Dec 14 '21 02:12 pjbull

I tried to replicate the bug. It seems that the val set is missing antelope. v0 is training for antelope, v1 is training for cow, v2 is holdout for antelope, v4 is val for cow, and v5 is holdout for cow.
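
The assignment described above can be checked with a small coverage test. This is only a sketch: `missing_species` is a hypothetical helper, not part of zamba, and the split names are taken from the description in this comment.

```python
# Hypothetical helper for checking species coverage per split; not zamba code.
labels = [
    ("v0.mp4", "antelope"),
    ("v1.mp4", "antelope"),
    ("v1.mp4", "cow"),
    ("v2.mp4", "antelope"),
    ("v4.mp4", "cow"),
    ("v5.mp4", "cow"),
]

# Final per-video assignment as reported in the replication attempt.
reported_split = {
    "v0.mp4": "train",
    "v1.mp4": "train",
    "v2.mp4": "holdout",
    "v4.mp4": "val",
    "v5.mp4": "holdout",
}


def missing_species(labels, video_split, splits=("train", "val", "holdout")):
    """Return {split: [species absent from that split]} for incomplete splits."""
    all_species = {species for _, species in labels}
    present = {split: set() for split in splits}
    for video, species in labels:
        present[video_split[video]].add(species)
    return {
        split: sorted(all_species - seen)
        for split, seen in present.items()
        if all_species - seen
    }


result = missing_species(labels, reported_split)
print(result)
```

A check like this could run after split generation to flag any species that is absent from a split.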

papapizzachess avatar Nov 18 '23 04:11 papapizzachess

I fixed the bug. However, when I try to push my changes, I get this error:

remote: Permission to drivendataorg/zamba.git denied to papapizzachess.
fatal: unable to access 'https://github.com/drivendataorg/zamba.git/': The requested URL returned error: 403

Do I need permission to push changes?

papapizzachess avatar Nov 23 '23 15:11 papapizzachess

@papapizzachess when working on projects on GitHub, outside contributors need to fork the repository, contribute to their fork, and then create a pull request from their fork into the original project. That pull request will be reviewed and merged by the project maintainers.

pjbull avatar Nov 23 '23 15:11 pjbull

Thanks! I got it to work.

papapizzachess avatar Nov 23 '23 16:11 papapizzachess