Add DailyPrefixSuffixTsv and fix delimiter error in DailyPrefixSuffixSource.

Open morazow opened this issue 10 years ago • 4 comments

Hello all,

I have a data source partitioned as /base/path/year/month/day/state/, and I want to run a Scalding job over one week of data for a single state. Reading the whole week with DailySuffixTsv and then filtering would be a waste of resources, so it would be great to have a DailyPrefixSuffixTsv.

Moreover, there should be an extra "/" between TimePathedSource.YEAR_MONTH_DAY and suffixTemplate in DailyPrefixSuffixSource. Otherwise, it tries to read /base/path/year/month/daysuffixTemplate/, which is not the intended path.
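
To make the concatenation problem concrete, here is a minimal sketch in plain Scala with no Scalding dependency. The constant YearMonthDay is an assumed stand-in for TimePathedSource.YEAR_MONTH_DAY (the real value may differ), and pattern, render, and the state value "CA" are hypothetical names used only for illustration.

```scala
import java.util.{Calendar, GregorianCalendar, Locale}

object PathConcatSketch {
  // Assumption: a stand-in for TimePathedSource.YEAR_MONTH_DAY; the real constant may differ.
  val YearMonthDay = "/%1$tY/%1$tm/%1$td"

  // Hypothetical helper: joins prefix, date pattern, and suffix by plain string concatenation.
  def pattern(prefixTemplate: String, suffixTemplate: String): String =
    prefixTemplate + YearMonthDay + suffixTemplate + "/*"

  // Fills the date placeholders in the pattern for one concrete day.
  def render(pattern: String, date: Calendar): String =
    String.format(Locale.ROOT, pattern, date)

  def main(args: Array[String]): Unit = {
    val day = new GregorianCalendar(2014, Calendar.NOVEMBER, 12)
    println(render(pattern("/base/path", "CA"), day))  // suffix without a leading "/"
    println(render(pattern("/base/path", "/CA"), day)) // suffix with a leading "/"
  }
}
```

With the suffix "CA" the day and the state run together in the rendered path; with "/CA" they end up in separate path segments.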

morazow avatar Nov 12 '14 13:11 morazow

The suffixTemplate should just start with a "/"; then it will work fine. (That is how the source is generally used.) We should put a require statement in the constructor to enforce this, though, to avoid future issues.

The source addition itself looks fine to me, though.

ianoc avatar Nov 14 '14 14:11 ianoc

@ianoc Thanks for the feedback.

Yes, I agree. It makes perfect sense to start the suffix with "/". However, I am not quite sure how to add a require/assert for that. I saw there is a check in TimePathedSource:

    // Write to the path defined by the end time:
    override def hdfsWritePath = {
      // TODO this should be required everywhere but works on read without it
      // maybe in 0.9.0 be more strict
      assert(pattern.takeRight(2) == "/*", "Pattern must end with /* " + pattern)
      ...

I was thinking of something like require(suffixTemplate.charAt(0) == '/', "suffixTemplate should start with /"), but I do not know where to put it, because the suffix is concatenated into the pattern that is passed to TimePathedSource.

morazow avatar Nov 17 '14 11:11 morazow

You could add the require to the body of those classes. I think it will then do the check as part of instantiating them.
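
A rough sketch of that idea, using hypothetical stand-in classes rather than the real Scalding ones: PatternSourceSketch plays the role of TimePathedSource and DailyPrefixSuffixSketch the role of DailyPrefixSuffixSource. The date pattern string is also an assumption.

```scala
// Hypothetical stand-in for TimePathedSource: it only stores the pattern.
abstract class PatternSourceSketch(val pattern: String)

// Hypothetical stand-in for DailyPrefixSuffixSource.
class DailyPrefixSuffixSketch(prefixTemplate: String, suffixTemplate: String)
    extends PatternSourceSketch(prefixTemplate + "/%1$tY/%1$tm/%1$td" + suffixTemplate + "/*") {
  // Class-body statements run during construction, right after the superclass
  // constructor, so an invalid suffix throws IllegalArgumentException on `new`.
  require(suffixTemplate.startsWith("/"), "suffixTemplate should start with /: " + suffixTemplate)
}
```

Under these assumptions, new DailyPrefixSuffixSketch("/base/path", "CA") fails immediately, while a suffix of "/CA" is accepted. startsWith is used here instead of charAt(0) so that an empty suffix is rejected by the require itself rather than by a StringIndexOutOfBoundsException.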

ianoc avatar Dec 02 '14 03:12 ianoc

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Jul 18 '19 15:07 CLAassistant