baleen
Export to directory other than '.' fails
Issue
Exporting to a directory such as corpora/ with bin/baleen export corpora results in an error like:
[Errno 2] No such file or directory: 'corpora/corpora/cooking/5b2d180b7af8b43e439b59b0.json'
This is a path expansion bug: the export directory corpora/ appears twice in the generated path, so the file is written to a location that does not exist.
Resolution
The fix is straightforward. In version v0.3.3-85-g88d5d7c, line 211, remove the self.root argument from the os.path.join call.
So, for the block that reads:
for post, category in tqdm(self.posts(), total=Post.objects.count(), unit="docs"):
    path = os.path.join(
        self.root, catdir[category], "{}.{}".format(post.id, self.scheme)
    )
the revision should be:
for post, category in tqdm(self.posts(), total=Post.objects.count(), unit="docs"):
    path = os.path.join(
        catdir[category], "{}.{}".format(post.id, self.scheme)
    )
This change results in the desired behavior on export.
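To illustrate why the extra argument duplicates the directory, here is a minimal standalone sketch. It assumes that each catdir value is already built by joining the export root with the category name; the variable names root, catdir, and filename below are hypothetical stand-ins, not the actual baleen code.

```python
import os

# Hypothetical stand-ins for self.root and catdir in the exporter.
root = "corpora"
# Assumption: each catdir entry already includes the export root.
catdir = {"cooking": os.path.join(root, "cooking")}

filename = "{}.{}".format("5b2d180b7af8b43e439b59b0", "json")

# Original call: root is joined in a second time.
buggy = os.path.join(root, catdir["cooking"], filename)

# Revised call: catdir[category] already carries the root.
fixed = os.path.join(catdir["cooking"], filename)

print(buggy)  # on POSIX: corpora/corpora/cooking/5b2d180b7af8b43e439b59b0.json
print(fixed)  # on POSIX: corpora/cooking/5b2d180b7af8b43e439b59b0.json
```

Under the same assumption, a root of '.' hides the problem: the original call yields ././cooking/..., which still resolves to the intended directory, so the export only fails for other relative roots.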
Thanks @agodbehere for the bug report and the clear solution! You're right, there was a duplication of self.root in catdir[category]; I've implemented the change you suggested.