Sefaria-Export
Duplicates leading to large repository sizes
cltk-flat and cltk-full seem to duplicate a lot of the content from the json directory. Each of these directories is 4.1 GB, so a git clone operation is extremely slow and requires a lot of disk space. (A sparse clone is theoretically possible, but it is very fiddly to set up and very slow to execute, and it has problems with the number of files in the schema directory.)
Would it be possible to do one of the following?
- Put the `cltk*` material in a separate git repository
- Have a helper script that re-builds the `cltk*` based on the information in the `json` directory if needed
- Have a helper script that downloads the `cltk*` from an FTP site if needed
- Have the `cltk*` file trees use symlinks to, rather than duplicating the files from, `json`
- Refactor the code not to need largely-redundant file trees
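
To illustrate the symlink option, here is a rough sketch of what such a helper script might look like. It assumes that at least some files under a `cltk*` tree are byte-for-byte identical to files under `json`, which I have not verified; if the `cltk*` files are reformatted rather than copied, nothing will match and a re-build script would be needed instead. The function names (`file_digest`, `index_by_content`, `link_duplicates`) are made up for this example and are not part of the repository.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_content(root: Path) -> Dict[str, Path]:
    """Map each content digest to the first regular file under root with that content."""
    index: Dict[str, Path] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            index.setdefault(file_digest(path), path)
    return index


def link_duplicates(source_root: Path, dup_root: Path, dry_run: bool = True) -> int:
    """Replace files under dup_root that are byte-identical to a file under
    source_root with relative symlinks to that file.

    Returns the number of bytes occupied by the matching duplicates.
    With dry_run=True (the default) nothing on disk is changed.
    """
    index = index_by_content(source_root)
    saved = 0
    for path in sorted(dup_root.rglob("*")):
        if not path.is_file() or path.is_symlink():
            continue  # skip directories and files that are already links
        target = index.get(file_digest(path))
        if target is None:
            continue  # no identical file in the source tree
        saved += path.stat().st_size
        if not dry_run:
            # Relative links keep working if the whole checkout is moved.
            rel = os.path.relpath(target, start=path.parent)
            path.unlink()
            path.symlink_to(rel)
    return saved


if __name__ == "__main__":
    # Hypothetical invocation from the repository root; dry run only.
    freed = link_duplicates(Path("json"), Path("cltk-flat"), dry_run=True)
    print(f"{freed} bytes are duplicated between json and cltk-flat")
```

Running it with `dry_run=True` first would also answer the question of how much of the `cltk*` content is actually redundant. Note that symlinks only shrink the working tree where the platform supports them, so this would not help on a default Windows checkout.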