pysmbc
tell method for file objects
Thanks for the very useful code! I am hoping to use this library in conjunction with numpy and/or h5py to write array data to a remote samba filestore. numpy and h5py (as of version 2.9) can both write to file objects, as long as those objects have certain methods. However, the code I am trying to run is currently failing because the smbc.File objects returned by smbc.Context.open() do not have a tell method (h5py) nor a flush method (numpy). I've tried to look through the code to figure out whether (a) libsmbclient even supports something like tell or flush and (b) whether I could figure out how to do it, but I'm stuck - C++ is not my strength.
So: Would it be possible to incorporate tell and/or flush methods into the smbc.File objects? Please and thanks for your time. If I can get this to work, a long-term goal would be to incorporate support for samba filestores into the cottoncandy library (a useful library created by a colleague of mine to store array data in s3fs and google drive).
More generally, I'm also not sure that adding tell and flush will be sufficient to make my code work (other methods may be missing too, which would only generate errors later). Here are two snippets of example code that I'm hoping to use. If you do manage to implement tell and/or flush methods, I would be most obliged if you could test them to see if anything else breaks.
For h5py:
import smbc
import h5py # requires version 2.9 or later (file-object support)
import os
import numpy as np
# Setup
basedir = 'smb://my.samba.store'
username = 'myusername'
password = 'mypassword'
workgroup = 'myworkgroup'
samba_share = 'myshare'
fdir = 'my_folder'
fname = 'hdf_test.hdf'
ctx = smbc.Context()
cb = lambda se, sh, w, u, p: (w, username, password)
ctx.functionAuthData = cb
# As a side note, specifying workgroup has been necessary
# to get this to work properly for me
ctx.workgroup = workgroup
mode = os.O_CREAT | os.O_TRUNC | os.O_WRONLY
fid = ctx.open(os.path.join(basedir, samba_share, fdir, fname), mode)
# The following lines work with a file object created by:
# fid = open(fname, mode='wb')
hf = h5py.File(fid, mode='w')
rr = np.random.randn(1000,100)
hf.create_dataset('data', shape=rr.shape, dtype=rr.dtype, data=rr)
hf.close()
fid.close()
For numpy savez:
import smbc
import h5py # must be version 2.9
import os
import numpy as np
# Setup
basedir = 'smb://my.samba.store'
username = 'myusername'
password = 'mypassword'
workgroup = 'myworkgroup'
samba_share = 'myshare'
fdir = 'my_folder'
fname = 'numpy_test.npz'
ctx = smbc.Context()
cb = lambda se, sh, w, u, p: (w, username, password)
ctx.functionAuthData = cb
# As a side note, specifying workgroup has been necessary
# to get this to work properly for me
ctx.workgroup = workgroup
mode = os.O_CREAT | os.O_TRUNC | os.O_WRONLY
fid = ctx.open(os.path.join(basedir, samba_share, fdir, fname), mode)
# The following lines work with a file object created by:
# fid = open(fname, mode='wb')
rr = np.random.randn(1000,100)
np.savez(fid, data=rr)
fid.close()
Thank you very much in advance.
Just wanted to gently nudge again on this issue - I don't mean to pester, I know what it's like to support open source code, I'm just hoping for an answer about whether this might even be possible. Thanks for your time.
I have no idea how to add flush() and tell() with libsmbclient. https://github.com/samba-team/samba/blob/master/source3/include/libsmbclient.h
If I were in your situation, I would write the complete file locally and then transfer it to the remote.
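A minimal sketch of that fallback, assuming the remote destination is any object with a write(bytes) method, such as the object returned by ctx.open() in the snippets above. The helper name save_then_upload and the chunk size are illustrative and not part of pysmbc.

```python
import shutil
import tempfile


def save_then_upload(write_local, remote_file, chunk_size=1024 * 1024):
    """Write data to a local temporary file, then copy it to remote_file.

    write_local: callable that receives a seekable local file object and
                 writes the complete payload to it (e.g. via np.savez).
    remote_file: any object with a write(bytes) method; assumed here to be
                 something like the smbc.File returned by ctx.open().
    """
    with tempfile.TemporaryFile() as tmp:
        write_local(tmp)   # the full file-object API is available locally
        tmp.seek(0)        # rewind before copying
        shutil.copyfileobj(tmp, remote_file, chunk_size)
```

With the numpy snippet above, the call might look like save_then_upload(lambda f: np.savez(f, data=rr), fid). The temporary file is deleted when the with block exits, but it still needs local disk space for the whole array while the upload runs.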
OK, thanks for the reply. Uploading whole files is a potential solution, but one I am hoping to avoid for a few reasons. The biggest reason is that the arrays I am hoping to load / save are quite large, so saving to disk and then uploading takes substantially longer than directly uploading, and quickly creates a large cache (e.g. on cluster machines running jobs). But I can work with it. Thanks again for your useful code.
The obvious answer is to wrap an smbc.File in your own file-like object. Every time write is called, count up the number of bytes written before passing it on. And of course also keep track of seek calls. Then you can implement tell to return the current position.
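A rough sketch of the byte-counting wrapper described above. It assumes the wrapped object has write(data) and seek(offset, whence) methods and that the file starts at offset 0; the class name PositionTrackingFile is hypothetical and not part of pysmbc.

```python
import os


class PositionTrackingFile:
    """Wrap a file-like object and track the offset ourselves.

    Assumes `raw` has write(data) and seek(offset, whence); no tell()
    is required on the wrapped object.
    """

    def __init__(self, raw):
        self.raw = raw
        self._pos = 0  # assumes the file was just opened at offset 0

    def __getattr__(self, name):
        # Delegate everything we do not override (close, ...).
        return getattr(self.raw, name)

    def write(self, data):
        written = self.raw.write(data)
        if written is None:  # some objects do not report a count
            written = len(data)
        self._pos += written
        return written

    def seek(self, offset, whence=os.SEEK_SET):
        result = self.raw.seek(offset, whence)
        if whence == os.SEEK_SET:
            self._pos = offset
        elif whence == os.SEEK_CUR:
            self._pos += offset
        else:
            # SEEK_END: the file size is unknown here, so this relies on
            # the underlying seek() returning the new absolute offset.
            self._pos = result
        return self._pos

    def tell(self):
        return self._pos
```

This version only counts writes and seeks. Reads go straight to the wrapped object and would need the same bookkeeping if the file is opened for reading.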
As for flush, I assume that libsmbclient doesn’t implement buffering. So either implement flush as a no-op, or (more complicated) manage your own buffering.
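For the more complicated option, here is a minimal sketch of a wrapper that does its own buffering. It assumes the wrapped object has write(data) and close() methods and that each write() call sends all of the data it is given; the class name BufferedWriteWrapper and the default buffer size are illustrative only.

```python
class BufferedWriteWrapper:
    """Collect small writes and pass them on in larger chunks.

    Assumes `raw` has write(data) and close(); hypothetical helper, not
    part of pysmbc.
    """

    def __init__(self, raw, buffer_size=64 * 1024):
        self.raw = raw
        self.buffer_size = buffer_size
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data
        if len(self._buffer) >= self.buffer_size:
            self.flush()
        return len(data)

    def flush(self):
        # Send everything collected so far to the underlying object.
        if self._buffer:
            self.raw.write(bytes(self._buffer))
            self._buffer.clear()

    def close(self):
        # Make sure no buffered data is lost before closing.
        self.flush()
        self.raw.close()
```

A complete version would also have to flush before every seek, otherwise buffered bytes would land at the wrong offset. That extra bookkeeping is why the no-op flush is the simpler choice.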
(Duck typing FTW!)
Actually, let me amend that. It should be easier to implement a tell method than I said. Just call lseek with whence = SEEK_CUR and an offset of 0 bytes; this will return the current offset without changing it.
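The same trick works on ordinary file descriptors. The sketch below uses Python's os.lseek on a local temporary file purely to illustrate the idea; whether smbc.File returns the new offset from its own seek call in the same way is assumed from the comment above rather than verified here.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello")                   # offset is now 5
    position = os.lseek(fd, 0, os.SEEK_CUR)  # query without moving
    print(position)
finally:
    os.close(fd)
    os.remove(path)
```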
Workaround:
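import os
class SmbcFileWrapper(object):
    """Add a no-op flush() and a seek-based tell() to an smbc.File."""
    def __init__(self, obj):
        self.obj = obj
    def __getattr__(self, name):
        # Delegate everything else to the wrapped object.
        return getattr(self.obj, name)
    def flush(self):
        # No-op: no buffering is done on this side.
        pass
    def tell(self):
        # Seeking 0 bytes from the current position returns the offset.
        return self.obj.seek(0, os.SEEK_CUR)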
I added a custom read function to fix another issue: https://github.com/hamano/pysmbc/issues/46#issuecomment-629460520
I opened two pull requests (flush, tell), plus a fix for the read function.
Here is a branch with the three PRs merged together: https://github.com/frafra/pysmbc/tree/nina-fixes
Upstream bug report for tell: https://bugzilla.samba.org/show_bug.cgi?id=14383
Upstream bug report for flush: https://bugzilla.samba.org/show_bug.cgi?id=14384