pysmbc

tell method for file objects

Open marklescroart opened this issue 6 years ago • 9 comments

Thanks for the very useful code! I am hoping to use this library in conjunction with numpy and/or h5py to write array data to a remote samba filestore. numpy and h5py (as of version 2.9) can both write to file objects, as long as those objects have certain methods. However, the code I am trying to run currently fails because the smbc.File objects returned by smbc.Context.open() have neither a tell method (needed by h5py) nor a flush method (needed by numpy). I've tried to look through the code to figure out (a) whether libsmbclient even supports something like tell or flush and (b) whether I could figure out how to do it, but I'm stuck - C++ is not my strength.

So: Would it be possible to incorporate tell and/or flush methods into the smbc.File objects? Please and thanks for your time. If I can get this to work, a long-term goal would be to incorporate support for samba filestores into the cottoncandy library (a useful library created by a colleague of mine to store array data in s3fs and Google Drive).

More generally, I'm also not sure that adding tell and flush will be sufficient to make my code work (there may be other missing methods that would only generate errors later). Here are two snippets of example code that I'm hoping to use. If you do manage to implement tell and/or flush methods, I would be most obliged if you could test them to see whether anything else breaks.

For h5py:

import smbc
import h5py # must be version 2.9
import os
import numpy as np

# Setup
basedir = 'smb://my.samba.store' 
username = 'myusername'
password = 'mypassword'
workgroup = 'myworkgroup'
samba_share = 'myshare'
fdir = 'my_folder'
fname = 'hdf_test.hdf'
ctx = smbc.Context()
cb = lambda se, sh, w, u, p: (w, username, password)
ctx.functionAuthData = cb
# As a side note, specifying workgroup has been necessary
# to get this to work properly for me
ctx.workgroup = workgroup
mode = os.O_CREAT | os.O_TRUNC | os.O_WRONLY
# Join with '/' explicitly: os.path.join would insert backslashes on Windows
fid = ctx.open('/'.join([basedir, samba_share, fdir, fname]), mode)

# The following lines work with a file object created by:
# fid = open(fname, mode='wb')
hf = h5py.File(fid, mode='w')
rr = np.random.randn(1000,100)
hf.create_dataset('data', shape=rr.shape, dtype=rr.dtype, data=rr)
hf.close()

fid.close()

For numpy savez:

import smbc
import os
import numpy as np

# Setup
basedir = 'smb://my.samba.store' 
username = 'myusername'
password = 'mypassword'
workgroup = 'myworkgroup'
samba_share = 'myshare'
fdir = 'my_folder'
fname = 'numpy_test.npz'
ctx = smbc.Context()
cb = lambda se, sh, w, u, p: (w, username, password)
ctx.functionAuthData = cb
# As a side note, specifying workgroup has been necessary
# to get this to work properly for me
ctx.workgroup = workgroup
mode = os.O_CREAT | os.O_TRUNC | os.O_WRONLY
# Join with '/' explicitly: os.path.join would insert backslashes on Windows
fid = ctx.open('/'.join([basedir, samba_share, fdir, fname]), mode)

# The following lines work with a file object created by:
# fid = open(fname, mode='wb')
rr = np.random.randn(1000,100)
np.savez(fid, data=rr)
fid.close()

Thank you in advance for your kind consideration.

marklescroart avatar Feb 10 '19 04:02 marklescroart

Just wanted to gently nudge again on this issue - I don't mean to pester, I know what it's like to support open source code, I'm just hoping for an answer about whether this might even be possible. Thanks for your time.

marklescroart avatar Feb 16 '19 00:02 marklescroart

I have no idea how to add flush() and tell() with libsmbclient. https://github.com/samba-team/samba/blob/master/source3/include/libsmbclient.h

If I were in your situation, I would write the complete file locally and then transfer it to the remote.
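
A minimal sketch of that approach, not tested against a real share: the library writes into a local temporary file (where tell, flush and seek all work), and the finished bytes are then streamed to the destination in chunks. The helper name write_locally_then_upload and the chunk size are hypothetical, and the sketch assumes that dest.write() writes the whole chunk it is given. Here dest is an in-memory buffer standing in for an smbc.File.

```python
import io
import tempfile


def write_locally_then_upload(write_fn, dest, chunk_size=1024 * 1024):
    """Let write_fn fill a local temp file, then copy it to dest in chunks.

    write_fn: callable that takes a seekable binary file object and writes to it.
    dest: any object with a write(bytes) method (an smbc.File is the assumed target).
    Returns the number of bytes copied.
    """
    total = 0
    with tempfile.TemporaryFile() as tmp:
        write_fn(tmp)   # e.g. lambda f: np.savez(f, data=rr)
        tmp.flush()
        tmp.seek(0)     # rewind before copying
        while True:
            chunk = tmp.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)   # assumption: the whole chunk is written
            total += len(chunk)
    return total


# Stand-in for the remote file, only for demonstration
remote = io.BytesIO()
copied = write_locally_then_upload(lambda f: f.write(b"abc" * 5), remote, chunk_size=4)
```

The temporary file is deleted as soon as the with block exits, so the local cache does not accumulate, but the data is still written twice.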

hamano avatar Feb 18 '19 02:02 hamano

OK, thanks for the reply. Uploading whole files is a potential solution, but one I am hoping to avoid for a few reasons. The biggest reason is that the arrays I am hoping to load / save are quite large, so saving to disk and then uploading takes substantially longer than directly uploading, and quickly creates a large cache (e.g. on cluster machines running jobs). But I can work with it. Thanks again for your useful code.

marklescroart avatar Feb 18 '19 05:02 marklescroart

The obvious answer is to wrap an smbc.File in your own file-like object. Every time write is called, count up the number of bytes written before passing it on. And of course also keep track of seek calls. Then you can implement tell to return the current position.

As for flush, I assume that libsmbclient doesn’t implement buffering. So either implement flush as a no-op, or (more complicated) manage your own buffering.
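
A rough sketch of such a wrapper, under a few assumptions: the class name PositionTrackingWriter is made up, the wrapped object is assumed to write every buffer in full, and SEEK_END is left out because the sketch does not track the file size. An io.BytesIO stands in for the smbc.File here.

```python
import io
import os


class PositionTrackingWriter:
    """Wrap a file-like object and keep track of the write offset ourselves."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0

    def write(self, data):
        n = memoryview(data).nbytes   # byte count, also for non-bytes buffers
        self._raw.write(data)         # assumption: the whole buffer is written
        self._pos += n
        return n

    def seek(self, offset, whence=os.SEEK_SET):
        if whence == os.SEEK_SET:
            target = offset
        elif whence == os.SEEK_CUR:
            target = self._pos + offset
        else:
            raise io.UnsupportedOperation("SEEK_END is not tracked in this sketch")
        self._raw.seek(target, os.SEEK_SET)
        self._pos = target
        return target

    def tell(self):
        return self._pos

    def flush(self):
        pass   # no-op, assuming the wrapped object does no buffering

    def __getattr__(self, name):
        # Anything not defined above (close, read, ...) goes to the wrapped object
        return getattr(self._raw, name)


raw = io.BytesIO()
w = PositionTrackingWriter(raw)
w.write(b"hello")
w.seek(2)
w.write(b"XY")
```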

(Duck typing FTW!)

ldo avatar Sep 21 '19 04:09 ldo

Actually, let me amend that. It should be easier to implement a tell method than I said. Just call lseek with whence = SEEK_CUR and an offset of 0 bytes; this will return the current offset without changing it.
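
To illustrate the trick on a local file descriptor (os.lseek is the standard-library call; whether smbc.File.seek returns the new offset in the same way is an assumption this sketch does not check):

```python
import os
import tempfile


def current_offset(fd):
    """Return the current offset of fd without moving it."""
    return os.lseek(fd, 0, os.SEEK_CUR)


with tempfile.TemporaryFile() as tmp:
    fd = tmp.fileno()
    os.write(fd, b"0123456789")            # offset advances by 10 bytes
    offset_after_write = current_offset(fd)
    os.lseek(fd, 3, os.SEEK_SET)           # jump to byte 3
    offset_after_seek = current_offset(fd)
```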

ldo avatar Sep 21 '19 09:09 ldo

Workaround:

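import os

class SmbcFileWrapper(object):
    """Adds flush() and tell() to an smbc.File; everything else is delegated."""
    def __init__(self, obj):
        self.obj = obj
    def __getattr__(self, name):
        # Only called for attributes not defined here (read, write, seek, close, ...)
        return getattr(self.obj, name)
    def flush(self):
        # No-op: assumes there is no client-side buffer to flush
        pass
    def tell(self):
        # Seeking 0 bytes from the current position returns the offset without moving it
        return self.obj.seek(0, os.SEEK_CUR)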

frafra avatar May 15 '20 18:05 frafra

I added a custom read function to fix another issue: https://github.com/hamano/pysmbc/issues/46#issuecomment-629460520

frafra avatar May 15 '20 20:05 frafra

I opened two pull requests (flush, tell) plus a fix for the read function.

Here is a branch with the three PRs merged together: https://github.com/frafra/pysmbc/tree/nina-fixes

frafra avatar May 16 '20 15:05 frafra

Upstream bug report for tell: https://bugzilla.samba.org/show_bug.cgi?id=14383
Upstream bug report for flush: https://bugzilla.samba.org/show_bug.cgi?id=14384

frafra avatar May 16 '20 16:05 frafra