anonimongo
anonimongo copied to clipboard
GridFS testing issues
As promised, I will begin this thread as I try to debug the issues in the GridFS test. I'm also testing mongodb v 7.0. So far CRUD operations work fine without enabling full v 7.0 compatibility. I won't upgrade fully to 7.0 until we've run over these issues! Also, tests are still not fully operational in mongoo v 7.0. So here we go!
compiler: nim v 2.0
Config:
switch("path", "../src")
switch("define", "filename=/home/testing/shared/anonimongo/tests/15MBfile.bin")
switch("define", "saveas=/home/testing/shared/anonimongo/tests/15MBfile-copy.bin")
switch("define", "user=")
switch("define", "testReplication=yes")
switch("define", "testChangeStreams=yes")
hint("ConvFromXToItSelfNotNeeded", off)
switch("define", "exe=mongod")
switch("define", "dbpath=/var/lib/mongodb")
switch("define", "uri")
switch("define", "nomongod")
--verbosity:0
--define:release
switch("define", "asyncBackend=chronos")
First, I tried uploading a small file around 1 kilobytes, which gave me an IndexDefect error, as part of the tests use 5 megabytes as part of the overall test.
See lines 121-122:
let fivemb = 5.megabytes
f.setFilePos fivemb
So, I used a 15.2 MB file. The first error showed the following:
/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(109, 26): Check failed: f.getFileSize == gf.fileSize f.getFileSize was 1966080 gf.fileSize was 15152802
Inspecting the resulting file saveas
I verified it had only downloaded 2 MB, confirmed by the following error:
/home/testing/shared/anonimongo/tests/test_gridfs_test.nim(94, 8): Check failed: wr.success Grid download file: reason = Incomplete file download; only at 12.97502600509134% /home/testing/shared/anonimongo/tests/test_gridfs_test.nim(100, 8): Check failed: wr.success Grid download as: reason = Incomplete file download; only at 12.97502600509134%
I checked with pymongo and python's gridfs driver and got the following:
gridfs.errors.CorruptGridFile: truncated chunk #0: expected chunk length to be 1048576 but found chunk with length 131072
Suggesting this was happening from the get go, when uploading.
I tried uploading a 80 MB file, only 32MB got uploaded, but no error arised.
I began monitoring line 122 in gridfs.nim
:
let data = waitfor f.read(chunksize) # to make it work regardless sync/async because asyncfile
echo data.len # <- my addition
which output
131072 131072 131072 ... 131072
Even when hardcoding chunks to 261120, the default for GridFS according to MongoDB documentation, it did not work. When I deliberately switched the test to:
test "Upload file":
when anoSocketSync:
wr = grid.uploadFile(filename, chunk = 131072)
else:
wr = waitfor grid.uploadFile(filename, chunk = 131072)
wr.success.reasonedCheck("Grid upload file", wr.reason)
The test ran perfectly. Analyzing further, I hardcoded 1048576
in line 122 in gridfs.nim
just to test what was happening:
let data = waitfor f.read(1048576) # to make it work regardless sync/async because asyncfile
echo "chunk: " & $chunkSize
echo data.len
this was the output:
chunk: 1048576 131072 chunk: 1048576
Then I hardcoded this in line 122:
let data = waitfor f.read(261120) # to make it work regardless sync/async because asyncfile
echo "chunk: " & $chunkSize
echo data.len
I got the following for data.len:
131072
edit May this be a bug in the nim std library? or a protection against buffer overflows? It's not, works as expected, see comment below
While I have more time to devote to this error. I'm just pointing out this change I made that made the test work:
test "Create default bucket":
require db != nil
when anoSocketSync:
grid = db.createBucket(chunkSize = 131072)
else:
grid = waitfor db.createBucket(chunkSize = 131072)
require grid != nil
edit I think this may be a bug in the std library. It's not, see comment below.
We can run more tests. I will continue through this week to find a solution to the issue. There other issues when uploading other files. I tried uploading a .deb file just for kicks. It uploads alright, but overflows in the remaining tests, and fails. I can download it from gridFS using Python without issues, but there's a point in the test this breaks. Once we figure out what's happening with the 131072 value, we can move with the rest of the tests!
This is just one step at debugging this, and I hope it helps.
Testing async file on its own I had the following:
import std/[asyncfile, asyncdispatch, os]
const target = "path/files"
var f = openAsync(target, fmRead)
let
fsize = getFileSize f
chunkSize = 261120
for _ in countup(0, int(fsize-1), chunkSize):
let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
echo data.len
the output:
261120 261120 261120 ... 261120 261120 65985
Switching chunkSize to a megabyte (1048576):
1048576 1048576 1048576 ... 204225
Works as expected.
The following code, run independently from the test works fine:
import std/[asyncfile, asyncdispatch, os]
import anonimongo
const
anoSocketSync* = defined(anoSocketSync)
URI = "mongodb://localhost:27017/"
saveas = "/home/testing/mongodb-mongosh_1.6.0_amd64.deb"
target = "/home/testing/Downloads/mongodb-mongosh_1.6.0_amd64.deb"
when not anoSocketSync:
type TheSock* = AsyncSocket
else:
type TheSock* = Socket
var mongo = newMongo[AsyncSocket](MongoUri URI)
var grid: GridFS[TheSock]
var db: Database[TheSock]
var wr: WriteResult
let dbname = "newtemptest"
var f = openAsync(target, fmRead)
let
fsize = getFileSize f
chunkSize = 1048576
for _ in countup(0, int(fsize-1), chunkSize):
let data = waitfor f.read(chunkSize) # to make it work regardless sync/async because asyncfile
echo data.len
assert waitFor mongo.connect
db = mongo[dbname]
grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)
wr = waitfor grid.uploadFile(target)
echo wr.success
wr = waitfor grid.downloadFile(saveas)
echo wr.success
Is there an enforced limit on buffer size in the tests? Checking this out next
So, I finally made it work. However, there are a few limitations to the test.
First, I changed a few check
macros to doassert
; Secondly, I changed all streams above 131072 to 131072. There appears to be a buffer limitation inside the test
template. I cannot confirm it yet, but all tests outside it work fine.
Considering the above limitations, I'd like to propose tests outside the test
template to run streams above 131072, such as the 5.megabyte stream that is currently in the gridFS test right now.
I'm attaching a diff file with the changes I made to make the test work, pending all of y'alls review: gridfs.diff.zip
In addition, I'm sharing the diffs here to improve readability and searchability:
diff --git a/tests/test_gridfs_test.nim b/tests/test_gridfs_test.nim
index 112903b..53db33f 100644
--- a/tests/test_gridfs_test.nim
+++ b/tests/test_gridfs_test.nim
@@ -67,9 +67,9 @@ if filename != "" and saveas != "":
test "Create default bucket":
require db != nil
when anoSocketSync:
- grid = db.createBucket(chunkSize = 1.megabytes.int32)
+ grid = db.createBucket(chunkSize = 131072)
else:
- grid = waitfor db.createBucket(chunkSize = 1.megabytes.int32)
+ grid = waitfor db.createBucket(chunkSize = 131072)
require grid != nil
test "Upload file":
@@ -106,8 +106,9 @@ if filename != "" and saveas != "":
var gf = grid.getStream(bson({filename: dwfile}), buffered = true)
else:
var gf = waitfor grid.getStream(bson({filename: dwfile}), buffered = true)
- check f.getFileSize == gf.fileSize
- check f.getFilePos == gf.getPosition
+ doassert f.getFileSize == gf.fileSize
+ doassert f.getFilePos == gf.getPosition
+
let threekb = 3.kilobytes
when anoSocketSync:
@@ -115,28 +116,32 @@ if filename != "" and saveas != "":
else:
var binread = waitfor gf.read(threekb)
var bufread = waitfor f.read(threekb)
- check bufread == binread
- check f.getFilePos == gf.getPosition
+ doassert bufread == binread
+ doassert f.getFilePos == gf.getPosition
+
- let fivemb = 5.megabytes
- f.setFilePos fivemb
+ let kbStream = 131072 #changed
+ f.setFilePos kbStream
when anoSocketSync:
- gf.setPosition fivemb
+ gf.setPosition kbStream
else:
- waitfor gf.setPosition fivemb
- check f.getFilePos == gf.getPosition
-
+ waitfor gf.setPosition kbStream
+ doassert f.getFilePos == gf.getPosition
+
+
when anoSocketSync:
- binread = gf.read(fivemb)
+ #it can't be fivemb because of the buffer limit in tests
+ binread = gf.read(kbStream)
else:
- binread = waitfor gf.read(fivemb)
- bufread = waitfor f.read(fivemb)
- check bufread.len == binread.len
- check bufread == binread
- check f.getFilePos == gf.getPosition
+ binread = waitfor gf.read(kbStream)
+ bufread = waitfor f.read(kbStream)
+ doassert bufread.len == binread.len
+ doassert bufread == binread
+ doassert f.getFilePos == gf.getPosition
close f
close gf
+
test "Gridstream read chunked size":
let chunkfile = "gs_chunks.mkv"
@@ -246,4 +251,4 @@ if filename != "" and saveas != "":
close mongo
if runlocal:
if mongorun != nil and mongorun.running: kill mongorun
- close mongorun
\ No newline at end of file
+ close mongorun
Considerations:
- Make a test without the
test
template with larger streams. According to the comment above, gridfs is working properly. I do, however, think we should devise a strategy to run these tests outside thetest
template. However, we should keep the current test, and include these changes if approved. - I'm adding this as a diff file for you to review, and once changes are analyzed and collectively approved I can put out a pull request with the changes.
- the diff file has a small change in a letter inside a variable called
chunksize
and it was being calledchunkSize
. So to standardize an improve readability I changed all tochunksize
.
@mashingan I've tested this with mongodb v7.0... all tests pass in debian. I will begin testing on a few applications. But I think anonimongo is fully compatible with 7.0. Are you ok with me pushing a branch with the changes I made to the GridFS test or would you like to review the diff file first?
Please go ahead to proceed. Appreciated for the efforts.