
Increase BaseX storage limits: namespaces, number of nodes

Open gimsieke opened this issue 10 years ago • 17 comments

In 7.8, the namespace limit is still 256.

gimsieke avatar Mar 21 '14 06:03 gimsieke

Requires a new storage layout. Will probably be aligned with a higher node id limit (which would also fix #676).

ChristianGruen avatar May 17 '14 22:05 ChristianGruen

See also #1193

ChristianGruen avatar Sep 17 '15 07:09 ChristianGruen

subscribe

innovimax avatar Sep 19 '15 16:09 innovimax

https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Data.java#L27

    • The table is limited to 2^31 entries (pre values are signed ints)
    • A maximum of 2^15 different element and attribute names is allowed
    • A maximum of 2^8 different namespaces is allowed

innovimax avatar Sep 19 '15 16:09 innovimax
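
For illustration, a rough XQuery sketch (not part of BaseX itself; 'factbook' is a placeholder database name) of how one might estimate how close an existing database is to the three limits quoted above:

let $nodes := db:open('factbook')/descendant-or-self::node()
return (
  'nodes: ' || (count($nodes) + count($nodes/@*)),          (: limit: 2^31 :)
  'names: ' || count(distinct-values(
    ($nodes[self::*]/node-name(), $nodes/@*/node-name())
  )),                                                       (: limit: 2^15 :)
  'namespaces: ' || count(distinct-values(
    $nodes[self::*]/namespace-uri()[. != '']
  ))                                                        (: limit: 2^8 :)
)

The counts only approximate what the name and namespace indexes actually store, but they give an idea of whether a dataset is anywhere near the limits.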

We also ran into the nodes limit today; the corresponding error message is 'Insertion at beginning of populated table'. This is pretty bad: the system has been running faithfully for years, and then all of a sudden it's 'game over'.

I understand that limits must exist, and Integer.MAX_VALUE is of course a typical limit in Java. However, it would be great if the existing limits could be documented somewhere more prominently. Also, I feel that all three limits mentioned in innovimax's last comment are likely to be exceeded by standard real-world applications nowadays. A new storage layout is a big task, but I fear it will be needed.

malamut2 avatar Oct 22 '15 20:10 malamut2

Hi,

We ran into the same problem loading all Dutch legislation into BaseX. Although partitioning the dataset is an option, I think BaseX should be able to handle a real-world use case like this, without users having to 'work around' its limitations.

hhv avatar Mar 31 '16 05:03 hhv

What about an HDFS implementation of BaseX Database? To distribute the load of data across HDFS nodes? Just a thought.

cfoster avatar Apr 19 '16 15:04 cfoster

What about an HDFS implementation of BaseX Database?

Late reply, but better than none: The PAXQuery engine (Homepage, Paper) is worth mentioning. It utilized BaseX to speed up queries via HDFS and Apache Flink.

ChristianGruen avatar Jun 08 '16 11:06 ChristianGruen

Will this be fixed in 9.0?

gimsieke avatar Feb 01 '18 08:02 gimsieke

As this would require a completely new storage layout, it would be quite a breaking change. However, it might become an option if we find a potent sponsor.

ChristianGruen avatar Feb 01 '18 08:02 ChristianGruen

While I probably cannot convince my fellow managing directors to fund this single issue (so that I can finally index all the XSLT/XProc code and all other XML files on my hard disk), I will suggest that we make a lump sum donation that you might use for stuff like this.

We’ve been contemplating adding some “GitHub issue crowdfunding functionality” to our transpect repos, for issues that don’t have priority for us to fix but where users can collectively fund fixes. We’ve been (very briefly) looking at https://freedomsponsors.org/. Maybe this is interesting for BaseX, too.

I suggest that we discuss it privately or open another issue for this and solicit user feedback on the mailing list.

gimsieke avatar Feb 01 '18 09:02 gimsieke

Thanks for the link to freedom sponsors, could be interesting indeed!

ChristianGruen avatar Feb 01 '18 09:02 ChristianGruen

We apparently ran into the limit with BaseX 9.4.5. How can we check if the limit has been reached? We are inserting data and consistently get this error (with slightly different numeric values) when the process reaches a certain point:

java.lang.RuntimeException: Data Access out of bounds:

  • pre value: 2147479679
  • table size: -2147479197
  • first/next pre value: -2147479425/-2147479197
  • #total/used pages: 8388848/8388848
  • accessed page: 8388847 (8388848 > 8388847]

  at org.basex.util.Util.notExpected(Util.java:61)
  at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:474)
  at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:158)
  at org.basex.data.Data.kind(Data.java:312)
  ...

kgaleazzi avatar Dec 22 '20 22:12 kgaleazzi

@kgaleazzi It seems we need to add some more limit checks. How do you insert your data?

ChristianGruen avatar Jan 04 '21 08:01 ChristianGruen

We use 'insert node as last' as we rely on the node order with our application. Please advise if there are better ways to insert data and avoid the out of bounds exceptions.

kgaleazzi avatar Jan 04 '21 15:01 kgaleazzi
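
For context, a minimal sketch of the kind of statement being described; the database 'mydb' and its entries element are hypothetical:

insert node <entry created="{current-dateTime()}">new data</entry>
  as last into db:open('mydb')/entries

Appending with 'as last' preserves insertion order, but every inserted node still occupies an entry in the node table, so a long-running database eventually approaches the 2^31 limit discussed above.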

In the long term, insertions should be rejected by BaseX if the database limits are exceeded.

One manual way to avoid out of bounds exceptions is to check the current number of nodes before inserting data via XQuery:

declare variable $LIMIT := 2000000000;  (: safety margin below the 2^31 node limit :)

let $size := db:property($db, 'size')  (: current number of nodes :)
return if ($size > $LIMIT) then (
  error((), 'Database node limit is reached.')
) else (
  insert ...
)

ChristianGruen avatar Jan 07 '21 12:01 ChristianGruen
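
For completeness, a variant of the sketch above with the elided insertion filled in; the database name 'mydb' and the entries/entry elements are hypothetical placeholders:

declare variable $LIMIT := 2000000000;   (: safety margin below the 2^31 node limit :)
declare variable $db := 'mydb';          (: hypothetical database name :)

let $size := db:property($db, 'size')    (: current number of nodes :)
return if ($size > $LIMIT) then (
  error((), 'Database node limit is reached.')
) else (
  (: hypothetical insertion; replace with the actual element and target :)
  insert node <entry created="{current-dateTime()}">new data</entry>
    as last into db:open($db)/entries
)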

That helps, thank you. So $LIMIT should be chosen such that the remaining headroom (2^31 - $LIMIT) covers an estimate of the maximum size of the data being inserted.

kgaleazzi avatar Jan 07 '21 19:01 kgaleazzi
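
For what it's worth, the headroom implied by a $LIMIT of 2000000000, assuming the 2^31 table limit quoted earlier in the thread, can be computed directly:

2147483648 - 2000000000   (: 2^31 - $LIMIT = 147483648, i.e. roughly 147 million nodes of headroom :)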