[Feature]: Benchmarks for DSL Documentation
Problem Description
I'm not sure if this is the right spot, but let's try this:
The benchmark for SharpMUSH, a service that users interact with through a Domain Specific Language (DSL), contains LLM-generated questions that target the language the service is written in rather than the DSL.
The documentation itself does not mention C# anywhere; however, I assume the LLM did research outside of this guide and found that the server, which provides its DSL via ANTLR4 parsing, is written in C#.
As a result, the generated benchmark questions were not relevant, and the resulting score was lower than it should have been.
Proposed Solution
Add a rule for the LLM that creates benchmarks so that it considers whether the targeted documentation covers a DSL or the language most prominent in the related GitHub repository.
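A rough sketch of what such a rule could look like, in Python. The function names, the "documented DSL" label, and the idea of passing in the repository's primary language are all assumptions made for illustration; none of this reflects how the benchmark generator actually works.

```python
import re


def mentions_language(doc_text: str, language: str) -> bool:
    """Return True if the documentation names the language as a standalone token."""
    # Lookarounds are used instead of \b so that names such as "C#" or "C++"
    # are matched as whole tokens (and "C" does not match inside "C#").
    pattern = r"(?<![\w#+])" + re.escape(language) + r"(?![\w#+])"
    return re.search(pattern, doc_text, flags=re.IGNORECASE) is not None


def benchmark_target(doc_text: str, repo_language: str) -> str:
    """Hypothetical rule: only target the repository language if the docs mention it."""
    if mentions_language(doc_text, repo_language):
        return repo_language
    # The docs never mention the host language, so assume they describe a DSL.
    return "documented DSL"
```

With a documentation snippet that never names C#, `benchmark_target(snippet, "C#")` falls back to the DSL, which is the behavior this request is asking for.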
Alternatives Considered
No response
Priority
Nice to have
Additional Context
https://context7.com/sharpmush/sharpmush.guide?view=benchmark
Thank you so much for reaching out @HarryCordewener
We are looking into this
Do you have benchmark question suggestions for us?
It is a bit difficult to do this in a more 'general' way for DSLs, but maybe you will pick up on a pattern here. Here is what I would have 'reduced' the original questions to:
Question 2
ORIGINAL: Write the C# code to programmatically create a new Room object in SharpMUSH. The room should have the name 'The Void' and a description 'An endless, dark expanse.' After creation, create an Exit object linking this new room to the game's starting room (dbref #0).
NEW: Write the code to programmatically create a new Room object in SharpMUSH. The room should have the name 'The Void' and a description 'An endless, dark expanse.' After creation, create an Exit object linking this new room to the game's starting room (dbref #0).
NEW (ALT): Write the SharpMUSH code to programmatically create a new Room object in SharpMUSH. The room should have the name 'The Void' and a description 'An endless, dark expanse.' After creation, create an Exit object linking this new room to the game's starting room (dbref #0).
Question 6
ORIGINAL: Create a simple 'heartbeat' or 'tick' service using SharpMUSH's dependency injection and background task system. This service should execute a piece of logic (e.g., printing a message to the log) every 30 seconds without blocking the main game loop.
NEW: Create a simple 'heartbeat' or 'tick' service using SharpMUSH's codebase. This service should execute a piece of logic (e.g., printing a message to the log) every 30 seconds as asynchronously as possible.
Question 8
ORIGINAL: Design and implement a thread-safe locking mechanism for a container object (e.g., a chest). Your solution must prevent race conditions where two players simultaneously attempt to add or remove items, ensuring the container's inventory state remains consistent. Explain how your implementation leverages SharpMUSH's concurrency primitives, if any, versus standard C# locking.
NEW: Design and implement a mechanism for a container object (e.g., a chest). Your solution must prevent race conditions where two players simultaneously attempt to add or remove items, ensuring the container's inventory state remains consistent. Explain how your implementation leverages SharpMUSH's concurrency primitives, if any.
Question 9
ORIGINAL: Describe the process for implementing a custom command parser middleware in SharpMUSH. This middleware should intercept player input before the default parser to handle a new syntax, such as a dice rolling format (e.g., roll 2d6+3). Your explanation should cover how to register the middleware, how to conditionally pass control to the next parser in the chain, and how to manage parser priority.
NEW: Describe the process for implementing a custom command in SharpMUSH. This command should handle a new syntax, such as a dice rolling format (e.g., roll 2d6+3). Your explanation should cover how to register the custom command.
Question 10
ORIGINAL: Propose a strategy for sharding a large game world across multiple SharpMUSH server instances to improve performance and scalability. Detail how you would handle cross-shard object interaction, player movement between shards (server processes), global communication channels, and ensuring data consistency for a shared player account across all shards.
NEW: Propose a strategy for handling a large game world on a SharpMUSH server to improve performance and scalability. Detail how you would handle object interaction, player movement, global communication channels, and ensuring data consistency.
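The pattern in these rewrites is roughly "drop anything the documentation itself never talks about". A minimal Python sketch of a check along those lines follows; the helper name `off_topic_terms`, the term list, and the sample documentation text are hypothetical and only illustrate the idea.

```python
from typing import List


def off_topic_terms(question: str, doc_text: str, host_terms: List[str]) -> List[str]:
    """Return the host-language terms a question uses that the docs never mention."""
    q = question.lower()
    d = doc_text.lower()
    # Simple case-insensitive substring check; a real rule would likely
    # need something smarter than a fixed term list.
    return [t for t in host_terms if t.lower() in q and t.lower() not in d]


# Hypothetical inputs, for illustration only.
docs = "This guide explains how to create rooms and exits and describe them."
terms = ["C#", "dependency injection", "middleware", "thread-safe"]

original = "Write the C# code to programmatically create a new Room object in SharpMUSH."
reduced = "Write the code to programmatically create a new Room object in SharpMUSH."

flagged_original = off_topic_terms(original, docs, terms)
flagged_reduced = off_topic_terms(reduced, docs, terms)
```

A question that comes back with a non-empty list would be a candidate for rewording or regeneration.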
We will enable community-submitted benchmark questions soon.
How're we looking on community submitted benchmark questions? @enesakar
Hey @HarryCordewener, we still need some time before we implement this feature, but here is a new list of questions after the refresh. What do you think? https://context7.com/sharpmush/sharpmush.guide?tab=benchmark
They look good except for question 9, which still focuses on .NET.