Debug TiDB Cloud Documentation: Import Sample Data to TiDB Cloud

This issue is a sub-issue of Debug TiDB Cloud Documentation: Summary Issue (#15480, pingcap/docs). The purpose of this sub-issue is to verify and debug the Import Sample Data to TiDB Cloud document.

You can follow the instructions provided in #15480 to verify and debug the instructions in this document.

After finishing your verification, please add your verification result to this sub-issue as a comment. The result can be the issues you encounter, the mistakes you find, or any other findings. If everything looks fine, you can also add it as a comment.
For any issues you find during the verification, you are welcome to create a pull request (PR) to fix them directly. In the PR description, please indicate which issue the PR resolves (for example, fix #15740). To learn how to create a pull request, see TiDB Documentation Contributing Guide.

Note: Currently, the TiDB Cloud documentation is in English only and is stored in the release-7.5 branch of pingcap/docs so that it can reuse the SQL documentation of TiDB. Hence, to create a pull request for TiDB Cloud documentation, make sure that your PR is based on the release-7.5 branch.

Your contribution to testing and verifying the documentation is highly appreciated!

Dec 16 '23 03:12 qiancai

/assign

Jan 09 '24 22:01 minaelee

General notes: My main impression is that this document seems to have an identity crisis - is it a tutorial, or is it a reference?

In some places it seems to be a tutorial, such as where it instructs the reader to use the sample data and sample Bucket URI and so on. It offers a hands-on experience to achieve a pre-set goal without trying to explore every possible option.

In other places, it seems to be a reference that provides details about every aspect of a topic, such as where it tells the reader all about importing into pre-created tables or importing from Amazon S3/GCS, despite the sample data only coming from AWS.

I would strongly consider splitting this into two documents:

1. Make the original document 'Import Sample Data (SQL File)' into a reference, more along the lines of the other documents in this group that start with "Import...", one that consistently gives all the details for each option instead of skipping some and explaining others. I would rename this file 'Import SQL from Amazon S3 or GCS', to go along with the other documents in its group, which are named 'Import CSV File from Amazon S3 or GCS' and 'Import Apache Parquet Files from Amazon S3 or GCS' in the navigation sidebar.
2. A tutorial, explicitly referred to as a tutorial, titled 'Try Out SQL Import' (along the lines of the more general 'Try Out' guides in the 'Getting Started' section), added as a subdocument under 'Import SQL from Amazon S3 or GCS'. Perhaps more tutorials could be added for the other import options as well, in which case I would advise moving the tutorials to their own named section under the 'Import Data' section.

If this document is not split into two, then I have additional comments regarding making the single document more consistently one way or the other.

Jan 10 '24 01:01 minaelee

In Step 2, this block of text seems unnecessary:

Data format: select SQL File. TiDB Cloud supports importing compressed files in the following formats: .gzip, .gz, .zstd, .zst and .snappy. If you want to import compressed SQL files, name the files in the ${db_name}.${table_name}.${suffix}.sql.${compress} format, in which ${suffix} is optional and can be any integer such as '000001'. For example, if you want to import the trips.000001.sql.gz file to the bikeshare.trips table, you can rename the file as bikeshare.trips.000001.sql.gz. Note that you only need to compress the data files, not the database or table schema files. The Snappy compressed file must be in the official Snappy format. Other variants of Snappy compression are not supported.

The information is repeated in the import UI, as well as in the Naming Conventions for Data Import Page that's linked from the UI. That's 4 places at least with the same information repeated, meaning 4 potential separate update points.

I suggest removing the block entirely (everything after Data format: select SQL File), and potentially leaving a link to the Naming Conventions for Data Import page, i.e.:

Data format: select SQL File. For information about naming conventions, see Naming conventions for data import.
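For anyone checking file names while verifying the document, the naming pattern quoted above can be sketched as a small Python check. This is only an illustration built from the quoted text, not TiDB Cloud's actual validation logic: the function name `parse_data_file_name` is hypothetical, and the sketch assumes that `${suffix}` is a run of digits and that an uncompressed `.sql` file is also acceptable.

```python
import re

# Compression extensions listed in the quoted documentation text.
COMPRESS = ("gzip", "gz", "zstd", "zst", "snappy")

# Assumed pattern: ${db_name}.${table_name}[.${suffix}].sql[.${compress}]
# The digits-only suffix and the optional compression part are assumptions
# of this sketch, not confirmed TiDB Cloud behavior.
PATTERN = re.compile(
    r"^(?P<db>[^.]+)\.(?P<table>[^.]+)"
    r"(?:\.(?P<suffix>\d+))?"
    r"\.sql"
    r"(?:\.(?P<compress>" + "|".join(COMPRESS) + r"))?$"
)


def parse_data_file_name(name):
    """Return the name's parts as a dict, or None if it does not fit the pattern."""
    match = PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_data_file_name("bikeshare.trips.000001.sql.gz"))
    print(parse_data_file_name("notes.txt"))
```

Note that in this sketch a name such as trips.000001.sql.gz would be read as database "trips" and table "000001", which illustrates why the quoted text asks for the file to be renamed with the database name in front.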

Jan 10 '24 01:01 minaelee

At the end of Part 2:

If the region of the bucket is different from your cluster, confirm the compliance of cross region. Click Next.

The "Click Next" directive is confusing for those whose bucket region is the same as their cluster's, and who thus will not see a Next button to click. It was confusing to me until I realized that it was not a standalone command but connected to the previous one. Suggest connecting the two sentences: "If the region of the bucket is different from your cluster, confirm the compliance of the cross region, then click Next."

Jan 10 '24 01:01 minaelee

In Step 3, it should be clearer to the user that, if using the sample data, they should choose the import from S3 option. This is an example of where the document does not know whether it's a tutorial or a reference. Here, it acts like a reference, saying:

You can choose to import into pre-created tables, or import schema and data from the source.

It then gives detailed information about each. No explicit instruction is given to someone who is following along with the sample data as to which one to choose.

Also in Step 3:

When the data import progress shows Completed, you have successfully imported the sample data and the database schema to your database in TiDB Cloud. Once the cluster finishes the data importing process, you will get the sample data in your database.

Additional direction could be useful here. Tell the user that after the Import Task window shows that the task is completed, click the "Explore your data by Chat2Query" button to run test queries in the terminal.

Jan 10 '24 02:01 minaelee

@minaelee thanks for your contributions!

Jan 10 '24 05:01 rpaik

I agree that splitting this into a reference and tutorial would be good.

Jan 10 '24 07:01 dveeden

My main impression is that this document seems to have an identity crisis - is it a tutorial, or is it a reference?

I agree. In fact, this problem also exists in a large number of other documents.

You have done a great job, and all of the suggestions are very valuable and practical. Thank you very much. As a developer working on related features, I support all of your suggestions.👍 @minaelee

Jan 10 '24 10:01 okJiang

Thank you for your suggestions. Your suggestions are very specific and targeted. We will make improvements both in the UI and documentation based on your feedback. @minaelee

Jan 10 '24 11:01 Frank945946

Hi @minaelee, thank you sincerely for your valuable feedback! We are truly impressed by your technical writing expertise.

I wholeheartedly agree with all of your insightful suggestions. Would you be so kind as to create a pull request (PR) to update the documentation as per your recommendations?

Your contributions are greatly appreciated, and we look forward to implementing your enhancements.

Jan 10 '24 12:01 hfxsd
