When starting server side (database core), you can specify arguments to import local CSV file. This way of importing is faster than web client's importer (because there is no networking to slow it down), but can only be performed when starting server side and is limited to import only single CSV file (database with only one table). The reason for this is, that this feature is there for backup purposes only, when web client CSV importer is not available for some reason. So, how to do it? You need to open command line interface (powershell or cmd on Windows) and write the following command:
./dropdbase_instarea path_to_csv_file [database_name_optional] [block_size_optional]
.\dropdbase_instarea.exe path_to_csv_file [database_name_optional] [block_size_optional]
You do not have to specify database name and block size. Database block size will be used as table block size as well. If database name is not specified, the default value 'TestDb' will be used. If block size is not specified, the default value '1048576' will be used.
Use only those columns, that you truly need. It takes more memory and takes more time to execute
SELECT * FROM Table; if a table has many columns than
SELECT ColumnA, ColumnB FROM Table;
If there are string enumerates, that can be internally represented as integer in database, save this enumeration in integer format for much better query performance.
If there are minimum cache misses, the optimal block size (in terms of performance) is the largest one, that can be stored in the single graphic card (read Hardware Limitations - Block Size). Otherwise it depends on each query and data and the only way to find optimal block size is to try different block sizes on the same set of queries and choose the best one. Generally speaking - smaller block sizes improve stability (it is less likely that the query fails due to not having enough GPU VRAM), but sacrifies a bit of speed (we are talking in terms of roughly 5% - 15%). Larger blocks are better for performance, but it is more likely, that the query fails due to not having enough GPU VRAM.
In the main configuration file, there is entry called "GPUCachePercent". Set it's value to the highest possible, but still leave as many GPU VRAM percent as needed for intermediate results for the most difficult query you expect to use. You can calculate this roughly like number of rows in temporary columns (which are created, e.g. when you extract year from datetime, then the year column is Int32 type) times the number of bytes needed for one value. So if we have 1 million rows of data and we create a new column which is Int32 type, then we need for intermediate results 1 000 000 rows x 4 bytes = 3.81 MB of memory space. Aliases do not create new columns, so they do not need extra memory. That being said, usually you need about 500 MB of GPU VRAM per 1 billion rows of data in 7 columns for intermediate results. So we usually the value for GPUCachePercent is in range 80 - 95 (percent).
Better hardware means better performance, see chapter Hardware Impact on Performance to see, what hardware components are the most important in terms of speed.
If you do not want to load a specific database from your databases directory, just change .db extension of the main database file to something else (e.g. .dbx).