Clustering

For complex topics that regular users would not be interested in. For power users and database administrators.
Post Reply
User avatar
jordansparks
Site Admin
Posts: 5739
Joined: Sun Jun 17, 2007 3:59 pm
Location: Salem, Oregon
Contact:

Clustering

Post by jordansparks » Tue Jun 19, 2007 7:48 pm

Someone has suggested that OD could be better optimized for clustering by having smaller tables. They also said clustering would help with multi-office scenarios. Both statements are false. Unless the people involved with this particular cluster setup can possibly clarify where these comments might be coming from. I think that there was some miscommunication somewhere along the line.
Jordan Sparks, DMD
http://www.opendental.com

cybertechcafe
Posts: 6
Joined: Tue Jun 19, 2007 2:54 pm
Location: Cartersville, GA
Contact:

Clustering

Post by cybertechcafe » Wed Jun 20, 2007 6:36 am

I agree, I think that there may be a bit of a miscommunication. First, a little about us and some info regarding the setup in question. We're actually a technology support company for a dentist that has recently migrated from Dentrix to Open Dental. We've tried to get up to par on the terminology specific to dentistry but we're still learning.

- We're using CentOS v4.4 as the OS.
- We're currently using the recommended 'root' + 'nopass' authentication
- We're using MySQL Server v5.1.18 on two servers
- We're using ndb to setup a failover cluster between the two database servers
- We've got the [ndb] management console setup on a third [file] server
- We're using heartbeat to provide a logical IP address to access SQL that's not dependent on either specific server being up.
- All of the files (opendentaldata, etc.) are being served from a file server running Linux (CentOS v4.4) and Samba v3.0.10
- Samba is using OpenLDAP v2.2.13 as it's authentication backend

With the exception of when the [rsync] backups are running, server load averages across all of the servers is between .07 and .47.

This setup is different from the pre-cluster setup in that we're using two different boxes as the database servers (two rather high-end workstation class machines) and we're using LDAP as a backend for Samba (this is split across two Samba boxes to provide redundancy on the Samba side) instead of tdbsam (LDAP is proving to be *much* quicker).

Now, to answer the questions:

With regard to smaller tables, to get the OpenDental database up and going on the cluster rather than the stand-alone database, that's sortof true but not really the whole picture. We had to change the table type from MyISAM (standard MySQL table type) to ndbcluster. To accommodate the ndbcluster table type memory restrictions, we had to cut down on the memory footprint of the database. We did this by reducing the field sizes on some of the fields (e.g., zip code field from 100 char to 15, first name, last name and fields from 100 char to 75, middle initial from 100 to 25, etc.).

With regard to clustering helping with multi-office scenarios, I don't know that this [clustering] would actually help. There is still a lot of data moving between the client and the server (and, in a cluster environment, between the nodes). I don't know that a WAN connection, especially an asynchronous connection like ADSL or Cable, would be able to support the load. We have had success setting up an off-site backup of the database with a master / slave setup (master in the office, slave off-site replicating from the master).
The box said "Requires Windows 2000 or better", so I installed Linux.

User avatar
jordansparks
Site Admin
Posts: 5739
Joined: Sun Jun 17, 2007 3:59 pm
Location: Salem, Oregon
Contact:

Post by jordansparks » Wed Jun 20, 2007 6:50 am

Reducing field sizes will reduce the memory footprint? I'm not familiar with the ndbcluster table type, so I'm basing the following comments on how the MyISAM tables work. Changing the allowed length on varchar fields should have no effect at all on the diskspace because they only take up exactly as much space as they need. The first byte gives the length, and then the data follows. Maybe it works differently in a cluster. I'm surprised that such a minor change would have any effect on the situation. Thanks for the clarification.
Jordan Sparks, DMD
http://www.opendental.com

cybertechcafe
Posts: 6
Joined: Tue Jun 19, 2007 2:54 pm
Location: Cartersville, GA
Contact:

Memory requirements for MySQL databases

Post by cybertechcafe » Wed Jun 20, 2007 3:29 pm

Ok, to clarify the memory and hard drive situation:

1: Jordan, you are right in that diskspace is unchanged based on the field definitions in the tables in the MyISAM database: ie...
varchar[255] == varchar[15] where varchar[] = "HELLO"
because varchar[chars] allocates only what it needs to disk

This is untrue however in the case of:
char[255] == char[15] where char[] = "HELLO" <- FALSE
because char allocates [chars] to disk no matter whats in them

2: In memory however:
varchar[15] == 15x4 bytes == 60 bytes
varchar[255] == 255x4 bytes == 1020 bytes

this is assuming varchar has a size 4 bytes.

now for the effect on a clustered database:

MySQL 5.1 clustering has a limit of 8 KB for each record.
The record's size is defined by the table header.
To keep clustering realistic in speed, it is stored 100% in memory while in use.
Changes are also stored locally on the nodes hard drive so as not to lose state on safe shutdown or restart.

This is in contrast to the MyISAM database that you normally work with in which when you call the table in question it is the only table brought into memory at the time so that the memory constraint are not near as important.

Now with the right configurations, because the ndbcluster database is kept in memory it has the potential to be much faster than the disk based MyISAM database.
The box said "Requires Windows 2000 or better", so I installed Linux.

User avatar
jordansparks
Site Admin
Posts: 5739
Joined: Sun Jun 17, 2007 3:59 pm
Location: Salem, Oregon
Contact:

Post by jordansparks » Wed Jun 20, 2007 6:09 pm

After you pointed me in the right direction, I read the MySQL manual for that section, and now I understand. A simple script could shorten a number of columns. And we'll keep in mind the 120 col limit because we're over half way there on the patient table.
Jordan Sparks, DMD
http://www.opendental.com

enamelrod
Posts: 462
Joined: Tue Jul 24, 2007 9:51 am

missing post

Post by enamelrod » Fri Jul 27, 2007 7:44 am

im assuming this post just was erased but i was wondering after reading this post are the size changes something you would be willing to consider and the option of clustering a possibility?

cybertechcafe
Posts: 6
Joined: Tue Jun 19, 2007 2:54 pm
Location: Cartersville, GA
Contact:

Post by cybertechcafe » Fri Jul 27, 2007 7:56 am

I noticed that some of the posts had been erased as well but was waiting for someone else to revive it. I realize that the clustering probably isn't something that *everyone* would benefit from but I believe that there would be several that would be interested [in clustering] if the upgrades were easier to apply after a database had been clustered. Basically, all that has to change is:

Table Type : Change the table type from MyISAM to NDBCLUSTER
Field Length : Trim down the field lengths so that the entire database can fit into RAM. This is the tricky part because a person has to look at each of the table definitions and determine where the bloat is. For example, setting the field for zip code from 100 characters to 10 or 11 characters.

Again, I would be happy to provide documentation on the specifics of what we changed and even SQL alter statements if there was some assurance that those changes could be integrated into an upcoming release.
The box said "Requires Windows 2000 or better", so I installed Linux.

User avatar
jordansparks
Site Admin
Posts: 5739
Joined: Sun Jun 17, 2007 3:59 pm
Location: Salem, Oregon
Contact:

Post by jordansparks » Fri Jul 27, 2007 8:01 am

I don't think such a script should be included in the core program necessarily. It's just a series of SQL statements that could just as easily reside in a text file. I'm having trouble envisioning where, exactly, we might put such a script in the user interface.
Jordan Sparks, DMD
http://www.opendental.com

cybertechcafe
Posts: 6
Joined: Tue Jun 19, 2007 2:54 pm
Location: Cartersville, GA
Contact:

Post by cybertechcafe » Fri Jul 27, 2007 8:07 am

The script wouldn't have to be included, all we're doing is trimming down some of the database field definitions. Once the script (a series of SQL alter statements) was ran, unless someone goes behind it and overwrites it to change the field definitions back, it doesn't have to be included / re-ran ever again, you just have to make sure that future revisions don't push the database size over the available RAM on the SQL Server(s).

As noted above, the required changes are:
Change Table Type Definition - the default table type is 'MyISAM'. In order for the database to 'live' on a cluster, the table type has to be 'NDBCLUSTER'. This is set for each table in the database and can be easily changed with a search and replace.
Change Field Sizes - many of the field sizes are far larger than they need to be (this is my opinion). Some examples are:

account table:
banknumber, a 255 character varchar field

carrier table:
CarrierName, a 255 character varchar field
Address, a 255 character varchar field
Address2, a 255 character varchar field
City, a 255 character varchar field
State, a 255 character varchar field
Zip, a 255 character varchar field

clinic table:
Address, a 255 character varchar field
Address2, a 255 character varchar field
City, a 255 character varchar field
State, a 255 character varchar field
Zip, a 255 character varchar field
Phone, a 255 character varchar field
BankNumber, a 255 character varchar field

computer table:
PrinterName, a 255 character varchar field

contact table:
LName, a 255 character varchar field
FName, a 255 character varchar field
WkPhone, a 255 character varchar field
Fax, a 255 character varchar field

employee table:
LName, a 255 character varchar field
FName, a 255 character varchar field
MiddleI, a 255 character varchar field

employer table:
EmpName, a 255 character varchar field
Address, a 255 character varchar field
Address2, a 255 character varchar field
City, a 255 character varchar field
State, a 255 character varchar field
Zip, a 255 character varchar field
Phone, a 255 character varchar field

And the list goes on and on, What we've been doing is trimming these down. We looked and were unable to find any zip codes that were 255 characters long, so we trimmed that down. We were also not able to find any 255 character city or state names. The names, we trimmed them down to 50 characters, just to be safe. That MiddleI field, 2 characters. The net result is a much smaller database that will easily fit into memory.

The first change, changing from MyISAM to NDBCLUSTER is one that will need to be done at the time of the update and will depend on whether or not the client is using a stand-alone or cluster environment, kindof like choosing MySQL or Oracle as their database backend, but with fewer required changes. The second could be set once and as long as no future updates change it, wouldn't have to be touched again.
The box said "Requires Windows 2000 or better", so I installed Linux.

User avatar
jordansparks
Site Admin
Posts: 5739
Joined: Sun Jun 17, 2007 3:59 pm
Location: Salem, Oregon
Contact:

Post by jordansparks » Fri Jul 27, 2007 9:01 pm

When I wrote Free Dental version 1.0, I used what I thought were reasonable field lengths. I was proven wrong many many times. That's why all the fields keep ending up bigger. Keep in mind that, for ISAM tables, the field length is nearly meaningless, having zero performance penalty. Even as recently as two months ago, we had to make major changes to the lengths of many fields, changing mediumint fields to int fields. Clustering is the first issue that has come up that requires shorter fields. That's fine. My hope is that someone comes up with a better cluster table that has dynamic field lengths. But until then, the simple solution is to just shorten the ones that are too long. It is very unlikely that we would ever significantly change very many of these field lengths. They are pretty safe once you set them.
Jordan Sparks, DMD
http://www.opendental.com

Post Reply