Tuesday 23 March 2010

Diagnosing an Ektron eSync Relationship

Ektron’s eSync service is a great product when it’s working, but getting the initial configuration right can be a bit fiddly.  I’ve compiled a quick diagnostic procedure to resolve the most common issues, as well as fixes for some of the more common error messages.
Note: The Ektron Windows Service is usually in: C:\Program Files\Ektron\EktronWindowsService30. After any configuration change restart the Ektron Windows Service on every node in the relationship.
  1. Restart the Ektron Windows Service on every node within the relationship
  2. Ensure that the Ektron license is valid for every database within the relationship
  3. Ensure that you can access http://<DOMAINNAME>/workarea/ServerControlWS.asmx on each node of the relationship.  In load balanced environments ensure that you test each node explicitly.
    1. Check that this is the same as the WSPath in the AppSettings
    2. Check that this is the same in the sitedb.config file in the Ektron Windows Service folder
  4. Ensure that each node is accessible on port 8732 from the next nodes in the chain
    1. From the command line enter ‘telnet servername 8732’ (in Server 2008/Windows 7 you will need to install the telnet client)
    2. If the connection is successful, you will see a blank screen; press Ctrl+C to exit, then restart the service you just connected to
  5. Ensure that the SearchConfigUI tool can properly iterate through all of the IIS websites (it will group results by unique connection strings).
    1. If the SearchConfigUI fails on an IIS 7 box, ensure that IIS 6 Management Tools and Metabase compatibility have been installed
  6. Ensure that all servers are using the same version of Ektron
    1. Compare the filesize (and versions) of the following files within the Windows Service folder
      • Ektron.ASM.EktronServices30.exe
      • Ektron.ASM.AssetConfig.dll
      • Ektron.FileSync.Common.dll
      • Ektron.FileSync.Framework.dll
      • Ektron.Sync.Communication.dll
      • Ektron.Sync.SyncServices.dll
  7. Ensure that the website is built against the same version of Ektron as is installed on each of the nodes
    1. Check that the common libraries within the website bin folder and windows service folders match
  8. Check that the website bin directories are the same size on all nodes within the relationship
  9. Check the Windows Service Error Log for indications of a Transport Layer Error (log files are within the /logs folder)
    1. The following errors can indicate a DNS or Proxy issue:
      • The remote server returned an unexpected response: (407) Proxy Authentication Required.
      • The request failed with HTTP status 403: Forbidden
      • The HTTP request was forbidden with client authentication scheme 'Anonymous'
  10. Ensure that every node of the eSync relationship has all counterpart certificates installed (using Security Configurator)
    1. The Ektron Windows Service directory will contain all of the *_SyncClient.* files for each node and the *_SyncServer.* files for the local machine
    2. The website root directory should contain the local machine's *_SyncClient.* files, and the timestamps should match the equivalent certificates in the Windows Service directory
    3. Check that the local machine's certificates are referenced within the ektron.serviceModel section of the web.config
    4. The Ektron.ASM.EktronServices30.exe.config file within the Windows Service folder should contain an element called publicCertKeys that has a reference to a client certificate for each server in the eSync relationship.  These should be present on every machine in the relationship (they may be in a different order).
    5. The website on each machine should have an AppSetting called EncodedValue.  This should be identical to the value stored within the Service Config file for the same machine.  If not, copy from the Windows Service config into the AppSetting, not vice versa.
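
Steps 6 to 8 can be partly automated. The following Python sketch is one way to do it, assuming you can reach a copy of each node's Windows Service folder (for example over a file share; the folder paths you pass in are your own). Note that it compares file size and a SHA-256 hash rather than the assembly version number, which is a substitution made here for simplicity, and the function names are purely illustrative.

```python
import hashlib
from pathlib import Path

# File names taken from step 6 of the checklist above.
SERVICE_FILES = [
    "Ektron.ASM.EktronServices30.exe",
    "Ektron.ASM.AssetConfig.dll",
    "Ektron.FileSync.Common.dll",
    "Ektron.FileSync.Framework.dll",
    "Ektron.Sync.Communication.dll",
    "Ektron.Sync.SyncServices.dll",
]


def fingerprint(path):
    """Return (size in bytes, SHA-256 hex digest), or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    return path.stat().st_size, hashlib.sha256(path.read_bytes()).hexdigest()


def find_mismatches(folders, names=SERVICE_FILES):
    """Return the file names whose fingerprint differs between the given folders.

    A file that is missing from some folders but present in others counts as
    a mismatch; a file that is missing from every folder does not.
    """
    mismatched = []
    for name in names:
        prints = {fingerprint(Path(folder) / name) for folder in folders}
        if len(prints) > 1:
            mismatched.append(name)
    return mismatched
```

Calling find_mismatches with a list of two or more folder paths returns the names that need a closer look; an empty list means the listed files are byte-for-byte identical across the folders.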

Common eSync Errors

(as seen on Synchronisation Progress screen)
Object Reference Not Set To An Instance of An Object
  1. Check that the client certificate is present in the root of the website
  2. Check that the local machine's certificates are referenced within the ektron.serviceModel section of the web.config
  3. Check the WSPath is correct and accessible
  4. Ensure the AppPool is running under the Network Service identity
The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state
  1. Restart all services
  2. Check that licenses are installed and valid
  3. Check that certificates are installed
  4. Check the WSPath is correct and accessible
  5. Compare the EncodedValue setting in the web.config with the value stored in the Ektron Service config
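
The EncodedValue comparison in the last step can be scripted. This is only a sketch: it assumes that both the web.config and the Ektron Service config store the value as a standard .NET `<add key="EncodedValue" value="..."/>` entry. That is the usual layout for web.config appSettings, but for the service config it is an assumption, so check your own file before relying on it. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def read_setting(config_path, key="EncodedValue"):
    """Return the value of the first <add key=... value=...> entry for key, or None.

    Assumption: the setting is stored as an <add> element with key/value
    attributes somewhere in the file.
    """
    root = ET.parse(config_path).getroot()
    for element in root.iter():
        # Strip any XML namespace so that "{ns}add" is treated like "add".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "add" and element.get("key") == key:
            return element.get("value")
    return None


def encoded_values_match(web_config_path, service_config_path):
    """True only if both files contain the setting and the values are identical."""
    web_value = read_setting(web_config_path)
    service_value = read_setting(service_config_path)
    return web_value is not None and web_value == service_value
```

If the values differ, the checklist above says to copy from the Windows Service config into the website's AppSetting, not the other way round.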
SyncDatabaseFailed failed with message: Max index do not match. Local index:X, Remote index:Y
  1. Decide which server has the correct index (e.g. a server whose other eSync profiles are working correctly)
  2. On the server that has the wrong index open the ServerInfo.xml (within the x:\sync folder)
  3. Change all of the MaxId="X" values to the correct value
  4. Open the Ektron database for the website with the incorrect index and modify the 'share_index' field of the settings table to the correct value
  5. Restart all Ektron Windows Services within the relationship
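
Step 3 is a plain search and replace, so it can be sketched in a few lines of Python. The only thing assumed about ServerInfo.xml is that the values appear literally as MaxId="..." attributes, as described above; the function name is illustrative. Take a copy of the file before overwriting it.

```python
import re


def set_max_id(server_info_xml, correct_index):
    """Replace every MaxId="..." attribute value in the text with correct_index.

    Returns the updated text and the number of replacements made.
    """
    replacement = 'MaxId="{}"'.format(int(correct_index))
    return re.subn(r'MaxId="[^"]*"', replacement, server_info_xml)
```

Read the file into a string, pass it through set_max_id, check that the replacement count is what you expect, and only then write the result back.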
Invalid method invocation. Remote position is NOT staged.
  1. Stop all Ektron Windows Services within the relationship
  2. Create a copy of the x:\sync folder (where x is the drive letter for your workarea website)
  3. Delete all of the contents of the x:\sync folder
  4. Restart all of the Ektron Windows Services within the relationship
  5. Rerun synchronisation to rebuild relationship
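
Steps 2 and 3 (copy the sync folder, then empty it) can be sketched as follows. The function name and the backup location are hypothetical, and the Ektron Windows Services should already be stopped, as in step 1, before running anything like this.

```python
import shutil
from pathlib import Path


def backup_and_clear(sync_dir, backup_dir):
    """Copy sync_dir to backup_dir, then delete everything inside sync_dir.

    The copy is made first and raises an error if backup_dir already exists,
    so an earlier backup is never silently overwritten. Returns the names of
    the entries that were removed.
    """
    sync_dir, backup_dir = Path(sync_dir), Path(backup_dir)
    shutil.copytree(sync_dir, backup_dir)
    removed = []
    for entry in sync_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

The sync folder itself is kept; only its contents are removed, which matches the procedure above.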
SyncAssetLibraryFailed failed with message: Object reference not set to an instance of an object.
  1. If the environments are hosted on the same server, ensure that the ‘StorageLocation’ attribute of ‘DocumentManagerData’ (within AssetManagement.config) is set to a unique location for each environment
No CMS400.NET sites were found at this location (when configuring a new relationship)
  1. Ensure that the certificates have been correctly installed on each node of the relationship.
The Remote Position is NOT Staged (when performing an initial sync)
  1. If a previous relationship existed between the two servers, ensure that the records relating to them have been removed from the [dbo].[scheduler] database table.
  2. Remove all folders relating to the previous relationship from the x:\sync folder (and in all child folders) 
  3. Try running the sync from the target website
  4. If problem persists, recreate the target min database – a previous initial sync attempt has semi-populated the target database beyond economical repair.
eSync Status is ‘Running’ Even When There Is No Sync In Progress
  1. Ensure that all Ektron services related to the website (development machines, build servers, etc.) have exactly the same version.
  2. Manually set the status to ‘completed’ in the [dbo].[scheduler] table and restart the services.
  3. If the problem recurs, use SQL Profiler to find which server is updating the status (using the additional HostName column).  The problem command starts with ‘exec cms_updateschedulerun’
Service Not Listening on port 8732 and EktronL2 Event Log contains ‘Service initialized successfully’ and ‘Service stopped successfully’
  1. Ensure that the workarea can be opened without an error (cmslogin.aspx and /workarea/servercontrolws.asmx)
  2. Restart service
  3. Ensure ‘Service started successfully’ message is recorded
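
To confirm whether anything is listening on port 8732 without installing the telnet client, a short Python check along these lines should do. It is a sketch only: the function name is illustrative and the host name you pass in is your own. As with the telnet test in the diagnostic procedure, it may be wise to restart the service you connected to afterwards.

```python
import socket


def port_is_open(host, port=8732, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A False result does not say why the connection failed; a firewall, a stopped service and an unresolvable host name all look the same from here.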
SyncDatabaseFailed failed with message: The message with Action 'http://tempuri.org/ISyncService/InitSession' cannot be processed at the receiver, due to a ContractFilter mismatch at the EndpointDispatcher
  1. Ensure that the same eSync service version is installed on all nodes of the relationship
Failed to execute the command 'DeleteCommand' for table 'content_folder_tbl' (or similar)
  1. There has been an irresolvable conflict.  Try temporarily reversing the conflict policy and resyncing; after a successful sync, the original policy can be restored.
Initial Sync Fails with a Database Constraint Error
The initial sync fails with a constraint error similar to:
  • Violation of PRIMARY KEY constraint 'PK_history_meta_tbl'
  • The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "folder_taxonomy_tbl_fk2"
  • Violation of PRIMARY KEY constraint 'PK_content_meta_tbl'
  1. Remove the damaged sync profile (see below)
  2. Stop users from editing content whilst the initial sync is run
Initial Sync Fails with:  Could not find file ‘….\uploadedFiles\\metaconfig.doc'
  1. Copy ‘metaconfig.doc’ from /assets into the uploadedFiles and uploadedImages folders
  2. Run SearchConfigUI and rebuild catalogs
  3. Re-run the initial sync (it should pick up where it left off!).  You shouldn’t need to remove the relationship and rebuild.
Out of Memory Exceptions
  1. Ensure that all servers have adequate memory
  2. Reduce the batch size of the Ektron Windows Service by adding batchsize=100 to the ‘DatabaseRuntime’ element within DbSync.config in the Ektron Windows Service folder.
  3. Ensure that all servers in the relationship have the same batchsize configured.
  4. Restart Ektron Service
  5. Rerun sync
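
Because the same batchsize has to be applied on every server, it may be worth scripting step 2. The sketch below assumes DbSync.config is well-formed XML containing a DatabaseRuntime element, as described above; the function name is illustrative. Be aware that Python's ElementTree drops XML comments and may change the file's formatting when it writes it back, so keep a copy of the original.

```python
import xml.etree.ElementTree as ET


def set_batch_size(config_path, batch_size=100):
    """Set batchsize on every DatabaseRuntime element and save the file.

    Returns the number of elements updated (0 means nothing was written).
    """
    tree = ET.parse(str(config_path))
    elements = list(tree.getroot().iter("DatabaseRuntime"))
    for element in elements:
        element.set("batchsize", str(int(batch_size)))
    if elements:
        tree.write(str(config_path), encoding="utf-8", xml_declaration=True)
    return len(elements)
```

A return value of 0 means no DatabaseRuntime element was found, in which case the file is left untouched and is worth inspecting by hand.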

The system can not find the file specified

  1. Ensure that the three *_SyncClient.* files for the local machine are present in the root of the website
  2. Ensure that the local machine's certificates are referenced within the ektron.serviceModel section of the web.config
  3. If the certificates have been accidentally deleted, you can copy them from the Ektron Windows Service folder into the root folder.
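
The copy in step 3 can be sketched as below. It assumes the part of the file name before _SyncClient identifies the machine, so the prefix argument is something you supply after looking at the actual file names; the function name is illustrative. Existing files in the website root are left alone, and shutil.copy2 is used so that the copies keep the timestamps of the originals in the Windows Service folder.

```python
import shutil
from pathlib import Path


def restore_client_certificates(service_dir, site_root, prefix):
    """Copy any missing <prefix>_SyncClient.* files from service_dir to site_root.

    Returns the names of the files that were copied, in sorted order.
    """
    copied = []
    pattern = "{}_SyncClient.*".format(prefix)
    for source in sorted(Path(service_dir).glob(pattern)):
        target = Path(site_root) / source.name
        if not target.exists():
            shutil.copy2(source, target)  # copy2 preserves the timestamps
            copied.append(source.name)
    return copied
```

An empty result means either that nothing was missing or that no files matched the prefix, so check the prefix if you expected files to be copied.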

Removing a Damaged eSync Relationship

If an initial sync has failed it can leave a partially populated sync relationship that will need to be removed.  Each time this is run the Max Server index will be incremented.
The procedure to do this is:
  1. Stop Ektron Service
  2. Delete the target database
  3. Remove the corresponding entries from 'scheduler' table in the sending database
  4. Remove entries from 'X:\sync\ServerInfo.xml' on both servers
  5. Remove any folders (for the target environment) in X:\sync folder
  6. Remove entries from sitedb.config for connections on both servers
  7. Remove contents of 'assets' and 'privateAssets' on target server
  8. Reinstall min database (using Site Setup)
  9. Recreate indexes using SearchConfigUI on target server
  10. Configure and run initial sync

29 comments:

  1. Very sweet post, I thought we were the only ones that had the "SyncDatabaseFailed failed with message: Max index do not match. Local index:X, Remote index:Y" and knew how to fix it. Why doesn't Ektron document this kind of stuff.

  2. Added a fix for 'Invalid method invocation. Remote position is NOT staged.'

  3. Added fix for 'SyncAssetLibraryFailed failed with message: Object reference not set to an instance of an object.'

  4. Very useful post. I have been working with Ektron CMS for more than 2 years and you are the first person I have found who blogs about Ektron stuff (other than Ektron staff). Keep up the good work!!!

  5. By the way, do you know how that problem can be solved : my synchronization (with eSync) is working well if I start it locally from the server but if I log in the workarea remotely (that mean from desktop machine) and start the synchronization, I get the following error message : The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state

    Thank you

  6. Hi Patrick,

    Is the domain name you use locally and remotely the same? I believe if it's different (say localhost and local.project.dev) you'll get a licensing error.

    There may be additional clues in the Ektron Windows Service logs.

  7. Yes they are different but I had a licence for both in my CMS (one is the licence we purchased and the other one is a trial licence I requested for testing that case).

    Unfortunately the Ektron Windows Service logs don't have any really useful entries.

    I am dealing with Ektron support for 2 days now, hopefully they will find the solution soon.

  8. Can you modify the host file of your remote machine so that you use the same domain as you would locally?

    eSync is very picky about changes to domain/paths/databases.

  9. Hi Martin,

    Im not sure if this is what you mean but I mapped in my remote machine host file the server machine name to the internal IP address of the server.

    127.0.0.1 localhost
    172.X.X.X servermachinename
    .....

    When I do the synchronization locally on the server, I often access the workarea with localhost or the server machine name. As for CMS licence on the server I added one with the machine name as url. As for localhost I think we dont need any licence.

    But I still get the same error message.

  10. Hmm... I don't really know what to suggest. The faulted state error message isn't particularly useful as all it's telling you is an exception was thrown and now WCF is busted.

    If you use the same domain to login locally on the server as well as remotely (ie http:///) it should behave the same way regardless of where you login from.

    It's probably worth checking that the WSPath in the appSettings uses the same domain name (and that the URL works).

  11. This comment has been removed by the author.

  12. Hi Martin,

    Im not sure if thats what you meant in your previous post, but the solution to solve my problem has been to modify the host file of the SERVER.

    The thing is that I was calling the website from my remote machine with the address (for exemple) "http://test.com" but the server could not reach "test.com" because "test.com" is associated to a public address and it would result in a firewall breach because the server would make a request to the outside that will come back to him and this is not allowed by the firewall.

    Anyway I just added in the SERVER host file an entry for "test.com" which is mapped to the private address of the server and this is working.

    Im not sure why the SERVER is trying to call webservices with the "caller" host instead of using the address I entered in the website and windows service config files which are using the machinename as host.

    I dont even understand why the SERVER is initiating call to the client to start a synchronization. I thought only the website work area would have to make call to perform that operation.

    Note that this problem was breaking the content edition too. Here the kind of error I was having in the Windows event log while trying to edit content :

    Message: Exception thrown from: /
    A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond XXX.XXX.XXX.X:443 at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
    at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception) at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
    at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
    Last 4 events

  13. Hi Patrick,

    It looks like a variation on the theme of what I'm suggesting. Essentially, each node must be able to see itself using the same address as the nodes...

    It's not just IP Address but Hostname that matters with Ektron.

    I'm glad you managed to resolve the issue.

  14. In the name of completeness, I came across this problem-

    Problem- Synchronization option not appearing on settings page
    Solution - Restarting the PC solved this.
    Failed attempts - Restarting Ektron Windows Service, restarting IIS.

  15. Login failed for user '*username*'.

    I found another eSync issue : when using SQL authentication and setting an eSync up for the first time with 8.01 and SQL server 2008, the password gets converted to lowercase.

    It is recommended to change the password to have lowercase characters or to follow this KB :
    http://dev.ektron.com/kb_article.aspx?id=32161

  16. Awesome post. Helped me out. Thanks a lot for taking the time to put this together.

  17. Martin,

    I've been able to restore failed various esync relationships w/o the need to do a full re-sync. With huge websites this might be a better option than completely resynchronizing (even though I've had to resort to that on occasion.)

    Great article! Keep it up.

    http://www.skonet.com/Articles_Archive/Ektron_CMS400_Net_eSync.aspx

  18. Howdy Martin,

    We're running into an issue where archived content blocks are showing up in search results. I've checked the content table and the content blocks that are archived have a content_type of 3, which I assume should stop them appearing in the search results. Digging deeper, I noticed that the .html.txt files that are created by SearchConfigUI are still being created for the archived content blocks.

    Is the SearchConfigUI.exe responsible for removing these .html.txt files for content blocks that are archived or is that part of the EktronWindowsService?

    Any idea how I can get the archived content blocks to not show up in my search results?

    many thanks,

    rise4peace

  19. Hi Martin,

    We have 300 content items + 45,000 images managed by the CMS. Our eSync process never finishes the "Updating Search Catalogs" step, even after 24-36 hours. Usually, all of the other steps take 4-6 hours to complete. We've been manually stopping the eSync process by stopping and restarting the Ektron Windows Service each time, then using the SearchConfigUI to rebuild the index catalogs. Can we ever get to a point where eSync will finish? Should it always take this long if only a few content items are being updated?

    Thanks,
    Matt

  20. @Matt, when the sync is stopped at that status try pausing or restarting the indexing service. I think that sometimes the eSync service and index service lock/conflict (but this is not proven). Stopping indexing seems to break the dead lock.

    @Devin, I believe that archived items are still indexed but are flagged as archived in the index. There's probably an option in the API to filter out these results. I've had a few issues with the built-in search API/controls as I frequently need to perform advanced filtering, so I usually make requests directly against the index.

  21. Another tip- if after removing/recreating a damaged eSync relationship, you get the error-

    "Target database much larger than source database. Please make sure you are replacing a min cms site with your current cms site."

    Try setting eSync direction to Bidirectional. Appears that this check is only performed on one-way synchs.

  22. Martin,

    Thanks again for this post. Serves us well each time we attempt to debug our eSync relationship.

  23. Great post, helped me with most of my issues, but I did find one more that was difficult to find help anywhere, I managed to figure it out and thought some of you may like to know about it.

    I was getting the following error when doing a template publish, with a site that is a few gigs of files (images, video, flash, aspx, xml, etc...):

    "The maximum retry has been exceeded with no response from the remote endpoint. The reliable session was faulted. This is often an indication that the remote endpoint is no longer available."

    I ended up figuring out that the "reliable session" part of the Microsoft WCF that Ektron uses has some settings for timeouts. I searched through the program files for Ektron for all the .config files and opened them all in Notepad++. I searched and replaced all the following (shown with new settings):

    closeTimeout="23:00:00"
    openTimeout="23:00:00"
    receiveTimeout="23:00:00"
    sendTimeout="23:00:00"
    inactivityTimeout="23:00:00"

    Setting these to 23 hours seems crazy, but half of them were already set to that by Ektron. The rest were set to one minute, ten minutes and ten hours. The ones set to ten minutes seemed to be the key for me, as my publish job seemed to fail at 10 minutes and sometimes at 1 minute.

    I think the issue is not that the job is actually timing out or inactive, but that it is not getting the response it expects within that time frame because it is taking so long to transfer all the dang files. I am guessing that it queues up all the files as one step, as opposed to many single steps, so the responses come few and far between. I am not sure though.

    I do know that in 2 systems, this fixed them both. It took quite some time to figure this out and Ektron support was unable to resolve it. So I am hoping that anyone that has the same issue can find this and fix it themselves.

    I am using it on version 8.02, with a dev, stage and load balanced production environment. I updated the files on each machine separately, and restarted both Ektron services afterwards.

    Feel free to let me know if you need more details. Thanks again for such a good post!

  24. The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state error will also occur if the destination sql server service is not started.

    Replies
    1. Just happened to us when the transaction log disk space was full. Took a while to find that.

  25. We recently spent a lot of time troubleshooting an HTTP 407 (proxy authentication required) error in our Windows Service Error Log.

    It turns out the problem was that the NetBIOS name for the remote server was unresolvable on the network (we were crossing domain boundaries).

    Adding records in the local HOSTS file fixed our issue.

  26. We're currently getting the following message at the beginning of every sync: No database found in SiteDBSettings for connection string...
    Any ideas of what file it is talking about? We recently migrated from 7.6 SP4 to v8.5 and had to move our DB's from sql2k5 to 2k8 and went through and edited all the corresponding config files (as instructed by Ektron Support). Now the sync fails to load balance on 1 of the 3 servers. It runs, but never completes.

  27. Another point that can cause issues according to an Ektron support call is to go to Settings -> Configuration -> Synchronizations -> Settings -> There's a dropdown of profiles if you have a few.
    Sometimes the Maximum configured memory for an eSync session (kb) or Application transaction size is ZERO (0) for some reason. Setting both to a value like 16384 is good.

  28. Hi Martyn,

    I too have problems with esync. Here is the background of what we have and what I have done to resolve it. It would be great if you can help us on this.


    Environment:

    I have 3 servers connected on this network

    One authoring server – not on the load balancer – Which has CMS access - database 1
    Two display servers – under load balancer. – NO access to the CMS - Database 2

    We used to do the content amendments on authoring server and then with eSync we will sync the content to the Display servers.

    CMS is running on 8.6 SP1
    eSync Version is running on 9.0.1 SP2


    I have started the initial sync between authoring server to the display servers. After 4 hours also it is showing 0% changes done and stopped with “Source is null “error

    I started debugging here.

    Checked the cms version and esync windows service versions on all 3 server – those are correct.

    Verified the LoadBalanced setting on display servers – all set to 1
    Verified the LoadBalanced setting on authoring servers – all set to 0
    Checked the licence keys – Correct
    Checked port 8732 between the three servers – Port was open
    Cleared the sync folder
    Cleared the scheduler table
    Restarted the windows services

    Created the eSync profiles again and started the eSync

    I am running out of the options, Can you please help on figuring out the issue.


    Thanks
    RV
