Jin,
The bottleneck in the capture is the transfer of data between the card and the PC. If you are capturing the standard 64k samples, the entire capture and transfer process could take several seconds, so repeating it 1000 times for your average would take several thousand seconds. Would it be better to capture one longer buffer, transfer it once, and then do the 1000 avg on your PC? This could still take several minutes depending on how much data you need. Perhaps the TSW1400 and HSDC Pro are not the correct solution.
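As a rough sketch of the PC-side averaging, assuming the long capture has already been exported from HSDC Pro and loaded into a flat list of sample values: the function name average_records and the toy data are made up for illustration and are not part of any TI tool. It also assumes the records sit back to back in the buffer and that each record starts at the same point of a repetitive signal; otherwise a point-by-point average will not be meaningful.

```python
def average_records(buffer, record_len):
    """Split one long capture into back-to-back records and average them point by point.

    Hypothetical helper for illustration; not part of HSDC Pro.
    """
    if record_len <= 0:
        raise ValueError("record_len must be positive")
    num_records = len(buffer) // record_len  # a partial trailing record is ignored
    if num_records == 0:
        raise ValueError("buffer is shorter than one record")

    sums = [0.0] * record_len
    for r in range(num_records):
        start = r * record_len
        for i in range(record_len):
            sums[i] += buffer[start + i]
    return [s / num_records for s in sums]


# Toy data: 3 records of 4 samples each, standing in for exported ADC codes.
capture = [1, 2, 3, 4,
           3, 2, 1, 4,
           2, 2, 2, 4]
averaged = average_records(capture, 4)
print(averaged)
```

For your case the buffer would hold 1000 records, so one transfer replaces 1000 separate capture and transfer cycles.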
The quicker method would be to do the capture and averaging in the FPGA and then send out only the result. This would require you to rewrite the FPGA firmware, which is not something we can help with. If you do want to explore a custom solution, you can start by looking at this example of ADC+TSW1400:
http://www.ti.com/analog/docs/litabsmultiplefilelist.tsp?literatureNumber=slaa545&docCategoryId=1&familyId=2020
Ken